www.infradead.org Git - users/jedix/linux-maple.git/log

SPEC: v2.6.39-100.0.12
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Merge branch '4guru_paravirt_pull' of git://ca-git.us.oracle.com/linux-muvarov-public into uek2-stable

Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux into uek2-stable

Merge branch 'uek2-stable' of git://ca-git.us.oracle.com/linux-uek-2.6.39 into 2.6.39-100.0.5_pulltest

[PATCH] audit: dynamically allocate audit_names when not enough spaceis in the names array

Some system calls, such as loading a module, can cause filesystem activity
such as creating lots of new files in /sys or debugfs. Audit can only capture
20 inodes in a single syscall since it stores them in a fixed length array.
This patch shrinks the fixed length array to 5 names (the number used by a
rename operation) and then allocates names dynamically as they are needed
after 5. It should shut up the complaints we sometimes see about
'name_count maxed, losing inode data" such as that seen in Red Hat bugzilla
445757.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

SPEC: v2.6.39-100.0.11

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Add include bootmem.h in xen-seflballoon.c

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

SPEC: v2.6.39-100.0.10
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

ocfs2: Add datavolume mount option

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Merge branch 'in-3.1/bug.fixes' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

xen: Fix selfballooning and ensure it doesn't go too far

The balloon driver's "current_pages" is very different from
totalram_pages.  Self-ballooning needs to be driven by
the latter.  Also, Committed_AS doesn't account for pages
used by the kernel so enforce a floor for when there are little
or no user-space threads using memory.  The floor function
includes a "safety margin" tuneable in case we discover later
that the floor function is still too aggressive in some workloads,
though likely it will not be needed.

Changes since version 1:
- tuneable safety margin added

[v2: konrad.wilk@oracle.com: make safety margin tuneable]

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

tg3: Dont dump DMA error when interface not ready.

This fixes orabug 12981473.

When do testing with BCM5704 Gigabit Ethernet, bootup get below warning:

tg3 0000:03:01.0: eth0: DMA Status error. Resetting chip.
<Registers state of device>
tg3 0000:03:01.0: eth0: 0: Host status block [00000007:00000002:(0000:0000:0000):(0000:0000)]
tg3 0000:03:01.0: eth0: 0: NAPI info [00000001:00000002:(0000:0000:01ff):0000:(00c8:0000:0000:0000)]
tg3 0000:03:01.0: eth0: Link is up at 1000 Mbps, full duplex
tg3 0000:03:01.0: eth0: Flow control is on for TX and on for RX
tg3 0000:03:01.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3 0000:03:01.0: eth0: Link is down
tg3 0000:03:01.0: eth0: Link is up at 1000 Mbps, full duplex
tg3 0000:03:01.0: eth0: Flow control is on for TX and on for RX

We need not to check the status and dump registers status if the device
not ready,

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>

bnx2x: prevent flooded warnning kernel info

This fixes orabug 12687487 and backported fixes for RHBZ 712000.
When a VN is configured to invalid Max BW, we only need to be told once.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reported-by: Sriharsha Devdas <sriharsha.devdas@oracle.com>

x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode.

This patch fixes an outstanding issue that has been reported since 2.6.37.
Under a heavy loaded machine processing "fork()" calls could keepover with:

BUG: unable to handle kernel paging request at f573fc8c
IP: [<c01abc54>] swap_count_continued+0x104/0x180
*pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
EIP is at swap_count_continued+0x104/0x180
.. snip..
Call Trace:
[<c01ac222>] ? __swap_duplicate+0xc2/0x160
[<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
[<c01ac2e4>] ? swap_duplicate+0x14/0x40
[<c01a0a6b>] ? copy_pte_range+0x45b/0x500
[<c01a0ca5>] ? copy_page_range+0x195/0x200
[<c01328c6>] ? dup_mmap+0x1c6/0x2c0
[<c0132cf8>] ? dup_mm+0xa8/0x130
[<c013376a>] ? copy_process+0x98a/0xb30
[<c013395f>] ? do_fork+0x4f/0x280
[<c01573b3>] ? getnstimeofday+0x43/0x100
[<c010f770>] ? sys_clone+0x30/0x40
[<c06c048d>] ? ptregs_clone+0x15/0x48
[<c06bfb71>] ? syscall_call+0x7/0xb

The problem is that in copy_page_range we turn lazy mode on, and then
in swap_entry_free we call swap_count_continued which ends up in:

map = kmap_atomic(page, KM_USER0) + offset;

and then later we touch *map.

Since we are running in batched mode (lazy) we don't actually set up the
PTE mappings and the kmap_atomic is not done synchronously and ends up
trying to dereference a page that has not been set.

Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode' and
doing the same in kmap_atomic_prot and __kunmap_atomic makes the problem
go away.

Interestingly, git commit b8bcfe997e46150fedcc3f5b26b846400122fdd9
removed part of this to fix an interrupt issue - but it went to far
and did not consider this scenario.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: stable@kernel.org
[v1: Redid the commit description per Jeremy's apt suggestion]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ocfs2: update ocfs2 version

Update the printed version to 1.8.0.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

Merge branch 'uek2-stable-mpt2sas' of git://ca-git.us.oracle.com/linux-mkp-public into uek2-stable

config: disable panic on hardlockup [orabug 13007648]

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

block: Rate-limit failed I/O error message

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Revert "[AUDIT/workaround] Increase AUDIT_NAMES array len"

This reverts commit 629b8994afa2674fac82ab01472e404e48935c9c.

mpt2sas: Do not check DIF for unwritten blocks

Blocks that have never been written contain all ones in the protection
information. When reading, a target device will disable checking for any
block whose application tag contains 0xFFFF.

Some MPT adapters, however, enable checking for all blocks by
default. This causes reads to previously unwritten sectors to fail.

Tweak the relevant bit in the MPT configuration page that causes the HBA
to disable PI checking when a received app tag contains 0xFFFF.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

SPEC: v2.6.39-100.0.9

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

xen/i386: follow-up to "replace order-based range checking of M2P table by linear one"

The numbers obtained from the hypervisor really can't ever lead to an
overflow here, only the original calculation going through the order
of the range could have. This avoids the (as Jeremy points outs)
somewhat ugly NULL-based calculation here.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/irq: Alter the locking to use a mutex instead of a spinlock.

When we allocate/change the IRQ informations, we do not
need to use spinlocks. We can use a mutex (which is
what the generic IRQ code does for allocations/changes).
Fixes a slew of:

BUG: sleeping function called from invalid context at /linux/kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 3216, name: xenstored
2 locks held by xenstored/3216:
#0: (&u->bind_mutex){......}, at: [<ffffffffa02e0920>] evtchn_ioctl+0x30/0x3a0 [xen_evtchn]
#1: (irq_mapping_update_lock){......}, at: [<ffffffff8138b274>] bind_evtchn_to_irq+0x24/0x90
Pid: 3216, comm: xenstored Not tainted 3.1.0-rc6-00021-g437a3d1 #2
Call Trace:
[<ffffffff81088d10>] __might_sleep+0x100/0x130
[<ffffffff81645c2f>] mutex_lock_nested+0x2f/0x50
[<ffffffff81627529>] __irq_alloc_descs+0x49/0x200
[<ffffffffa02e0920>] ? evtchn_ioctl+0x30/0x3a0 [xen_evtchn]
[<ffffffff8138b214>] xen_allocate_irq_dynamic+0x34/0x70
[<ffffffff8138b2ad>] bind_evtchn_to_irq+0x5d/0x90
[<ffffffffa02e03c0>] ? evtchn_bind_to_user+0x60/0x60 [xen_evtchn]
[<ffffffff8138c282>] bind_evtchn_to_irqhandler+0x32/0x80
[<ffffffffa02e03a9>] evtchn_bind_to_user+0x49/0x60 [xen_evtchn]
[<ffffffffa02e0a34>] evtchn_ioctl+0x144/0x3a0 [xen_evtchn]
[<ffffffff811b4070>] ? vfsmount_lock_local_unlock+0x50/0x80
[<ffffffff811a6a1a>] do_vfs_ioctl+0x9a/0x5e0
[<ffffffff811b476f>] ? mntput+0x1f/0x30
[<ffffffff81196259>] ? fput+0x199/0x240
[<ffffffff811a7001>] sys_ioctl+0xa1/0xb0
[<ffffffff8164ea82>] system_call_fastpath+0x16/0x1b

Reported-by: Jim Burns <jim_burn@bellsouth.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/e820: if there is no dom0_mem=, don't tweak extra_pages.

The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:

reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:

    return min(max_pages, MAX_DOMAIN_PAGES);

and use it:

     extra_limit = xen_get_max_pages();
     if (extra_limit >= max_pfn)
             extra_pages = extra_limit - max_pfn;
     else
             extra_pages = 0;

which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000133f2e2000 (usable)

which is clearly wrong. It should look as so:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000027fbda000 (usable)

Naturally this problem does not present itself if dom0_mem=max:X
is used.

CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Revert "xen/e820: if there is no dom0_mem=, don't tweak extra_pages."

This reverts commit 38ec5d3381179924085ac3b21caa8f27da9ba8b3.

We will use an version that went upstream.

ksplice: Clear garbage data on the kernel stack when handling signals.

- Add -u parameter to kernel_variant_post to make it work properly for uek [orabug 12965870]

radeon: add missed firmwares

This fixes orabug 12981553 for uek2(2.6.39).
mkdumprd need these files when generated dump initrd file.

Signed-off-by: Joe Jin <joe.jin@oracle.com>

generate -paravirt configs more accurate

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

SPEC: v2.6.39-100.0.7
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

CONFIG: Add support for Large files - 32bit orabug 12984979

Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

mpt2sas: Return the correct sense key for DIF errors

Only a target device should return ABORTED COMMAND when a PI error is
discovered. The HBA should always set the sense key to ILLEGAL REQUEST.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

mpt2sas: Add a module parameter that permits overriding protection capabilities

Add a parameter that allows the host protection capabilities mask to be
provided at module load time.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Btrfs: reserve sufficient space for ioctl clone

Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We
need to reserve space for:

- adjusting the old extent (possibly splitting it)
- adding the new extent
- updating the inode

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

fix --noarch build

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

SPECFILE: v2.6.39-100.0.6
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

x86, acpi: Handle xapic/x2apic entries in MADT at same time

One system have mixing xapic and x2apic entries in MADT and SRAT.
BIOS guys insist that ACPI 4.0 SPEC said so, if apic id < 255, even
the cpus are with x2apic mode pre-enabled, still need to use xapic entries
instead of x2apic entries.

on 8 socket system with x2apic pre-enabled, will get out of order sequence:
CPU0: socket0, core0, thread0.
CPU1 - CPU 40: socket 4 - socket 7, thread 0
CPU41 - CPU 80: socket 4 - socket 7, thread 1
CPU81 - CPU 119: socket 0 - socket 3, thread 0
CPU120 - CPU 159: socket 0 - socket 3, thread 1

so max_cpus=80 will not get all thread0 now.

Need to handle every entry in MADT at same time with xapic and x2apic.
so we can honor sequence in MADT.

We can use max_cpus= command line to use thread0 in every core,
because recent MADT always have all thread0 at first.
Also it could make the cpu to node mapping more sane.

after patch will get
CPU0 - CPU 79: socket 0 - socket 7, thread 0
CPU80 - CPU 159: socket 0 - socket 7, thread 1

-v2: update some comments, and change to pass array pointer.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

scsi: bump up SD_MAX_DISKS

SD_MAX_DISKS is arbitrarily limited to the number of scsi disks in the
namespace constructed of "sd" followed by one to three of the letters
a-z, or 18278 disks. There is no need for this limit, since appending
a fourth letter works perfectly fine. This simple patch just bumps the
number up to allow up to four letters after "sd". It might be best to
simply remove the test against SD_MAX_DISKS, but this is the patch that
has been unit tested.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

CONFIG: enable sysfs(el5) and xen memory hotplug
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Btrfs: don't change inode flag of the dest clone file

The dst file will have the same inode flags with dst file after
file clone, and I think it's unexpected.

For example, the dst file will suddenly become immutable after
getting some share of data with src file, if the src is immutable.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: don't make a file partly checksummed through file clone

To reproduce the bug:

  # mount /dev/sda7 /mnt
  # dd if=/dev/zero of=/mnt/src bs=4K count=1
  # umount /mnt

  # mount -o nodatasum /dev/sda7 /mnt
  # dd if=/dev/zero of=/mnt/dst bs=4K count=1
  # clone_range -s 4K -l 4K /mnt/src /mnt/dst

  # echo 3 > /proc/sys/vm/drop_caches
  # cat /mnt/dst
  # dmesg
  ...
  btrfs no csum found for inode 258 start 0
  btrfs csum failed ino 258 off 0 csum 2566472073 private 0

It's because part of the file is checksummed and the other part is not,
and then btrfs will complain checksum is not found when we read the file.

Disallow file clone if src and dst file have different checksum flag,
so we ensure a file is completely checksummed or unchecksummed.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix pages truncation in btrfs_ioctl_clone()

It's a bug in commit f81c9cdc567cd3160ff9e64868d9a1a7ee226480
(Btrfs: truncate pages from clone ioctl target range)

We should pass the dest range to the truncate function, but not the
src range.

Also move the function before locking extent state.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix d_off in the first dirent

Since the d_off in the first dirent for "." (that originates from
the 4th argument "offset" of filldir() for the 2nd dirent for "..")
is wrongly assigned in btrfs_real_readdir(), telldir returns same
offset for different locations.

| # mkfs.btrfs /dev/sdb1
| # mount /dev/sdb1 fs0
| # cd fs0
| # touch file0 file1
| # ../test
| telldir: 0
| readdir: d_off = 2, d_name = "."
| telldir: 2
| readdir: d_off = 2, d_name = ".."
| telldir: 2
| readdir: d_off = 3, d_name = "file0"
| telldir: 3
| readdir: d_off = 2147483647, d_name = "file1"
| telldir: 2147483647

To fix this problem, pass filp->f_pos (which is loff_t) instead.

| # ../test
| telldir: 0
| readdir: d_off = 1, d_name = "."
| telldir: 1
| readdir: d_off = 2, d_name = ".."
| telldir: 2
| readdir: d_off = 3, d_name = "file0"
:

At the moment the "offset" for "." is unused because there is no
preceding dirent, however it is better to pass filp->f_pos to follow
grammatical usage.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Turn on CONFIG_CRYPTO_FIPS

Oraclebug: 12989580
CRYPTO_FIPS depends on CRYPTO_ANSI_CPRNG && !CRYPTO_MANAGER_DISABLE_TESTS.
Turn it on in kernel config.

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

config-debug: enable LOCKDEP and more debug options

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

make XEN_MAX_DOMAIN_MEMORY selectable

Fix arch/x86/xen/Kconfig XEN_MAX_DOMAIN_MEMORY selection option.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

Specfile: build OCFS2

Remove commented out patches from spec

For now we build directly from git tree. Remove commented out patches
from spec file to make it more readable.

Build paravirt and paravirt-debug kernels

Merge branch '3.0.4v3' of /home/muvarov/GIT/linux-uek-2.6.39 into 2.6.39-100.0.5_pulltest

SPEC: v2.6.39-100.0.5 xen/config/tmem support
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

config: Enable PARAVIRT and Xen options
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

Revert "ipc semaphores: reduce ipc_lock contention in semtimedop"

This reverts commit c7fa322dd72b08450a440ef800124705a1fa148c.

Revert "ipc semaphores: order wakeups based on waiter CPU"

This reverts commit 8102e1ff9d667661b581209323faaf7a84f0f528.

Revert "use rwlocks for ipc"

This reverts commit 78fe45325c8e2e3f4b6ebb1ee15b6c2e8af5ddb1.

Revert "IPC lock reduction corners"

This reverts commit 8385de45ab8e4b40eaf8341f599bf0c19b08bb64.

Revert "IPC reduce lock contention in semctl"

This reverts commit a8fc9c3f989c474f44e6d4b4f126961207261a1e.

Merge branch 'in-3.1/bug.fixes' of git://oss.oracle.com/git/kwilk/xen into uek2-stable

config: from 6.1 and review
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

xen/e820: if there is no dom0_mem=, don't tweak extra_pages.

The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:

reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:

    return min(max_pages, MAX_DOMAIN_PAGES);

and use it:

     extra_limit = xen_get_max_pages();
     if (extra_limit >= max_pfn)
             extra_pages = extra_limit - max_pfn;
     else
             extra_pages = 0;

which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000133f2e2000 (usable)

which is clearly wrong. It should look as so:

  Xen: 00000000ff000000 - 0000000100000000 (reserved)
  Xen: 0000000100000000 - 000000027fbda000 (usable)

Naturally this problem does not present itself if dom0_mem=max:X
is used.

CC: stable@kernel.org
CC: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: disable PV spinlocks on HVM

PV spinlocks cannot possibly work with the current code because they are
enabled after pvops patching has already been done, and because PV
spinlocks use a different data structure than native spinlocks so we
cannot switch between them dynamically. A spinlock that has been taken
once by the native code (__ticket_spin_lock) cannot be taken by
__xen_spin_lock even after it has been released.

Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.

We have hit a couple of customer bugs where they would like to
use those parameters to run an UP kernel - but both of those
options turn of important sources of interrupt information so
we end up not being able to boot. The correct way is to
pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and
the kernel will patch itself to be a UP kernel.

Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308

CC: stable@kernel.org
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: x86_32: do not enable iterrupts when returning from exception in interrupt context

If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):

cmpw $0x0001, XEN_vcpu_info_pending(%eax)
sete XEN_vcpu_info_mask(%eax)

will enable interrupts even if they has been previously disabled according to
eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99)

testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
setz XEN_vcpu_info_mask(%eax)

Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.

Reproducer for bug is attached to RHBZ 707552

CC: stable@kernel.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: use maximum reservation to limit amount of usable RAM

Use the domain's maximum reservation to limit the amount of extra RAM
for the memory balloon. This reduces the size of the pages tables and
the amount of reserved low memory (which defaults to about 1/32 of the
total RAM).

On a system with 8 GiB of RAM with the domain limited to 1 GiB the
kernel reports:

Before:

Memory: 627792k/4472000k available

After:

Memory: 549740k/11132224k available

A increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved
low memory is also reduced from 253 MiB to 32 MiB. The total
additional usable RAM is 329 MiB.

For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
the number of pages for dom0') (c/s 23790)

CC: stable@kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

SCSI: Fix oops dereferencing queue

Commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b introduced a regression
where requests could be queued after a device had disappeared.
Subsequent commits have attempted to fix some but not all of these
issues.

Since there appears to be ongoing discussion about the proper way to fix
this we'll partially revert the upstream commit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: guru.anbalagane <guru.anbalagane@oracle.com>

Btrfs: add dummy extent if dst offset excceeds file end in

You can see there's no file extent with range [0, 4096]. Check this by
btrfsck:

# btrfsck /dev/sda7
root 5 inode 258 errors 100
...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: calc file extent num_bytes correctly in file clone

num_bytes should be 4096 not 12288.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: xattr: fix attribute removal

An attribute is not removed by 'setfattr -x attr file' and remains
visible in attr list. This makes xfstests/062 pass again.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix wrong nbytes information of the inode

If we write some data into the data hole of the file(no preallocation for this
hole), Btrfs will allocate some disk space, and update nbytes of the inode, but
the other element--disk_i_size needn't be updated. At this condition, we must
update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size()
return 1).

# mkfs.btrfs /dev/sdb1
# mount /dev/sdb1 /mnt
# touch /mnt/a
# truncate -s 856002 /mnt/a
# dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc
# umount /mnt
# btrfsck /dev/sdb1
root 5 inode 257 errors 400
found 32768 bytes used err is 1

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix the file extent gap when doing direct IO

When we write some data to the place that is beyond the end of the file
in direct I/O mode, a data hole will be created. And Btrfs should insert
a file extent item that point to this hole into the fs tree. But unfortunately
Btrfs forgets doing it.

The following is a simple way to reproduce it:
# mkfs.btrfs /dev/sdc2
# mount /dev/sdc2 /test4
# touch /test4/a
# dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
# umount /test4
# btrfsck /dev/sdc2
root 5 inode 257 errors 100

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix unclosed transaction handle in btrfs_cont_expand

The function - btrfs_cont_expand() forgot to close the transaction handle before
it jump out the while loop. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix misuse of trans block rsv

At the beginning of create_pending_snapshot, trans->block_rsv is set
to pending->block_rsv and is used for snapshot things, however, when
it is done, we do not recover it as will.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: reset to appropriate block rsv after orphan operations

While truncating free space cache, we forget to change trans->block_rsv
back to the original one, but leave it with the orphan_block_rsv, and
then with option inode_cache enable, it leads to countless warnings of
btrfs_alloc_free_block and btrfs_orphan_commit_root:

WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
...
WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: skip locking if searching the commit root in csum lookup

It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that its locked and we'll
still deadlock. So use path->skip_locking as well. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix warning in iput for bad-inode

iput() shouldn't be called for inodes in I_NEW state.
We need to mark inode as constructed first.

WARNING: at fs/inode.c:1309 iput+0x20b/0x210()
Call Trace:
[<ffffffff8103e7ba>] warn_slowpath_common+0x7a/0xb0
[<ffffffff8103e805>] warn_slowpath_null+0x15/0x20
[<ffffffff810eaf0b>] iput+0x20b/0x210
[<ffffffff811b96fb>] btrfs_iget+0x1eb/0x4a0
[<ffffffff811c3ad6>] btrfs_run_defrag_inodes+0x136/0x210
[<ffffffff811ad55f>] cleaner_kthread+0x17f/0x1a0
[<ffffffff81035b7d>] ? sub_preempt_count+0x9d/0xd0
[<ffffffff811ad3e0>] ? transaction_kthread+0x280/0x280
[<ffffffff8105af86>] kthread+0x96/0xa0
[<ffffffff814336d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105aef0>] ? kthread_worker_fn+0x190/0x190
[<ffffffff814336d0>] ? gs_change+0xb/0xb

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
Tested-by: David Sterba <dsterba@suse.cz>
CC: Josef Bacik <josef@redhat.com>
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix an oops when deleting snapshots

We can reproduce this oops via the following steps:

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
$ for ((i=0; i<3; i++)); do btrfs sub snap /mnt/btrfs /mnt/btrfs/s_$i; done
$ rm -fr /mnt/btrfs/*
$ rm -fr /mnt/btrfs/*

then we'll get
------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:2264!
[...]
Call Trace:
[<ffffffffa05578c7>] btrfs_rmdir+0xf7/0x1b0 [btrfs]
[<ffffffff81150b95>] vfs_rmdir+0xa5/0xf0
[<ffffffff81153cc3>] do_rmdir+0x123/0x140
[<ffffffff81145ac7>] ? fput+0x197/0x260
[<ffffffff810aecff>] ? audit_syscall_entry+0x1bf/0x1f0
[<ffffffff81153d0d>] sys_unlinkat+0x2d/0x40
[<ffffffff8147896b>] system_call_fastpath+0x16/0x1b
RIP [<ffffffffa054f7b9>] btrfs_orphan_add+0x179/0x1a0 [btrfs]

When it comes to btrfs_lookup_dentry, we may set a snapshot's inode->i_ino
to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
while the snapshot's location.objectid remains unchanged.

However, btrfs_ino() does not take this into account, and returns a wrong ino,
and causes the oops.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Merge branch 'in-3.1/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branch 'stable/tracing' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branch 'stable/xen-pciback-0.6.3.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branches 'stable/drivers.other' and 'stable/drivers.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branch 'stable/pci.cleanups.v1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branch 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into uek2-stable

Merge branch 'for-guru' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem into uek2-stable

xen-blkback: fixed indentation and comments

This patch fixes belows:

1. Fix code style issue.
2. Fix incorrect functions name in comments.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: Make description more obvious.

With the frontend having Xen but the backend not, it just looks odd:

<*> Xen virtual block device support
<*> Block-device backend driver

Fix it to have the 'Xen' in front of it.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkfront: Drop name and minor adjustments for emulated scsi devices

These were intended to avoid the namespace clash when representing
emulated IDE and SCSI devices. However that seems to confuse users
more than expected (a disk defined as sda becomes xvde).
So for now go back to the scheme which does no adjustments. This
will break when mixing IDE and SCSI names in the configuration of
guests but should be by now expected.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen-blkfront: Fix one off warning about name clash

Avoid telling users to use xvde and onwards when using xvde.

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Do not enable PV IPIs when vector callback not present

Fix regression for HVM case on older (<4.1.1) hypervisors caused by

  commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
  Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

This change replaced the SMP operations with event based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplainable hangs early on boot for
HVM guests with more than one CPU.

BugLink: http://bugs.launchpad.net/bugs/791850
CC: stable@kernel.org
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/x86: replace order-based range checking of M2P table by linear one

The order-based approach is not only less efficient (requiring a shift
and a compare, typical generated code looking like this

mov eax, [machine_to_phys_order]
mov ecx, eax
shr ebx, cl
test ebx, ebx
jnz ...

whereas a direct check requires just a compare, like in

cmp ebx, [machine_to_phys_nr]
jae ...

), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).

Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.

CC: stable@kernel.org
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Fix misleading WARN message at xen_release_chunk

WARN message should not complain
"Failed to release memory %lx-%lx err=%d\n"
^^^^^^^
about range when it fails to release just one page,
instead it should say what pfn is not freed.

In addition line:
printk(KERN_INFO "xen_release_chunk: looking at area pfn %lx-%lx: "
...
printk(KERN_CONT "%lu pages freed\n", len);
will be broken if WARN in between this line is fired. So fix it
by using a single printk for this.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: Fix printk() format in xen/setup.c

Use correct format specifier for unsigned long.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/grant: Fix compile warning.

drivers/xen/grant-table.c:85: warning: ‘rc’ may be used uninitialized in this function

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: xen-selfballoon.c needs more header files

Fix build errors (found when CONFIG_SYSFS is not enabled):

drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration
drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant
drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/self-balloon: Add dependency on tmem.

Without enabling CONFIG_XEN_TMEM we get this:

drivers/xen/xen-selfballoon.c:461: undefined reference to `tmem_enabled'

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/balloon: Fix compile errors - missing header files.

With a specific enough .config file compile errors show
for missing workqueue declarations.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: convert to 64 bit stats interface

Convert xen driver to 64 bit statistics interface.
Use stats_sync to ensure that 64 bit update is read atomically on 32 bit platform.
Put hot statistics into per-cpu table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen/netback: Add module alias for autoloading

Add xen-backend:vif module alias to the xen-netback module. This allows
automatic loading of the module.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: Don't let in-flight requests defer pending ones.

Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume addtional requests
will be picked up on time, not synchronously.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/blkback: Add module alias for autoloading

Add xen-backend:vbd module alias to the xen-blkback module. This allows
automatic loading of the module.

Signed-off-by: Bastian Blank <waldi@debian.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

mm: extend memory hotplug API to allow memory hotplug in virtual machines

This patch contains online_page_callback and apropriate functions for
registering/unregistering online page callbacks.  It allows to do some
machine specific tasks during online page stage which is required to
implement memory hotplug in virtual machines.  Currently this patch is
required by latest memory hotplug support for Xen balloon driver patch
which will be posted soon.

Additionally, originial online_page() function was splited into
following functions doing "atomic" operations:

  - __online_page_set_limits() - set new limits for memory management code,
  - __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
  - __online_page_free() - free page to allocator.

It was done to:
  - not duplicate existing code,
  - ease hotplug code devolpment by usage of well defined interface,
  - avoid stupid bugs which are unavoidable when the same code
    (by design) is developed in many places.

[akpm@linux-foundation.org: use explicit indirect-call syntax]
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>