[PATCH] audit: dynamically allocate audit_names when not enough spaceis in the names array
Some system calls, such as loading a module, can cause filesystem activity
such as creating lots of new files in /sys or debugfs. Audit can only capture
20 inodes in a single syscall since it stores them in a fixed length array.
This patch shrinks the fixed length array to 5 names (the number used by a
rename operation) and then allocates names dynamically as they are needed
after 5. It should shut up the complaints we sometimes see about
'name_count maxed, losing inode data" such as that seen in Red Hat bugzilla
445757.
Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
Dan Magenheimer [Tue, 27 Sep 2011 14:47:58 +0000 (08:47 -0600)]
xen: Fix selfballooning and ensure it doesn't go too far
The balloon driver's "current_pages" is very different from
totalram_pages. Self-ballooning needs to be driven by
the latter. Also, Committed_AS doesn't account for pages
used by the kernel so enforce a floor for when there are little
or no user-space threads using memory. The floor function
includes a "safety margin" tuneable in case we discover later
that the floor function is still too aggressive in some workloads,
though likely it will not be needed.
Changes since version 1:
- tuneable safety margin added
[v2: konrad.wilk@oracle.com: make safety margin tuneable]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
When do testing with BCM5704 Gigabit Ethernet, bootup get below warning:
tg3 0000:03:01.0: eth0: DMA Status error. Resetting chip.
<Registers state of device>
tg3 0000:03:01.0: eth0: 0: Host status block [00000007:00000002:(0000:0000:0000):(0000:0000)]
tg3 0000:03:01.0: eth0: 0: NAPI info [00000001:00000002:(0000:0000:01ff):0000:(00c8:0000:0000:0000)]
tg3 0000:03:01.0: eth0: Link is up at 1000 Mbps, full duplex
tg3 0000:03:01.0: eth0: Flow control is on for TX and on for RX
tg3 0000:03:01.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3 0000:03:01.0: eth0: Link is down
tg3 0000:03:01.0: eth0: Link is up at 1000 Mbps, full duplex
tg3 0000:03:01.0: eth0: Flow control is on for TX and on for RX
We need not to check the status and dump registers status if the device
not ready,
Signed-off-by: Joe Jin <joe.jin@oracle.com> Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
The problem is that in copy_page_range we turn lazy mode on, and then
in swap_entry_free we call swap_count_continued which ends up in:
map = kmap_atomic(page, KM_USER0) + offset;
and then later we touch *map.
Since we are running in batched mode (lazy) we don't actually set up the
PTE mappings and the kmap_atomic is not done synchronously and ends up
trying to dereference a page that has not been set.
Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode' and
doing the same in kmap_atomic_prot and __kunmap_atomic makes the problem
go away.
Interestingly, git commit b8bcfe997e46150fedcc3f5b26b846400122fdd9
removed part of this to fix an interrupt issue - but it went to far
and did not consider this scenario.
CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: x86@kernel.org CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: stable@kernel.org
[v1: Redid the commit description per Jeremy's apt suggestion] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Martin K. Petersen [Fri, 23 Sep 2011 00:54:11 +0000 (20:54 -0400)]
mpt2sas: Do not check DIF for unwritten blocks
Blocks that have never been written contain all ones in the protection
information. When reading, a target device will disable checking for any
block whose application tag contains 0xFFFF.
Some MPT adapters, however, enable checking for all blocks by
default. This causes reads to previously unwritten sectors to fail.
Tweak the relevant bit in the MPT configuration page that causes the HBA
to disable PI checking when a received app tag contains 0xFFFF.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jan Beulich [Thu, 15 Sep 2011 07:52:40 +0000 (08:52 +0100)]
xen/i386: follow-up to "replace order-based range checking of M2P table by linear one"
The numbers obtained from the hypervisor really can't ever lead to an
overflow here, only the original calculation going through the order
of the range could have. This avoids the (as Jeremy points outs)
somewhat ugly NULL-based calculation here.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/irq: Alter the locking to use a mutex instead of a spinlock.
When we allocate/change the IRQ informations, we do not
need to use spinlocks. We can use a mutex (which is
what the generic IRQ code does for allocations/changes).
Fixes a slew of:
Reported-by: Jim Burns <jim_burn@bellsouth.net> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Tue, 13 Sep 2011 14:17:32 +0000 (10:17 -0400)]
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:
reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:
which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:
Yinghai Lu [Thu, 8 Sep 2011 18:33:22 +0000 (11:33 -0700)]
x86, acpi: Handle xapic/x2apic entries in MADT at same time
One system have mixing xapic and x2apic entries in MADT and SRAT.
BIOS guys insist that ACPI 4.0 SPEC said so, if apic id < 255, even
the cpus are with x2apic mode pre-enabled, still need to use xapic entries
instead of x2apic entries.
on 8 socket system with x2apic pre-enabled, will get out of order sequence:
CPU0: socket0, core0, thread0.
CPU1 - CPU 40: socket 4 - socket 7, thread 0
CPU41 - CPU 80: socket 4 - socket 7, thread 1
CPU81 - CPU 119: socket 0 - socket 3, thread 0
CPU120 - CPU 159: socket 0 - socket 3, thread 1
so max_cpus=80 will not get all thread0 now.
Need to handle every entry in MADT at same time with xapic and x2apic.
so we can honor sequence in MADT.
We can use max_cpus= command line to use thread0 in every core,
because recent MADT always have all thread0 at first.
Also it could make the cpu to node mapping more sane.
after patch will get
CPU0 - CPU 79: socket 0 - socket 7, thread 0
CPU80 - CPU 159: socket 0 - socket 7, thread 1
-v2: update some comments, and change to pass array pointer.
Dave Kleikamp [Wed, 14 Sep 2011 21:48:47 +0000 (16:48 -0500)]
scsi: bump up SD_MAX_DISKS
SD_MAX_DISKS is arbitrarily limited to the number of scsi disks in the
namespace constructed of "sd" followed by one to three of the letters
a-z, or 18278 disks. There is no need for this limit, since appending
a fourth letter works perfectly fine. This simple patch just bumps the
number up to allow up to four letters after "sd". It might be best to
simply remove the test against SD_MAX_DISKS, but this is the patch that
has been unit tested.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Since the d_off in the first dirent for "." (that originates from
the 4th argument "offset" of filldir() for the 2nd dirent for "..")
is wrongly assigned in btrfs_real_readdir(), telldir returns same
offset for different locations.
At the moment the "offset" for "." is unused because there is no
preceding dirent, however it is better to pass filp->f_pos to follow
grammatical usage.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
The patch "xen: use maximum reservation to limit amount of usable RAM"
(d312ae878b6aed3912e1acaaf5d0b2a9d08a4f11) breaks machines that
do not use 'dom0_mem=' argument with:
reserve RAM buffer: 000000133f2e2000 - 000000133fffffff
(XEN) mm.c:4976:d0 Global bit is set to kernel page fffff8117e
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason being that the last E820 entry is created using the
'extra_pages' (which is based on how many pages have been freed).
The mentioned git commit sets the initial value of 'extra_pages'
using a hypercall which returns the number of pages (if dom0_mem
has been used) or -1 otherwise. If the later we return with
MAX_DOMAIN_PAGES as basis for calculation:
which means we end up with extra_pages = 128GB in PFNs (33554432)
- 8GB in PFNs (2097152, on this specific box, can be larger or smaller),
and then we add that value to the E820 making it:
PV spinlocks cannot possibly work with the current code because they are
enabled after pvops patching has already been done, and because PV
spinlocks use a different data structure than native spinlocks so we
cannot switch between them dynamically. A spinlock that has been taken
once by the native code (__ticket_spin_lock) cannot be taken by
__xen_spin_lock even after it has been released.
Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.
We have hit a couple of customer bugs where they would like to
use those parameters to run an UP kernel - but both of those
options turn of important sources of interrupt information so
we end up not being able to boot. The correct way is to
pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and
the kernel will patch itself to be a UP kernel.
Igor Mammedov [Thu, 1 Sep 2011 11:46:55 +0000 (13:46 +0200)]
xen: x86_32: do not enable iterrupts when returning from exception in interrupt context
If vmalloc page_fault happens inside of interrupt handler with interrupts
disabled then on exit path from exception handler when there is no pending
interrupts, the following code (arch/x86/xen/xen-asm_32.S:112):
Solution is in setting XEN_vcpu_info_mask only when it should be set
according to
cmpw $0x0001, XEN_vcpu_info_pending(%eax)
but not clearing it if there isn't any pending events.
Reproducer for bug is attached to RHBZ 707552
CC: stable@kernel.org Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Vrabel [Fri, 19 Aug 2011 14:57:16 +0000 (15:57 +0100)]
xen: use maximum reservation to limit amount of usable RAM
Use the domain's maximum reservation to limit the amount of extra RAM
for the memory balloon. This reduces the size of the pages tables and
the amount of reserved low memory (which defaults to about 1/32 of the
total RAM).
On a system with 8 GiB of RAM with the domain limited to 1 GiB the
kernel reports:
Before:
Memory: 627792k/4472000k available
After:
Memory: 549740k/11132224k available
A increase of about 76 MiB (~1.5% of the unused 7 GiB). The reserved
low memory is also reduced from 253 MiB to 32 MiB. The total
additional usable RAM is 329 MiB.
For dom0, this requires at patch to Xen ('x86: use 'dom0_mem' to limit
the number of pages for dom0') (c/s 23790)
CC: stable@kernel.org Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Martin K. Petersen [Fri, 22 Jul 2011 15:59:17 +0000 (08:59 -0700)]
SCSI: Fix oops dereferencing queue
Commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b introduced a regression
where requests could be queued after a device had disappeared.
Subsequent commits have attempted to fix some but not all of these
issues.
Since there appears to be ongoing discussion about the proper way to fix
this we'll partially revert the upstream commit.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: guru.anbalagane <guru.anbalagane@oracle.com>
If we write some data into the data hole of the file(no preallocation for this
hole), Btrfs will allocate some disk space, and update nbytes of the inode, but
the other element--disk_i_size needn't be updated. At this condition, we must
update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size()
return 1).
Btrfs: fix the file extent gap when doing direct IO
When we write some data to the place that is beyond the end of the file
in direct I/O mode, a data hole will be created. And Btrfs should insert
a file extent item that point to this hole into the fs tree. But unfortunately
Btrfs forgets doing it.
The following is a simple way to reproduce it:
# mkfs.btrfs /dev/sdc2
# mount /dev/sdc2 /test4
# touch /test4/a
# dd if=/dev/zero of=/test4/a seek=8 count=1 bs=4K oflag=direct conv=nocreat,notrunc
# umount /test4
# btrfsck /dev/sdc2
root 5 inode 257 errors 100
Liu Bo [Sun, 11 Sep 2011 14:52:24 +0000 (10:52 -0400)]
Btrfs: fix misuse of trans block rsv
At the beginning of create_pending_snapshot, trans->block_rsv is set
to pending->block_rsv and is used for snapshot things, however, when
it is done, we do not recover it as will.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Liu Bo [Sun, 11 Sep 2011 14:52:24 +0000 (10:52 -0400)]
Btrfs: reset to appropriate block rsv after orphan operations
While truncating free space cache, we forget to change trans->block_rsv
back to the original one, but leave it with the orphan_block_rsv, and
then with option inode_cache enable, it leads to countless warnings of
btrfs_alloc_free_block and btrfs_orphan_commit_root:
WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]()
...
WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]()
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 11 Sep 2011 14:52:24 +0000 (10:52 -0400)]
Btrfs: skip locking if searching the commit root in csum lookup
It's not enough to just search the commit root, since we could be cow'ing the
very block we need to search through, which would mean that its locked and we'll
still deadlock. So use path->skip_locking as well. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> CC: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: David Sterba <dsterba@suse.cz> CC: Josef Bacik <josef@redhat.com> CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
When it comes to btrfs_lookup_dentry, we may set a snapshot's inode->i_ino
to BTRFS_EMPTY_SUBVOL_DIR_OBJECTID instead of BTRFS_FIRST_FREE_OBJECTID,
while the snapshot's location.objectid remains unchanged.
However, btrfs_ino() does not take this into account, and returns a wrong ino,
and causes the oops.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Joe Jin [Mon, 15 Aug 2011 04:57:07 +0000 (12:57 +0800)]
xen-blkback: fixed indentation and comments
This patch fixes belows:
1. Fix code style issue.
2. Fix incorrect functions name in comments.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefan Bader [Thu, 14 Jul 2011 13:30:22 +0000 (15:30 +0200)]
xen-blkfront: Drop name and minor adjustments for emulated scsi devices
These were intended to avoid the namespace clash when representing
emulated IDE and SCSI devices. However that seems to confuse users
more than expected (a disk defined as sda becomes xvde).
So for now go back to the scheme which does no adjustments. This
will break when mixing IDE and SCSI names in the configuration of
guests but should be by now expected.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This change replaced the SMP operations with event based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplainable hangs early on boot for
HVM guests with more than one CPU.
BugLink: http://bugs.launchpad.net/bugs/791850 CC: stable@kernel.org Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-and-Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
whereas a direct check requires just a compare, like in
cmp ebx, [machine_to_phys_nr]
jae ...
), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).
Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.
CC: stable@kernel.org Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Igor Mammedov [Tue, 2 Aug 2011 09:45:25 +0000 (11:45 +0200)]
xen: Fix misleading WARN message at xen_release_chunk
WARN message should not complain
"Failed to release memory %lx-%lx err=%d\n"
^^^^^^^
about range when it fails to release just one page,
instead it should say what pfn is not freed.
In addition line:
printk(KERN_INFO "xen_release_chunk: looking at area pfn %lx-%lx: "
...
printk(KERN_CONT "%lu pages freed\n", len);
will be broken if WARN in between this line is fired. So fix it
by using a single printk for this.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Randy Dunlap [Wed, 10 Aug 2011 18:22:42 +0000 (11:22 -0700)]
xen: xen-selfballoon.c needs more header files
Fix build errors (found when CONFIG_SYSFS is not enabled):
drivers/xen/xen-selfballoon.c:446: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:446: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
drivers/xen/xen-selfballoon.c:446: warning: parameter names (without types) in function declaration
drivers/xen/xen-selfballoon.c:485: error: expected declaration specifiers or '...' before string constant
drivers/xen/xen-selfballoon.c:485: warning: data definition has no type or storage class
drivers/xen/xen-selfballoon.c:485: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/xen/xen-selfballoon.c:485: warning: function declaration isn't a prototype
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
stephen hemminger [Tue, 21 Jun 2011 05:35:31 +0000 (05:35 +0000)]
xen: convert to 64 bit stats interface
Convert xen driver to 64 bit statistics interface.
Use stats_sync to ensure that 64 bit update is read atomically on 32 bit platform.
Put hot statistics into per-cpu table.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Stodden [Sat, 28 May 2011 20:21:10 +0000 (13:21 -0700)]
xen/blkback: Don't let in-flight requests defer pending ones.
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad
idea. It means that in-flight I/O is essentially blocking continued
batches. This essentially kills throughput on frontends which unplug
(or even just notify) early and rightfully assume addtional requests
will be picked up on time, not synchronously.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
[v1: Rebased and fixed compile problems] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Daniel Kiper [Tue, 26 Jul 2011 00:12:05 +0000 (17:12 -0700)]
mm: extend memory hotplug API to allow memory hotplug in virtual machines
This patch contains online_page_callback and apropriate functions for
registering/unregistering online page callbacks. It allows to do some
machine specific tasks during online page stage which is required to
implement memory hotplug in virtual machines. Currently this patch is
required by latest memory hotplug support for Xen balloon driver patch
which will be posted soon.
Additionally, originial online_page() function was splited into
following functions doing "atomic" operations:
- __online_page_set_limits() - set new limits for memory management code,
- __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
- __online_page_free() - free page to allocator.
It was done to:
- not duplicate existing code,
- ease hotplug code devolpment by usage of well defined interface,
- avoid stupid bugs which are unavoidable when the same code
(by design) is developed in many places.
[akpm@linux-foundation.org: use explicit indirect-call syntax] Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>