www.infradead.org Git - users/jedix/linux-maple.git/log

]> www.infradead.org Git - users/jedix/linux-maple.git/log

projects / users / jedix / linux-maple.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Lukas Czerner [Mon, 20 Feb 2012 22:53:00 +0000 (17:53 -0500)]

ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delalloc

commit 3d2b158262826e8b75bbbfb7b97010838dd92ac7 upstream.

Ext4 does not support data journalling with delayed allocation enabled.
We even do not allow to mount the file system with delayed allocation
and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
system mounted with delayed allocation (default) and that's where
problem arises. The easies way to reproduce this problem is with the
following set of commands:

mkfs.ext4 /dev/sdd
mount /dev/sdd /mnt/test1
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
chattr +j /mnt/test1/file
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
chattr -j /mnt/test1/file

Additionally it can be reproduced quite reliably with xfstests 272 and
269. In fact the above reproducer is a part of test 272.

To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
the file system is mounted with delayed allocation. This can be easily
done by fixing ext4_should_*_data() functions do ignore data journal
flag when delalloc is set (suggested by Ted). We also have to set the
appropriate address space operations for the inode (again, ignoring data
journal flag if delalloc enabled).

Additionally this commit introduces ext4_inode_journal_mode() function
because ext4_should_*_data() has already had a lot of common code and
this change is putting it all into one function so it is easier to
read.

Successfully tested with xfstests in following configurations:

delalloc + data=ordered
delalloc + data=writeback
data=journal
nodelalloc + data=ordered
nodelalloc + data=writeback
nodelalloc + data=journal

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Eric Sandeen [Mon, 20 Feb 2012 22:53:01 +0000 (17:53 -0500)]

jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream.

journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.

This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.

Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Jeff Kirsher [Fri, 18 Nov 2011 14:25:00 +0000 (14:25 +0000)]

e1000e: Avoid wrong check on TX hang

commit 09357b00255c233705b1cf6d76a8d147340545b8 upstream.

Based on the original patch submitted my Michael Wang
<wangyun@linux.vnet.ibm.com>.
Descriptors may not be write-back while checking TX hang with flag
FLAG2_DMA_BURST on.
So when we detect hang, we just flush the descriptor and detect
again for once.

-v2 change 1 to true and 0 to false and remove extra ()

CC: Michael Wang <wangyun@linux.vnet.ibm.com>
CC: Flavio Leitner <fbl@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andreas Herrmann [Fri, 23 Mar 2012 09:02:17 +0000 (10:02 +0100)]

hwmon: (fam15h_power) Correct sign extension of running_avg_capture

commit fc0900cbda9243957d812cd6b4cc87965f9fe75f upstream.

Wrong bit was used for sign extension which caused wrong end results.
Thanks to Andre for spotting this bug.

Reported-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Pravin B Shelar [Fri, 23 Mar 2012 22:02:55 +0000 (15:02 -0700)]

proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().

commit 1b26c9b334044cff6d1d2698f2be41bc7d9a0864 upstream.

The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace.  Keeping that network namespace from being freed
when the last user goes away.  Leaving things like vlan devices in the
leaked network namespace.

If you use ip netns add for much real work this problem becomes apparent
pretty quickly.  It light testing the problem hides because frequently
you simply don't notice the leak.

Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.

This issue exists back to 3.0.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dmitry Adamushko [Thu, 22 Mar 2012 20:39:25 +0000 (21:39 +0100)]

x86-32: Fix endless loop when processing signals for kernel tasks

commit 29a2e2836ff9ea65a603c89df217f4198973a74f upstream.

The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task
returns from a system call with a pending signal.

A real-life scenario is a child of 'khelper' returning from a failed
kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
kernel_execve() fails due to a pending SIGKILL, which is the result of
"kill -9 -1" (at least, busybox's init does it upon reboot).

The loop is as follows:

* syscall_exit_work:
- work_pending:            // start_of_the_loop
- work_notify_sig:
   - do_notify_resume()
     - do_signal()
       - if (!user_mode(regs)) return;
- resume_userspace         // TIF_SIGPENDING is still set
- work_pending             // so we call work_pending => goto
                            // start_of_the_loop

More information can be found in another LKML thread:
http://www.serverphorums.com/read.php?12,457826

[1] the problem was also seen on MIPS.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

tom.leiming@gmail.com [Thu, 22 Mar 2012 03:22:38 +0000 (03:22 +0000)]

usbnet: don't clear urb->dev in tx_complete

commit 5d5440a835710d09f0ef18da5000541ec98b537a upstream.

URB unlinking is always racing with its completion and tx_complete
may be called before or during running usb_unlink_urb, so tx_complete
must not clear urb->dev since it will be used in unlink path,
otherwise invalid memory accesses or usb device leak may be caused
inside usb_unlink_urb.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Trond Myklebust [Mon, 19 Mar 2012 17:39:35 +0000 (13:39 -0400)]

SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()

commit 540a0f7584169651f485e8ab67461fcb06934e38 upstream.

The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.

Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all functions on priority wait queues, which can result
in some nasty hangs.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Jeff Layton [Wed, 21 Mar 2012 10:30:40 +0000 (06:30 -0400)]

cifs: fix issue mounting of DFS ROOT when redirecting from one domain controller to the next

commit 1daaae8fa4afe3df78ca34e724ed7e8187e4eb32 upstream.

This patch fixes an issue when cifs_mount receives a
STATUS_BAD_NETWORK_NAME error during cifs_get_tcon but is able to
continue after an DFS ROOT referral. In this case, the return code
variable is not reset prior to trying to mount from the system referred
to. Thus, is_path_accessible is not executed and the final DFS referral
is not performed causing a mount error.

Use case: In DNS, example.com resolves to the secondary AD server
ad2.example.com Our primary domain controller is ad1.example.com and has
a DFS redirection set up from \\ad1\share\Users to \\files\share\Users.
Mounting \\example.com\share\Users fails.

Regression introduced by commit 724d9f1.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru
Signed-off-by: Thomas Hadig <thomas@intapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dave Chinner [Wed, 7 Mar 2012 04:50:25 +0000 (04:50 +0000)]

xfs: fix inode lookup race

commit f30d500f809eca67a21704347ab14bb35877b5ee upstream.

When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in a the correct state for a new inode.

When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.

The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrently lookup:

Thread 1 Thread 2
xfs_iget()
  xfs_iget_cache_miss()
    xfs_iread()
    lock radix tree
    radix_tree_insert()
rcu_read_lock
radix_tree_lookup
lock inode flags
XFS_INEW not set
igrab()
unlock inode flags
rcu_read_unlock
use uninitialised inode
.....
    lock inode flags
    set XFS_INEW
    unlock inode flags
    unlock radix tree
  xfs_setup_inode()
    inode flags = I_NEW
    unlock_new_inode()
      WARNING as inode flags != I_NEW

This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.

Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Clemens Ladisch [Mon, 12 Mar 2012 20:45:47 +0000 (21:45 +0100)]

firewire: ohci: fix too-early completion of IR multichannel buffers

commit 0c0efbacab8d70700d13301e0ae7975783c0cb0a upstream.

handle_ir_buffer_fill() assumed that a completed descriptor would be
indicated by a non-zero transfer_status (as in most other descriptors).
However, this field is written by the controller as soon as (the end of)
the first packet has been written into the buffer. As a consequence, if
we happen to run into such a descriptor when the interrupt handler is
executed after such a packet has completed, the descriptor would be
taken out of the list of active descriptors as soon as the buffer had
been partially filled, so the event for the buffer being completely
filled would never be sent.

To fix this, handle descriptors only when they have been completely
filled, i.e., when res_count == 0. (This also matches the condition
that is reported by the controller with an interrupt.)

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Sergei Shtylyov [Thu, 19 Jan 2012 16:09:56 +0000 (19:09 +0300)]

pata_legacy: correctly mask recovery field for HT6560B

commit 9716387311c790de381214c03e7f1b72b91a8189 upstream.

According to the HT6560H datasheet, the recovery timing field is 4-bit wide,
with a value of 0 meaning 16 cycles. Correct obvious thinko in the recovery
field mask.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Roland Dreier [Fri, 24 Feb 2012 01:22:12 +0000 (17:22 -0800)]

target: Fix 16-bit target ports for SET TARGET PORT GROUPS emulation

commit 33395fb8a13731c7ef7b175dbf5a4d8a6738fe6c upstream.

The old code did (MSB << 8) & 0xff, which always evaluates to 0. Just use
get_unaligned_be16() so we don't have to worry about whether our open-coded
version is correct or not.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Roland Dreier [Tue, 14 Feb 2012 00:18:16 +0000 (16:18 -0800)]

target: Don't set WBUS16 or SYNC bits in INQUIRY response

commit effc6cc8828257c32c37635e737f14fd6e19ecd7 upstream.

SPC-4 says about the WBUS16 and SYNC bits:

    The meanings of these fields are specific to SPI-5 (see 6.4.3).
    For SCSI transport protocols other than the SCSI Parallel
    Interface, these fields are reserved.

We don't have a SPI fabric module, so we should never set these bits.
(The comment was misleading, since it only mentioned Sync but the
actual code set WBUS16 too).

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

NeilBrown [Mon, 19 Mar 2012 01:46:38 +0000 (12:46 +1100)]

md/raid1,raid10: avoid deadlock during resync/recovery.

commit d6b42dcb995e6acd7cc276774e751ffc9f0ef4bf upstream.

If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10. The first request gets
processed, blocks recovery and queue requests for underlying
requests in current->bio_list. A resync request then starts
which will wait for those requests and block new IO.

But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.

So allow that second request to proceed even though there is
a resync request about to start.

This is suitable for any -stable kernel.

Reported-by: Ray Morris <support@bettercgi.com>
Tested-by: Ray Morris <support@bettercgi.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

NeilBrown [Mon, 19 Mar 2012 01:46:37 +0000 (12:46 +1100)]

md/bitmap: ensure to load bitmap when creating via sysfs.

commit 4474ca42e2577563a919fd3ed782e2ec55bf11a2 upstream.

When commit 69e51b449d383e (md/bitmap: separate out loading a bitmap...)
created bitmap_load, it missed calling it after bitmap_create when a
bitmap is created through the sysfs interface.
So if a bitmap is added this way, we don't allocate memory properly
and can crash.

This is suitable for any -stable release since 2.6.35.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Nicholas Bellinger [Sat, 10 Mar 2012 07:45:38 +0000 (23:45 -0800)]

tcm_fc: Fix fc_exch memory leak in ft_send_resp_status

commit 031ed4d565b31880a4136bb7366bc89f5b1dba7d upstream.

This patch fixes a bug in tcm_fc where fc_exch memory from fc_exch_mgr->ep_pool
is currently being leaked by ft_send_resp_status() usage. Following current
code in ft_queue_status() response path, using lport->tt.seq_send() needs to be
followed by a lport->tt.exch_done() in order to release fc_exch memory back into
libfc_em kmem_cache.

ft_send_resp_status() code is currently used in pre submit se_cmd ft_send_work()
error exceptions, TM request setup exceptions, and main TM response callback
path in ft_queue_tm_resp(). This bugfix addresses the leak in these cases.

Cc: Mark D Rustad <mark.d.rustad@intel.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Aneesh Kumar K.V [Wed, 21 Mar 2012 23:34:08 +0000 (16:34 -0700)]

hugetlbfs: avoid taking i_mutex from hugetlbfs_read()

commit a05b0855fd15504972dba2358e5faa172a1e50ba upstream.

Taking i_mutex in hugetlbfs_read() can result in deadlock with mmap as
explained below

Thread A:
  read() on hugetlbfs
   hugetlbfs_read() called
    i_mutex grabbed
     hugetlbfs_read_actor() called
      __copy_to_user() called
       page fault is triggered
Thread B, sharing address space with A:
  mmap() the same file
   ->mmap_sem is grabbed on task_B->mm->mmap_sem
    hugetlbfs_file_mmap() is called
     attempt to grab ->i_mutex and block waiting for A to give it up
Thread A:
  pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
which happens to be the same thing as task_B->mm->mmap_sem.  Block waiting
for B to give it up.

AFAIU the i_mutex locking was added to hugetlbfs_read() as per
http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html to take
care of the race between truncate and read.  This patch fixes this by
looking at page->mapping under lock_page() (find_lock_page()) to ensure
that the inode didn't get truncated in the range during a parallel read.

Ideally we can extend the patch to make sure we don't increase i_size in
mmap.  But that will break userspace, because applications will now have
to use truncate(2) to increase i_size in hugetlbfs.

Based on the original patch from Hillf Danton.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Nishanth Aravamudan [Wed, 21 Mar 2012 23:34:07 +0000 (16:34 -0700)]

bootmem/sparsemem: remove limit constraint in alloc_bootmem_section

commit f5bf18fa22f8c41a13eb8762c7373eb3a93a7333 upstream.

While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:

  kernel BUG at mm/bootmem.c:483!
  cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
      pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
      lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
      sp: c000000000c03bc0
     msr: 8000000000021032
    current = 0xc000000000b0cce0
    paca    = 0xc000000001d80000
      pid   = 0, comm = swapper
  kernel BUG at mm/bootmem.c:483!
  enter ? for help
  [c000000000c03c80] c000000000a64bcc
  .sparse_early_usemaps_alloc_node+0x84/0x29c
  [c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
  [c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
  [c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
  [c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c

This is

        BUG_ON(limit && goal + size > limit);

and after some debugging, it seems that

goal = 0x7ffff000000
limit = 0x80000000000

and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls

return alloc_bootmem_section(usemap_size() * count, section_nr);

This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0.  So, we end
up with an allocation that will fail the goal/limit constraints.

In theory, we could "fall-back" to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead.  A simple solution appears to be to
unconditionally remove the limit condition in alloc_bootmem_section,
meaning allocations are allowed to cross section boundaries (necessary
for systems of this size).

Johannes Weiner pointed out that if alloc_bootmem_section() no longer
guarantees section-locality, we need check_usemap_section_nr() to print
possible cross-dependencies between node descriptors and the usemaps
allocated through it.  That makes the two loops in
sparse_early_usemaps_alloc_node() identical, so re-factor the code a
bit.

[akpm@linux-foundation.org: code simplification]
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andrea Arcangeli [Wed, 21 Mar 2012 23:33:42 +0000 (16:33 -0700)]

mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode

commit 1a5a9906d4e8d1976b701f889d8f35d54b928f25 upstream.

In some cases it may happen that pmd_none_or_clear_bad() is called with
the mmap_sem hold in read mode.  In those cases the huge page faults can
allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
false positive from pmd_bad() that will not like to see a pmd
materializing as trans huge.

It's not khugepaged causing the problem, khugepaged holds the mmap_sem
in write mode (and all those sites must hold the mmap_sem in read mode
to prevent pagetables to go away from under them, during code review it
seems vm86 mode on 32bit kernels requires that too unless it's
restricted to 1 thread per process or UP builds).  The race is only with
the huge pagefaults that can convert a pmd_none() into a
pmd_trans_huge().

Effectively all these pmd_none_or_clear_bad() sites running with
mmap_sem in read mode are somewhat speculative with the page faults, and
the result is always undefined when they run simultaneously.  This is
probably why it wasn't common to run into this.  For example if the
madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
fault, the hugepage will not be zapped, if the page fault runs first it
will be zapped.

Altering pmd_bad() not to error out if it finds hugepmds won't be enough
to fix this, because zap_pmd_range would then proceed to call
zap_pte_range (which would be incorrect if the pmd become a
pmd_trans_huge()).

The simplest way to fix this is to read the pmd in the local stack
(regardless of what we read, no need of actual CPU barriers, only
compiler barrier needed), and be sure it is not changing under the code
that computes its value.  Even if the real pmd is changing under the
value we hold on the stack, we don't care.  If we actually end up in
zap_pte_range it means the pmd was not none already and it was not huge,
and it can't become huge from under us (khugepaged locking explained
above).

All we need is to enforce that there is no way anymore that in a code
path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
can run into a hugepmd.  The overhead of a barrier() is just a compiler
tweak and should not be measurable (I only added it for THP builds).  I
don't exclude different compiler versions may have prevented the race
too by caching the value of *pmd on the stack (that hasn't been
verified, but it wouldn't be impossible considering
pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
and there's no external function called in between pmd_trans_huge and
pmd_none_or_clear_bad).

if (pmd_trans_huge(*pmd)) {
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
} else if (zap_huge_pmd(tlb, vma, pmd, addr))
continue;
/* fall through */
}
if (pmd_none_or_clear_bad(pmd))

Because this race condition could be exercised without special
privileges this was reported in CVE-2012-1179.

The race was identified and fully explained by Ulrich who debugged it.
I'm quoting his accurate explanation below, for reference.

====== start quote =======
      mapcount 0 page_mapcount 1
      kernel BUG at mm/huge_memory.c:1384!

    At some point prior to the panic, a "bad pmd ..." message similar to the
    following is logged on the console:

      mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).

    The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
    the page's PMD table entry.

        143 void pmd_clear_bad(pmd_t *pmd)
        144 {
    ->  145         pmd_ERROR(*pmd);
        146         pmd_clear(pmd);
        147 }

    After the PMD table entry has been cleared, there is an inconsistency
    between the actual number of PMD table entries that are mapping the page
    and the page's map count (_mapcount field in struct page). When the page
    is subsequently reclaimed, __split_huge_page() detects this inconsistency.

       1381         if (mapcount != page_mapcount(page))
       1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
       1383                        mapcount, page_mapcount(page));
    -> 1384         BUG_ON(mapcount != page_mapcount(page));

    The root cause of the problem is a race of two threads in a multithreaded
    process. Thread B incurs a page fault on a virtual address that has never
    been accessed (PMD entry is zero) while Thread A is executing an madvise()
    system call on a virtual address within the same 2 MB (huge page) range.

               virtual address space
              .---------------------.
              |                     |
              |                     |
            .-|---------------------|
            | |                     |
            | |                     |<-- B(fault)
            | |                     |
      2 MB  | |/////////////////////|-.
      huge <  |/////////////////////|  > A(range)
      page  | |/////////////////////|-'
            | |                     |
            | |                     |
            '-|---------------------|
              |                     |
              |                     |
              '---------------------'

    - Thread A is executing an madvise(..., MADV_DONTNEED) system call
      on the virtual address range "A(range)" shown in the picture.

    sys_madvise
      // Acquire the semaphore in shared mode.
      down_read(&current->mm->mmap_sem)
      ...
      madvise_vma
        switch (behavior)
        case MADV_DONTNEED:
             madvise_dontneed
               zap_page_range
                 unmap_vmas
                   unmap_page_range
                     zap_pud_range
                       zap_pmd_range
                         //
                         // Assume that this huge page has never been accessed.
                         // I.e. content of the PMD entry is zero (not mapped).
                         //
                         if (pmd_trans_huge(*pmd)) {
                             // We don't get here due to the above assumption.
                         }
                         //
                         // Assume that Thread B incurred a page fault and
             .---------> // sneaks in here as shown below.
             |           //
             |           if (pmd_none_or_clear_bad(pmd))
             |               {
             |                 if (unlikely(pmd_bad(*pmd)))
             |                     pmd_clear_bad
             |                     {
             |                       pmd_ERROR
             |                         // Log "bad pmd ..." message here.
             |                       pmd_clear
             |                         // Clear the page's PMD entry.
             |                         // Thread B incremented the map count
             |                         // in page_add_new_anon_rmap(), but
             |                         // now the page is no longer mapped
             |                         // by a PMD entry (-> inconsistency).
             |                     }
             |               }
             |
             v
    - Thread B is handling a page fault on virtual address "B(fault)" shown
      in the picture.

    ...
    do_page_fault
      __do_page_fault
        // Acquire the semaphore in shared mode.
        down_read_trylock(&mm->mmap_sem)
        ...
        handle_mm_fault
          if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
              // We get here due to the above assumption (PMD entry is zero).
              do_huge_pmd_anonymous_page
                alloc_hugepage_vma
                  // Allocate a new transparent huge page here.
                ...
                __do_huge_pmd_anonymous_page
                  ...
                  spin_lock(&mm->page_table_lock)
                  ...
                  page_add_new_anon_rmap
                    // Here we increment the page's map count (starts at -1).
                    atomic_set(&page->_mapcount, 0)
                  set_pmd_at
                    // Here we set the page's PMD entry which will be cleared
                    // when Thread A calls pmd_clear_bad().
                  ...
                  spin_unlock(&mm->page_table_lock)

    The mmap_sem does not prevent the race because both threads are acquiring
    it in shared mode (down_read).  Thread B holds the page_table_lock while
    the page's map count and PMD table entry are updated.  However, Thread A
    does not synchronize on that lock.

====== end quote =======

[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Suresh Siddha [Mon, 12 Mar 2012 18:36:33 +0000 (11:36 -0700)]

x86/ioapic: Add register level checks to detect bogus io-apic entries

commit 73d63d038ee9f769f5e5b46792d227fe20e442c5 upstream.

With the recent changes to clear_IO_APIC_pin() which tries to
clear remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.

Close look shows that these are related to bogus IO-APIC entries
which return's all 1's for their io-apic registers. And the
above mentioned error messages are benign. But kernel should
have ignored such io-apic's in the first place.

Check if register 0, 1, 2 of the listed io-apic are all 1's and
ignore such io-apic.

Reported-by: Álvaro Castillo <midgoon@gmail.com>
Tested-by: Jon Dufresne <jon@jondufresne.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: kernel-team@fedoraproject.org
Cc: Josh Boyer <jwboyer@redhat.com>
Link: http://lkml.kernel.org/r/1331577393.31585.94.camel@sbsiddha-desk.sc.intel.com
[ Performed minor cleanup of affected code. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Rabin Vincent [Tue, 22 Nov 2011 10:03:14 +0000 (11:03 +0100)]

rtc: Disable the alarm in the hardware (v2)

commit 41c7f7424259ff11009449f87c95656f69f9b186 upstream.

Currently, the RTC code does not disable the alarm in the hardware.

This means that after a sequence such as the one below (the files are in the
RTC sysfs), the box will boot up after 2 minutes even though we've
asked for the alarm to be turned off.

# echo $((`cat since_epoch`)+120) > wakealarm
# echo 0 > wakealarm
# poweroff

Fix this by disabling the alarm when there are no timers to run.

The original version of this patch was reverted. This version
disables the irq directly instead of setting a disabled timer
in the future.

Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
[Merged in the second revision from Rabin]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Alexander Gordeev [Fri, 9 Mar 2012 13:59:13 +0000 (14:59 +0100)]

genirq: Fix incorrect check for forced IRQ thread handler

commit 540b60e24f3f4781d80e47122f0c4486a03375b8 upstream.

We do not want a bitwise AND between boolean operands

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20120309135912.GA2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Russell King [Mon, 5 Mar 2012 23:07:25 +0000 (15:07 -0800)]

genirq: Fix long-term regression in genirq irq_set_irq_type() handling

commit a09b659cd68c10ec6a30cb91ebd2c327fcd5bfe5 upstream.

In 2008, commit 0c5d1eb77a8be ("genirq: record trigger type") modified the
way set_irq_type() handles the 'no trigger' condition.  However, this has
an adverse effect on PCMCIA support on Intel StrongARM and probably PXA
platforms.

PCMCIA has several status signals on the socket which can trigger
interrupts; some of these status signals depend on the card's mode
(whether it is configured in memory or IO mode).  For example, cards have
a 'Ready/IRQ' signal: in memory mode, this provides an indication to
PCMCIA that the card has finished its power up initialization.  In IO
mode, it provides the device interrupt signal.  Other status signals
switch between on-board battery status and loud speaker output.

In classical PCMCIA implementations, where you have a specific socket
controller, the controller provides a method to mask interrupts from the
socket, and importantly ignore any state transitions on the pins which
correspond with interrupts once masked.  This masking prevents unwanted
events caused by the removal and application of socket power being
forwarded.

However, on platforms where there is no socket controller, the PCMCIA
status and interrupt signals are routed to standard edge-triggered GPIOs.
These GPIOs can be configured to interrupt on rising edge, falling edge,
or never.  This is where the problems start.

Edge triggered interrupts are required to record events while disabled via
the usual methods of {free,request,disable,enable}_irq() to prevent
problems with dropped interrupts (eg, the 8390 driver uses disable_irq()
to defer the delivery of interrupts).  As a result, these interfaces can
not be used to implement the desired behaviour.

The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via
disable_irq() on suspend, and enabled via enable_irq() after resume, we
will record the state transitions caused by powering events as valid
interrupts, and foward them to the card driver, which may attempt to
access a card which is not powered up.

This leads delays resume while drivers spin in their interrupt handlers,
and complaints from drivers before they realize what's happened.

Moreover, in the case of the 'Ready/IRQ' signal, this is requested and
freed by the card driver itself; the PCMCIA core has no idea whether the
interrupt is requested, and, therefore, whether a call to disable_irq()
would be valid.  (We tried this around 2.4.17 / 2.5.1 kernel era, and
ended up throwing it out because of this problem.)

Therefore, it was decided back in around 2002 to disable the edge
triggering instead, resulting in all state transitions on the GPIO being
ignored.  That's what we actually need the hardware to do.

The commit above changes this behaviour; it explicitly prevents the 'no
trigger' state being selected.

The reason that request_irq() does not accept the 'no trigger' state is
for compatibility with existing drivers which do not provide their desired
triggering configuration.  The set_irq_type() function is 'new' and not
used by non-trigger aware drivers.

Therefore, revert this change, and restore previously working platforms
back to their former state.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: linux@arm.linux.org.uk
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Andrew Vagin [Wed, 7 Mar 2012 10:49:56 +0000 (14:49 +0400)]

uevent: send events in correct order according to seqnum (v3)

commit 7b60a18da393ed70db043a777fd9e6d5363077c4 upstream.

The queue handling in the udev daemon assumes that the events are
ordered.

Before this patch uevent_seqnum is incremented under sequence_lock,
than an event is send uner uevent_sock_mutex. I want to say that code
contained a window between incrementing seqnum and sending an event.

This patch locks uevent_sock_mutex before incrementing uevent_seqnum.

v2: delete sequence_lock, uevent_seqnum is protected by uevent_sock_mutex
v3: unlock the mutex before the goto exit

Thanks for Kay for the comments.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Tested-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Sasha Levin [Thu, 15 Mar 2012 16:36:14 +0000 (12:36 -0400)]

ntp: Fix integer overflow when setting time

commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream.

'long secs' is passed as divisor to div_s64, which accepts a 32bit
divisor. On 64bit machines that value is trimmed back from 8 bytes
back to 4, causing a divide by zero when the number is bigger than
(1 << 32) - 1 and all 32 lower bits are 0.

Use div64_long() instead.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Sasha Levin [Thu, 15 Mar 2012 16:36:13 +0000 (12:36 -0400)]

math: Introduce div64_long

commit f910381a55cdaa097030291f272f6e6e4380c39a upstream.

Add a div64_long macro which is used to devide a 64bit number by a long (which
can be 4 bytes on 32bit systems and 8 bytes on 64bit systems).

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-1-git-send-email-levinsasha928@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Masami Ichikawa [Mon, 20 Feb 2012 22:43:50 +0000 (07:43 +0900)]

sysfs: Fix memory leak in sysfs_sd_setsecdata().

commit 93518dd2ebafcc761a8637b2877008cfd748c202 upstream.

This patch fixies follwing two memory leak patterns that reported by kmemleak.
sysfs_sd_setsecdata() is called during sys_lsetxattr() operation.
It checks sd->s_iattr is NULL or not. Then if it is NULL, it calls
sysfs_init_inode_attrs() to allocate memory.
That code is this.

iattrs = sd->s_iattr;
if (!iattrs)
                iattrs = sysfs_init_inode_attrs(sd);

The iattrs recieves sysfs_init_inode_attrs()'s result,  but sd->s_iattr
doesn't know the address. so it needs to set correct address to
sd->s_iattr to free memory in other function.

unreferenced object 0xffff880250b73e60 (size 32):
  comm "systemd", pid 1, jiffies 4294683888 (age 94.553s)
  hex dump (first 32 bytes):
    73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f  system_u:object_
    72 3a 73 79 73 66 73 5f 74 3a 73 30 00 00 00 00  r:sysfs_t:s0....
  backtrace:
    [<ffffffff814cb1d0>] kmemleak_alloc+0x73/0x98
    [<ffffffff811270ab>] __kmalloc+0x100/0x12c
    [<ffffffff8120775a>] context_struct_to_string+0x106/0x210
    [<ffffffff81207cc1>] security_sid_to_context_core+0x10b/0x129
    [<ffffffff812090ef>] security_sid_to_context+0x10/0x12
    [<ffffffff811fb0da>] selinux_inode_getsecurity+0x7d/0xa8
    [<ffffffff811fb127>] selinux_inode_getsecctx+0x22/0x2e
    [<ffffffff811f4d62>] security_inode_getsecctx+0x16/0x18
    [<ffffffff81191dad>] sysfs_setxattr+0x96/0x117
    [<ffffffff811542f0>] __vfs_setxattr_noperm+0x73/0xd9
    [<ffffffff811543d9>] vfs_setxattr+0x83/0xa1
    [<ffffffff811544c6>] setxattr+0xcf/0x101
    [<ffffffff81154745>] sys_lsetxattr+0x6a/0x8f
    [<ffffffff814efda9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88024163c5a0 (size 96):
  comm "systemd", pid 1, jiffies 4294683888 (age 94.553s)
  hex dump (first 32 bytes):
    00 00 00 00 ed 41 00 00 00 00 00 00 00 00 00 00  .....A..........
    00 00 00 00 00 00 00 00 0c 64 42 4f 00 00 00 00  .........dBO....
  backtrace:
    [<ffffffff814cb1d0>] kmemleak_alloc+0x73/0x98
    [<ffffffff81127402>] kmem_cache_alloc_trace+0xc4/0xee
    [<ffffffff81191cbe>] sysfs_init_inode_attrs+0x2a/0x83
    [<ffffffff81191dd6>] sysfs_setxattr+0xbf/0x117
    [<ffffffff811542f0>] __vfs_setxattr_noperm+0x73/0xd9
    [<ffffffff811543d9>] vfs_setxattr+0x83/0xa1
    [<ffffffff811544c6>] setxattr+0xcf/0x101
    [<ffffffff81154745>] sys_lsetxattr+0x6a/0x8f
    [<ffffffff814efda9>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
`

Signed-off-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Thomas Gleixner [Wed, 15 Feb 2012 11:08:34 +0000 (12:08 +0100)]

futex: Cover all PI opcodes with cmpxchg enabled check

commit 59263b513c11398cd66a52d4c5b2b118ce1e0359 upstream.

Some of the newer futex PI opcodes do not check the cmpxchg enabled
variable and call unconditionally into the handling functions. Cover
all PI opcodes in a separate check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Supriya Karanth [Fri, 17 Feb 2012 09:24:52 +0000 (14:54 +0530)]

usb: musb: Reselect index reg in interrupt context

commit 39287076e46d2c19aaceaa6f0a44168ae4d257ec upstream.

musb INDEX register is getting modified/corrupted during temporary
un-locking in a SMP system. Set this register with proper value
after re-acquiring the lock

Scenario:
---------
CPU1 is handling a data transfer completion interrupt received for
the CLASS1 EP
CPU2 is handling a CLASS2 thread which is queuing data to musb for
transfer

Below is the error sequence:

         CPU1                   |             CPU2
--------------------------------------------------------------------
Data transfer completion inter- |
rupt recieved.                  |
                                |
musb INDEX reg set to CLASS1 EP |
                                |
musb LOCK is acquired.          |
                                |
                                | CLASS2 thread queues data.
                                |
                                | CLASS2 thread tries to acquire musb
                                | LOCK but lock is already taken by
                                | CLASS1, so CLASS2 thread is
                                | spinning.
                                |
From Interrupt Context musb     |
giveback function is called     |
                                |
The giveback function releases  | CLASS2 thread now acquires LOCK
LOCK                            |
                                |
ClASS1 Request's completion cal-| ClASS2 schedules the data transfer and
lback is called                 | sets the MUSB INDEX to Class2 EP number
                                |
Interrupt handler for CLASS1 EP |
tries to acquire LOCK and is    |
spinning                        |
                                |
Interrupt for Class1 EP acquires| Class2 completes the scheduling etc and
the MUSB LOCK                   | releases the musb LOCK
                                |
Interrupt for Class1 EP schedul-|
es the next data transfer       |
but musb INDEX register is still|
set to CLASS2 EP                |

Since the MUSB INDEX register is set to a different endpoint, we
read and modify the wrong registers. Hence data transfer will not
happen properly. This results in unpredictable behavior

So, the MUSB INDEX register is set to proper value again when
interrupt re-acquires the lock

Signed-off-by: Supriya Karanth <supriya.karanth@stericsson.com>
Signed-off-by: Praveena Nadahally <praveen.nadahally@stericsson.com>
Reviewed-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>

commit | commitdiff | tree

Greg Kroah-Hartman [Tue, 28 Feb 2012 17:20:09 +0000 (09:20 -0800)]

USB: ftdi_sio: fix problem when the manufacture is a NULL string

commit 656d2b3964a9d0f9864d472f8dfa2dd7dd42e6c0 upstream.

On some misconfigured ftdi_sio devices, if the manufacturer string is
NULL, the kernel will oops when the device is plugged in. This patch
fixes the problem.

Reported-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Tested-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Maxim Uvarov [Sat, 7 Apr 2012 00:08:37 +0000 (17:08 -0700)]

SPEC: v2.6.39-200.0.14

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Sat, 7 Apr 2012 00:04:55 +0000 (17:04 -0700)]

update kabi

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 5 Apr 2012 22:42:09 +0000 (15:42 -0700)]

adjust kernel configs

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 5 Apr 2012 21:36:39 +0000 (14:36 -0700)]

Merge branch 'loop' of git://ca-git.us.oracle.com/linux-dkleikam-public

commit | commitdiff | tree

Clemens Ladisch [Sat, 3 Dec 2011 22:41:31 +0000 (23:41 +0100)]

usb: fix number of mapped SG DMA entries

commit bc677d5b64644c399cd3db6a905453e611f402ab upstream.

Add a new field num_mapped_sgs to struct urb so that we have a place to
store the number of mapped entries and can also retain the original
value of entries in num_sgs.  Previously, usb_hcd_map_urb_for_dma()
would overwrite this with the number of mapped entries, which would
break dma_unmap_sg() because it requires the original number of entries.

This fixes warnings like the following when using USB storage devices:
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695()
ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1]
Modules linked in: ohci_hcd ehci_hcd
Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319
Call Trace:
  <IRQ>  [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98
  [<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43
  [<ffffffff811fa5ae>] check_unmap+0x4e4/0x695
  [<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf
  [<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50
  [<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117
  [<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188
  [<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22
  [<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0
  [<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd]
  [<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd]
  [<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd]
  ...
---[ end trace f29ac88a5a48c580 ]---
Mapped at:
  [<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139
  [<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478
  [<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa
  [<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de
  [<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

J. Bruce Fields [Tue, 29 Nov 2011 16:35:35 +0000 (11:35 -0500)]

svcrpc: destroy server sockets all at once

commit 2fefb8a09e7ed251ae8996e0c69066e74c5aa560 upstream.

There's no reason I can see that we need to call sv_shutdown between
closing the two lists of sockets.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit | commitdiff | tree

Matthew Garrett [Thu, 10 Nov 2011 21:38:33 +0000 (16:38 -0500)]

PCI: Rework ASPM disable code

commit 3c076351c4027a56d5005a39a0b518a4ba393ce2 upstream.

Right now we forcibly clear ASPM state on all devices if the BIOS indicates
that the feature isn't supported. Based on the Microsoft presentation
"PCI Express In Depth for Windows Vista and Beyond", I'm starting to think
that this may be an error. The implication is that unless the platform
grants full control via _OSC, Windows will not touch any PCIe features -
including ASPM. In that case clearing ASPM state would be an error unless
the platform has granted us that control.

This patch reworks the ASPM disabling code such that the actual clearing
of state is triggered by a successful handoff of PCIe control to the OS.
The general ASPM code undergoes some changes in order to ensure that the
ability to clear the bits isn't overridden by ASPM having already been
disabled. Further, this theoretically now allows for situations where
only a subset of PCIe roots hand over control, leaving the others in the
BIOS state.

It's difficult to know for sure that this is the right thing to do -
there's zero public documentation on the interaction between all of these
components. But enough vendors enable ASPM on platforms and then set this
bit that it seems likely that they're expecting the OS to leave them alone.

Measured to save around 5W on an idle Thinkpad X220.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Eric Dumazet [Thu, 9 Feb 2012 21:13:19 +0000 (16:13 -0500)]

net: fix NULL dereferences in check_peer_redir()

[ Upstream commit d3aaeb38c40e5a6c08dd31a1b64da65c4352be36, along
  with dependent backports of commits:
     69cce1d1404968f78b177a0314f5822d5afdbbfb
     9de79c127cccecb11ae6a21ab1499e87aa222880
     218fa90f072e4aeff9003d57e390857f4f35513e
     580da35a31f91a594f3090b7a2c39b85cb051a12
     f7e57044eeb1841847c24aa06766c8290c202583
     e049f28883126c689cf95859480d9ee4ab23b7fa ]

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde778 (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Wu Fengguang [Mon, 9 Jan 2012 17:53:50 +0000 (11:53 -0600)]

lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel

commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream.

PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs
in the below lines of bdi_dirty_limit():

bdi_dirty *= numerator;
do_div(bdi_dirty, denominator);

1) divide error: do_div() only uses the lower 32 bit of the denominator,
which may trimmed to be 0 when PROP_MAX_SHIFT > 32.

2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
used up to 48 bits, leaving only 16 bits to bdi_dirty

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Wu Fengguang [Sun, 5 Feb 2012 02:54:03 +0000 (20:54 -0600)]

writeback: fix dereferencing NULL bdi->dev on trace_writeback_queue

commit 977b7e3a52a7421ad33a393a38ece59f3d41c2fa upstream.

When a SD card is hot removed without umount, del_gendisk() will call
bdi_unregister() without destroying/freeing it. This leaves the bdi in
the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state.

When sync(2) gets the bdi before bdi_unregister() and calls
bdi_queue_work() after the unregister, trace_writeback_queue will be
dereferencing the NULL bdi->dev. Fix it with a simple test for NULL.

LKML-reference: http://lkml.org/lkml/2012/1/18/346
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

David S. Miller [Mon, 6 Feb 2012 20:14:37 +0000 (15:14 -0500)]

net: Make qdisc_skb_cb upper size bound explicit.

[ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ]

Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Maxim Uvarov [Wed, 4 Apr 2012 21:38:26 +0000 (14:38 -0700)]

ipv4: Save nexthop address of LSRR/SSRR option to IPCB.

[ Upstream commit ac8a48106be49c422575ddc7531b776f8eb49610 ]

We can not update iph->daddr in ip_options_rcv_srr(), It is too early.
When some exception ocurred later (eg. in ip_forward() when goto
sr_failed) we need the ip header be identical to the original one as
ICMP need it.

Add a field 'nexthop' in struct ip_options to save nexthop of LSRR
or SSRR option.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:

net/ipv4/ip_options.c

commit | commitdiff | tree

Chris Metcalf [Mon, 26 Mar 2012 20:26:12 +0000 (16:26 -0400)]

compat: use sys_sendfile64() implementation for sendfile syscall

commit 1631fcea8399da5e80a80084b3b8c5bfd99d21e7 upstream.

<asm-generic/unistd.h> was set up to use sys_sendfile() for the 32-bit
compat API instead of sys_sendfile64(), but in fact the right thing to
do is to use sys_sendfile64() in all cases. The 32-bit sendfile64() API
in glibc uses the sendfile64 syscall, so it has to be capable of doing
full 64-bit operations. But the sys_sendfile() kernel implementation
has a MAX_NON_LFS test in it which explicitly limits the offset to 2^32.
So, we need to use the sys_sendfile64() implementation in the kernel
for this case.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit | commitdiff | tree

Dave Kleikamp [Tue, 3 Apr 2012 03:53:49 +0000 (22:53 -0500)]

ext4: implement ext4_file_write_iter

It was a mistake to have ext4 use generic_file_write_iter(). Its
write_iter function needs to act like ext4_file_write()

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 3 Apr 2012 00:30:53 +0000 (17:30 -0700)]

SPEC: v2.6.39-200.0.13

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 3 Apr 2012 00:16:37 +0000 (17:16 -0700)]

fix git merge: vlan: allow nested vlan_do_receive()

Git commit was not cleanly cherry-picked but this changes
are under ifdef so build did not fail on that. Fix it now.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Mon, 2 Apr 2012 22:33:36 +0000 (15:33 -0700)]

SPEC: update and turn on kabi

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Mon, 2 Apr 2012 17:55:35 +0000 (10:55 -0700)]

SPEC: v2.6.39-200.0.12

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 27 Mar 2012 23:44:21 +0000 (16:44 -0700)]

remove unused mutex hpidebuglock

Remove unused mutex and clean up following warnings
drivers/net/hxge/hpi/hpi.c:35: warning: data definition has no type or storage class
drivers/net/hxge/hpi/hpi.c:35: warning: type defaults to "int" in declaration of "DECLARE_MUTEX"
drivers/net/hxge/hpi/hpi.c:35: warning: parameter names (without types) in function declaration

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 27 Mar 2012 22:07:54 +0000 (15:07 -0700)]

add hxge-1.3.3 driver

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Sat, 31 Mar 2012 00:19:38 +0000 (17:19 -0700)]

SPEC: v2.6.39-200.0.11

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Fri, 30 Mar 2012 23:52:48 +0000 (16:52 -0700)]

vlan: allow nested vlan_do_receive()

commit 2425717b27eb (net: allow vlan traffic to be received under bond)
broke ARP processing on vlan on top of bonding.

       +-------+
eth0 --| bond0 |---bond0.103
eth1 --|       |
       +-------+

52870.115435: skb_gro_reset_offset <-napi_gro_receive
52870.115435: dev_gro_receive <-napi_gro_receive
52870.115435: napi_skb_finish <-napi_gro_receive
52870.115435: netif_receive_skb <-napi_skb_finish
52870.115435: get_rps_cpu <-netif_receive_skb
52870.115435: __netif_receive_skb <-netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: bond_handle_frame <-__netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: arp_rcv <-__netif_receive_skb
52870.115436: kfree_skb <-arp_rcv

Packet is dropped in arp_rcv() because its pkt_type was set to
PACKET_OTHERHOST in the first vlan_do_receive() call, since no eth0.103
exists.

We really need to change pkt_type only if no more rx_handler is about to
be called for the packet.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:

include/linux/if_vlan.h

commit | commitdiff | tree

John Fastabend [Mon, 10 Oct 2011 09:16:41 +0000 (09:16 +0000)]

net: allow vlan traffic to be received under bond

The following configuration used to work as I expected. At least
we could use the fcoe interfaces to do MPIO and the bond0 iface
to do load balancing or failover.

       ---eth2.228-fcoe
       |
eth2 -----|
          |
          |---- bond0
          |
eth3 -----|
       |
       ---eth3.228-fcoe

This worked because of a change we added to allow inactive slaves
to rx 'exact' matches. This functionality was kept intact with the
rx_handler mechanism. However now the vlan interface attached to the
active slave never receives traffic because the bonding rx_handler
updates the skb->dev and goto's another_round. Previously, the
vlan_do_receive() logic was called before the bonding rx_handler.

Now by the time vlan_do_receive calls vlan_find_dev() the
skb->dev is set to bond0 and it is clear no vlan is attached
to this iface. The vlan lookup fails.

This patch moves the VLAN check above the rx_handler. A VLAN
tagged frame is now routed to the eth2.228-fcoe iface in the
above schematic. Untagged frames continue to the bond0 as
normal. This case also remains intact,

eth2 --> bond0 --> vlan.228

Here the skb is VLAN tagged but the vlan lookup fails on eth2
causing the bonding rx_handler to be called. On the second
pass the vlan lookup is on the bond0 iface and completes as
expected.

Putting a VLAN.228 on both the bond0 and eth2 device will
result in eth2.228 receiving the skb. I don't think this is
completely unexpected and was the result prior to the rx_handler
result.

Note, the same setup is also used for other storage traffic that
MPIO is used with eg. iSCSI and similar setups can be contrived
without storage protocols.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Tested-by: Hans Schillstrom <hams.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Mon, 22 Aug 2011 19:43:22 +0000 (12:43 -0700)]

net: vlan: goto another_round instead of calling __netif_receive_skb

Now, when vlan tag on untagged in non-accelerated path is stripped from
skb, headers are reset right away. Benefit from that and avoid calling
__netif_receive_skb recursivelly and just use another_round.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 21:43:24 +0000 (14:43 -0700)]

SPEC: v2.6.39-200.0.10

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Wed, 22 Jun 2011 23:56:14 +0000 (16:56 -0700)]

ocfs2/cluster: Fix output in file elapsed_time_in_ms

The o2hb debugfs file, elapsed_time_in_ms, should return values only after the
timer is armed for the time time.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 19:52:38 +0000 (12:52 -0700)]

SPEC: v2.6.39-200.0.9

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 19:11:24 +0000 (12:11 -0700)]

Revert "loop: increase default number of loop devices to 512"

This reverts commit 216aace2f31f28c54b612b032e76ee827396fb55.
Bug was fixed in agent side, reverting this patch.

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 19:10:06 +0000 (12:10 -0700)]

Revert "loop: set default number of loop devices to 200"

This reverts commit 10fa61b8a250525b8ca79e0f5edde054a97caf44.
Bug was fixed in agent side, reverting this patch.

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:22:42 +0000 (13:22 -0700)]

ocfs2/dlm: dlmlock_remote() needs to account for remastery

In dlmlock_remote(), we wait for the resource to stop being active before
setting the inprogress flag. Active includes recovery, migration, etc.

The problem here is that if the resource was being recovered or migrated, the
new owner could very well be that node itself (and thus not a remote node).
This problem was observed in Oracle bug#12583620. The error messages observed
were as follows:

dlm_send_remote_lock_request:337 ERROR: Error -40 (ELOOP) when sending message 503 (key 0xd6d8c7) to node 2
dlmlock_remote:271 ERROR: dlm status = DLM_BADARGS
dlmlock:751 ERROR: dlm status = DLM_BADARGS

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 18:31:22 +0000 (11:31 -0700)]

ocfs2/dlm: Take inflight reference count for remotely mastered resources too

The inflight reference count, in the lock resource, is taken to pin the resource
in memory. We take it when a new resource is created and release it after a
lock is attached to it. We do this to prevent the resource from getting purged
prematurely.

Earlier this reference count was being taken for locally mastered resources
only. This patch extends the same functionality for remotely mastered ones.

We are doing this because the same premature purging could occur for remotely
mastered resources if the remote node were to die before completion of the
create lock.

Fix for Oracle bug#12405575.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Conflicts:

fs/ocfs2/dlm/dlmmaster.c

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 29 Mar 2012 18:28:31 +0000 (11:28 -0700)]

ocfs2/dlm: Clean up refmap helpers

Patch cleans up helpers that set/clear refmap bits and grab/drop inflight lock
ref counts.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Conflicts:

fs/ocfs2/dlm/dlmmaster.c

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:22:40 +0000 (13:22 -0700)]

ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()

dlm_wait_for_node_death() and dlm_wait_for_node_recovery() needed a facelift.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:22:37 +0000 (13:22 -0700)]

ocfs2/dlm: Cleanup up dlm_finish_local_lockres_recovery()

dlm_finish_local_lockres_recovery() needed a facelift.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:22:39 +0000 (13:22 -0700)]

ocfs2/dlm: Trace insert/remove of resource to/from hash

Add mlog to trace adding and removing the resource from/to the hash table.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:22:35 +0000 (13:22 -0700)]

ocfs2/dlm: Clean up messages in o2dlm

o2dlm messages needed a facelift.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Wed, 22 Jun 2011 23:56:27 +0000 (16:56 -0700)]

ocfs2/cluster: Cluster up now includes network connections too

The cluster up check only checks to see if the node is heartbeating or not.
If yes it continues assuming that the node is connected to all the nodes. But
if that is not the case, the cluster join aborts with a stack of errors that
are not easy to comprehend.

This patch adds the network connect check upfront and prints the nodes that
the node is not yet connected to, before aborting.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 21 Jun 2011 20:21:28 +0000 (13:21 -0700)]

ocfs2/cluster: Clean up messages in o2net

o2net messages needed a facelift.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Mon, 20 Jun 2011 17:07:08 +0000 (10:07 -0700)]

ocfs2/cluster: Abort heartbeat start on hard-ro devices

Currently if the heartbeat device is hard-ro, the o2hb thread keeps chugging
along and dumping errors along the way. The user needs to manually stop the
heartbeat.

The patch addresses this shortcoming by adding a limit to the number of times
the hb thread will iterate in an unsteady state. If the hb thread does not
ready steady state in that many interation, the start is aborted.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 27 Mar 2012 22:26:25 +0000 (15:26 -0700)]

SPEC: v2.6.39-200.0.8

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Tue, 27 Mar 2012 22:22:13 +0000 (15:22 -0700)]

loop: set default number of loop devices to 200

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Mon, 26 Mar 2012 23:05:33 +0000 (16:05 -0700)]

SPEC OL5: fix xen support

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Fri, 23 Mar 2012 01:59:19 +0000 (18:59 -0700)]

SPEC: v2.6.39-200.0.6

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Fri, 23 Mar 2012 01:44:51 +0000 (18:44 -0700)]

ocfs2: Rollback commit ea455f8ab68338ba69f5d3362b342c115bea8e13

This patch is part 3 of 3 patches that rolls back changes that appear
to have broken delete.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Conflicts:

fs/ocfs2/dcache.c

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Fri, 23 Mar 2012 01:43:54 +0000 (18:43 -0700)]

ocfs2: Rollback commit f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a

This patch is part 2 of 3 patches that rolls back changes that appear
to have broken delete.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Conflicts:

fs/ocfs2/super.c

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 29 Nov 2011 20:55:54 +0000 (12:55 -0800)]

ocfs2: Rollback commit 5fd131893793567c361ae64cbeb28a2a753bbe35

This patch is part 1 of 3 patches that rolls back changes that appear
to have broken delete. Those patches were originally added to fix a deadlock
with quotas enabled. Considering we are not enabling quotas in OVM3, it is
ok to rollback these patches. We will have to solve the deadlock in another
way.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Wed, 23 Nov 2011 18:50:23 +0000 (10:50 -0800)]

ocfs2/cluster: Fix o2net_fill_node_map()

Fix an oops having the following stack trace....
o2net_fill_node_map() => o2net_tx_can_proceed() => sc_put() => kref_put() => sc_kref_release()

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Wed, 22 Jun 2011 23:56:20 +0000 (16:56 -0700)]

ocfs2/cluster: Add new function o2net_fill_node_map()

Patch adds function o2net_fill_node_map() to return the bitmap of nodes that
it is connected to. This bitmap is also accessible by the user via the debugfs
file, /sys/kernel/debug/o2net/connected_nodes.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Tue, 8 Nov 2011 21:00:19 +0000 (13:00 -0800)]

ocfs2: Tighten free bit calculation in the global bitmap

When clearing bits in the global bitmap, we do not test the current bit value.
This patch tightens the code by considering the possiblity that the bit being
cleared was already cleared.

Now this should not happen. But we are seeing stray instances in which free
bit count in the global bitmap exceeds the total bit count. In each instance
the bitmap is correct. Only the free bit count is incorrect.

This patch checks the current bit value and increments the free bit count
only if the bit was previously set. It also prints information to allow
us to debug further.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Wed, 7 Sep 2011 18:39:30 +0000 (11:39 -0700)]

ocfs2/trivial: Limit unaligned aio+dio write messages to once per day

It was printing more frequently.

Signed-off-cy: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Fri, 23 Mar 2012 01:38:26 +0000 (18:38 -0700)]

Merge branch 'loop' of git://ca-git.us.oracle.com/linux-dkleikam-public into uek2-200

commit | commitdiff | tree

Dave Kleikamp [Thu, 22 Mar 2012 22:19:44 +0000 (17:19 -0500)]

btrfs: btrfs_direct_IO_bvec() needs to check for sector alignment

I left out a vital piece of btrfs_direct_IO_bvec(). I didn't think
check_direct_IO() was needed, but it is. This patch creates its
equivalent, check_direct_IO_bvec().

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 22 Mar 2012 17:36:09 +0000 (10:36 -0700)]

Merge branch 'loop' of git://ca-git.us.oracle.com/linux-dkleikam-public into buek2-200

commit | commitdiff | tree

Maxim Uvarov [Thu, 22 Mar 2012 17:35:15 +0000 (10:35 -0700)]

Merge branch 'uek2-merge' of git://ca-git.us.oracle.com/linux-konrad-public into buek2-200

commit | commitdiff | tree

Dave Kleikamp [Thu, 22 Mar 2012 16:58:15 +0000 (11:58 -0500)]

loop: increase default number of loop devices to 512

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

commit | commitdiff | tree

Konrad Rzeszutek Wilk [Thu, 22 Mar 2012 14:23:34 +0000 (10:23 -0400)]

xen/merge error: Re-introduce xen-platform-pci driver.

The merge 98687483f3918349f96697261b491ee41b824871:

    Merge branch 'stable/not-upstreamed' into uek2-merge

    * stable/not-upstreamed:
      Xen: Export host physical CPU information to dom0
      xen/mce: Change the machine check point
      Add mcelog support from xen platform

    Conflicts:
     drivers/xen/Kconfig
     drivers/xen/Makefile

mismerged the Makefile. The end result that the line
obj-$(CONFIG_XEN_PVHVM) += platform_pci.o was removed
leading to no PVonHVM driver to tell QEMU to unplug the device.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit | commitdiff | tree

Bjorn Helgaas [Sat, 2 Jul 2011 16:47:12 +0000 (10:47 -0600)]

x86/PCI: reduce severity of host bridge window conflict warnings

Host bridge windows are top-level resources, so if we find a host bridge
window conflict, it's probably with a hard-coded legacy reservation.
Moving host bridge windows is theoretically possible, but we don't support
it; we just ignore windows with conflicts, and it's not worth making this
a user-visible error.

Reported-and-tested-by: Jools Wills <jools@oxfordinspire.co.uk>
References: https://bugzilla.kernel.org/show_bug.cgi?id=38522
Reported-by: Das <dasfox@gmail.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=16497
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fixes Oracle Bug 13818972
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 22 Mar 2012 00:34:38 +0000 (17:34 -0700)]

SPEC: v2.6.39-200.0.5

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Maxim Uvarov [Thu, 22 Mar 2012 00:31:57 +0000 (17:31 -0700)]

Merge branch 'loop' of git://ca-git.us.oracle.com/linux-dkleikam-public into buek2-200

commit | commitdiff | tree

Maxim Uvarov [Thu, 22 Mar 2012 00:23:10 +0000 (17:23 -0700)]

SPEC: v2.6.39-200.0.4

Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>

commit | commitdiff | tree

Sunil Mushran [Mon, 15 Aug 2011 20:49:11 +0000 (13:49 -0700)]

ocfs2/trivial: Print message indicating unaligned aio+dio write

Print a message indicating unaligned aio+dio writes. It prints a message
once per 24 hrs.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>

commit | commitdiff | tree

Jan Kara [Mon, 15 Aug 2011 17:09:33 +0000 (10:09 -0700)]

ocfs2: Avoid livelock in ocfs2_readpage()

[Pulled before push to mainline]

When someone writes to an inode, readers accessing the same inode via
ocfs2_readpage() just busyloop trying to get ip_alloc_sem because
do_generic_file_read() looks up the page again and retries ->readpage()
when previous attempt failed with AOP_TRUNCATED_PAGE. When there are enough
readers, they can occupy all CPUs and in non-preempt kernel the system is
deadlocked because writer holding ip_alloc_sem is never run to release the
semaphore. Fix the problem by making reader block on ip_alloc_sem to break
the busy loop.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

commit | commitdiff | tree

Mark Fasheh [Mon, 15 Aug 2011 17:08:44 +0000 (10:08 -0700)]

ocfs2: serialize unaligned aio

[Pulled before push to mainline]

Fix a corruption that can happen when we have (two or more) outstanding
aio's to an overlapping unaligned region.  Ext4
(e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix
similar issues.

In our case what happens is that we can have an outstanding aio on a region
and if a write comes in with some bytes overlapping the original aio we may
decide to read that region into a page before continuing (typically because
of buffered-io fallback).  Since we have no ordering guarantees with the
aio, we can read stale or bad data into the page and then write it back out.

If the i/o is page and block aligned, then we avoid this issue as there
won't be any need to read data from disk.

I took the same approach as Eric in the ext4 patch and introduced some
serialization of unaligned async direct i/o.  I don't expect this to have an
effect on the most common cases of AIO.  Unaligned aio will be slower
though, but that's far more acceptable than data corruption.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

commit | commitdiff | tree

Dan Carpenter [Mon, 15 Aug 2011 16:55:36 +0000 (09:55 -0700)]

ocfs2: null deref on allocation error

[Pulled before push to mainline]

The original code had a null derefence in the error handling.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

commit | commitdiff | tree

Tiger Yang [Mon, 15 Aug 2011 16:54:55 +0000 (09:54 -0700)]

ocfs2: Bugfix for hard readonly mount

[Pulled before push to mainline]

ocfs2 cannot currently mount a device that is readonly at the media
("hard readonly"). Fix the broken places.
see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322

[ Description edited -- Joel ]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Reviewed-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>

commit | commitdiff | tree

Dave Kleikamp [Wed, 21 Mar 2012 20:06:42 +0000 (15:06 -0500)]

btrfs: create btrfs_file_write_iter()

Since btrfs has it's own aio_write function, it needs a similar
write_iter. This patch patterns btrfs_file_write_iter on
btrfs_file_aio_write, then has the latter call the former.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

commit | commitdiff | tree

Konrad Rzeszutek Wilk [Wed, 21 Mar 2012 15:43:32 +0000 (11:43 -0400)]

xen/acpi: Remove the WARN's as they just create noise.

When booting the kernel under machines that do not have P-states
we would end up with:

------------[ cut here ]------------
WARNING: at drivers/xen/xen-acpi-processor.c:504
xen_acpi_processor_init+0x286/0
x2e0()
Hardware name: ProLiant BL460c G6
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.39-200.0.3.el5uek #1
Call Trace:
  [<ffffffff8191d056>] ? xen_acpi_processor_init+0x286/0x2e0
  [<ffffffff81068300>] warn_slowpath_common+0x90/0xc0
  [<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
  [<ffffffff8106834a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8191d056>] xen_acpi_processor_init+0x286/0x2e0
  [<ffffffff8191cdd0>] ? check_acpi_ids+0x1e0/0x1e0
  [<ffffffff81002168>] do_one_initcall+0xe8/0x130

.. snip..

Which is OK - the machines do not have P-states, so we fail to register
to process the _PXX states. But there is no need to WARN the user
of it.

Oracle BZ# 13871288
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

commit | commitdiff | tree

Guru Anbalagane [Tue, 20 Mar 2012 00:52:16 +0000 (17:52 -0700)]

SPEC: v2.6.39-200.0.3
Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com>

commit | commitdiff | tree

Guru Anbalagane [Tue, 20 Mar 2012 00:43:53 +0000 (17:43 -0700)]

Merge branch 'uek2-merge' of git://ca-git.us.oracle.com/linux-konrad-public into uek2-200

Unnamed repository; edit this file 'description' to name the repository.