The leap second rework unearthed another issue of inconsistent data.
On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.
This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.
Add the missing update call, so all the data is consistent everywhere.
Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0851978b661f25192ff763289698f3175b1bab42)
The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.
clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.
In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.
So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.
ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().
The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.
This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.
Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bb6ed34f2a6eeb40608b8ca91f3ec90ec9dca26f)
To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 22f4bbcfb131e2392c78ad67af35fdd436d4dd54)
We need to update the base offsets from this code and we need to do
that under base->lock. Move the lock held region around the
ktime_get() calls. The ktime_get() calls are going to be replaced with
a function which gets the time and the offsets atomically.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6c89f2ce05ea7e26a7580ad9eb950f2c4f10891b)
We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 03a90b9a6f7eec70edde4eb1f88fa8a5c058d85e)
The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.
The reported immediate workaround - $ data -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix
Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.
Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d21e4baf4523fec26e3c70cb78b013ad3b245c83)
clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().
For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.
Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.
[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
rid of all this ifdeffery ASAP ]
Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 62b787f886e2d96cc7c5428aeee05dbe32a9531b)
Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.
Adjust wall_to_monotonic when NTP inserted a leapsecond.
Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c33f2424c3941986d402c81d380d4e805870a20f)
Conflicts:
kernel/time/timekeeping.c
When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 96bab736bad82423c2b312d602689a9078481fa9)
Konrad Rzeszutek Wilk [Thu, 16 Aug 2012 20:38:55 +0000 (16:38 -0400)]
xen/p2m: When revectoring deal with holes in the P2M array.
When we free the PFNs and then subsequently populate them back
during bootup:
Freeing 20000-20200 pfn range: 512 pages freed
1-1 mapping on 20000->20200
Freeing 40000-40200 pfn range: 512 pages freed
1-1 mapping on 40000->40200
Freeing bad80-badf4 pfn range: 116 pages freed
1-1 mapping on bad80->badf4
Freeing badf6-bae7f pfn range: 137 pages freed
1-1 mapping on badf6->bae7f
Freeing bb000-100000 pfn range: 282624 pages freed
1-1 mapping on bb000->100000
Released 283999 pages of unused memory
Set 283999 page(s) to 1-1 mapping
Populating 1acb8a-1f20e9 pfn range: 283999 pages added
We end up having the P2M array (that is the one that was
grafted on the P2M tree) filled with IDENTITY_FRAME or
INVALID_P2M_ENTRY) entries. The patch titled
"xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID."
recycles said slots and replaces the P2M tree leaf's with
&mfn_list[xx] with p2m_identity or p2m_missing.
And re-uses the P2M array sections for other P2M tree leaf's.
For the above mentioned bootup excerpt, the PFNs at
0x20000->0x20200 are going to be IDENTITY based:
P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME.
We can re-use that and replace P2M[0][256] to point to p2m_identity.
The "old" page (the grafted P2M array provided by Xen) that was at
P2M[0][256] gets put somewhere else. Specifically at P2M[6][358],
b/c when we populate back:
we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with
the new MFNs.
That is all OK, except when we revector we assume that the PFN
count would be the same in the grafted P2M array and in the
newly allocated. Since that is no longer the case, as we have
holes in the P2M that point to p2m_missing or p2m_identity we
have to take that into account.
[v2: Check for overflow]
[v3: Move within the __va check]
[v4: Fix the computation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 3fc509fc0c590900568ef516a37101d88f3476f5)
Andrew Vasquez [Fri, 31 Aug 2012 04:54:20 +0000 (21:54 -0700)]
qla2xxx: Correct loop_id_map allocation-size and usage.
Bugdb: 13653
Original code incorrectly assigned LOOPID_MAP_SIZE to be the
allocation size in bytes rather than total bit size.
Additionally corrected code to check for bit-allocation failure
in qla2x00_find_new_loop_id().
Mike Christie [Sat, 2 Jun 2012 23:29:45 +0000 (00:29 +0100)]
dm mpath: delay retry of bypassed pg
Orabug: 14478983
If I/O needs retrying and only bypassed priority groups are available,
set the pg_init_delay_retry flag to wait before retrying.
If, for example, the reason for the bypass is that the controller is
getting reset or there is a firmware upgrade happening, retrying right
away would cause a flood of log messages and retries for what could be a
few seconds or even several minutes.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
John Soni Jose [Thu, 23 Aug 2012 09:07:23 +0000 (14:37 +0530)]
be2iscsi: Fix a kernel panic because of TCP RST/FIN received.
A TCP RST/FIN can be received even before the connection specific
structures are initialized.This fix checks for the conn structure
is intialized or not when RST/FIN is received.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
John Soni Jose [Thu, 23 Aug 2012 09:02:08 +0000 (14:32 +0530)]
be2iscsi: Format the MAC_ADDR with sysfs_format_mac.
The MAC_ADDR stored in driver private structure is of
unsigned char data type but strlcpy parameters is of
signed char data type. This conversion of data types
lead to change in the value.This changed value is passed
to the upper layer and junk characters were displayed
when "iscsiadm -m iface" command was run.
In case of iSCSI boot, since the the MAC_ADDR was coming
junk the boot was also not working
Patch submitted to upstream kernel
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
John Soni Jose [Thu, 23 Aug 2012 08:55:33 +0000 (14:25 +0530)]
be2iscsi: Issue MBX Cmd for login to boot target in crashdump mode
When the driver comes up in crashdump mode, it has to explicitly
issue command to FW for logging to the boot target. This fix issues
MBX Cmd to login to boot target in crashdump mode.
Patch Submitted to upstream kernel.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
John Soni Jose [Thu, 23 Aug 2012 08:51:08 +0000 (14:21 +0530)]
be2iscsi: Removing the iscsi_data_pdu setting.
The setting of iscsi_data_pdu is not required anymore,
as this was required for BE1 adapters only. The BE1 adapter
were not supported in any previous versions of the kernel.
Patch Submitted to upstream kernel.
Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
This patch should go into 3.5 fixes. The bug was added in the
patches for the 3.5 feature window.
As you can see from the patch I made a mistake. During
development I switched from passing a struct to the size of
the struct, but left the sizeof. This results in us allocating
4 bytes (sizeof(int)) but then calling pci_free_consistent
with the size of the struct.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: John Soni Jose <sony.john-n@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Maxim Uvarov [Wed, 29 Aug 2012 14:08:14 +0000 (07:08 -0700)]
x86/nmi: Clean up register_nmi_handler() usage
Implement a cleaner and easier to maintain version for the section
warning fixes implemented in commit eeaaa96a3a21
("x86/nmi: Fix section mismatch warnings on 32-bit").
Li Zhong [Thu, 29 Mar 2012 20:11:17 +0000 (16:11 -0400)]
x86/nmi: Fix page faults by nmiaction if kmemcheck is enabled
This patch tries to fix the problem of page fault exception
caused by accessing nmiaction structure in nmi if kmemcheck
is enabled.
If kmemcheck is enabled, the memory allocated through slab are
in pages that are marked non-present, so that some checks could
be done in the page fault handling code ( e.g. whether the
memory is read before written to ).
As nmiaction is allocated in this way, so it resides in a
non-present page. Then there is a page fault while the nmi code
accessing the nmiaction structure, which would then cause a
warning by WARN_ON_ONCE(in_nmi()) in kmemcheck_fault(), called
by do_page_fault().
This significantly simplifies the code as well, as the whole
dynamic allocation dance goes away.
v2: as Peter suggested, changed the nmiaction to use static
storage.
v3: as Peter suggested, use macro to shorten the codes. Also
keep the original usage of register_nmi_handler, so users of
this call doesn't need change.
Tested-by: Seiji Aguchi <seiji.aguchi@hds.com> Fixes: https://lkml.org/lkml/2012/3/2/356 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
[ simplified the wrappers ] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: thomas.mingarelli@hp.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1333051877-15755-4-git-send-email-dzickus@redhat.com
[ tidied the patch a bit ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Maxim Uvarov [Wed, 29 Aug 2012 13:14:04 +0000 (06:14 -0700)]
[hpwdt] add include NMI
Commit:
x86/nmi: Add new NMI queues to deal with IO_CHK and SERR
Added new NMI calls. But skipped to add nmi.h. Fix
merge commit here. Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
Parav Pandit [Wed, 29 Aug 2012 13:06:04 +0000 (06:06 -0700)]
be2net: Add functionality to support RoCE driver
- Increase MSI-X vectors by 5 for RoCE traffic.
- Add macro to check roce support on a device.
- Add device-specific doorbell and MSI-X vector fields shared with NIC
functionality.
- Provide RoCE driver registration and deregistration functions.
- Add support functions which will be invoked on adapter add/remove
and port up/down events.
- Traverse through the list of adapters to invoke callback functions.
Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Roland Dreier <roland@purestorage.com>
Luis Henriques [Wed, 11 Jul 2012 21:02:10 +0000 (14:02 -0700)]
ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()
As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereferrence
due to a missing check.
Jan Kara [Fri, 10 Feb 2012 09:50:07 +0000 (10:50 +0100)]
ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).
Al Viro [Thu, 3 May 2012 14:14:29 +0000 (10:14 -0400)]
ocfs: simplify symlink handling
seeing that "fast" symlinks still get allocation + copy, we might as
well simply switch them to pagecache-based variant of ->follow_link();
just need an appropriate ->readpage() for them...
Al Viro [Thu, 12 Apr 2012 23:58:53 +0000 (19:58 -0400)]
ocfs2: kill endianness abuses in blockcheck.c
ocfs2_block_check is for little-endian contents; if we just want to
its fields converted to host-endian in a couple of functions, just
put those values into local u32 and u16...
Al Viro [Mon, 13 Feb 2012 02:00:05 +0000 (21:00 -0500)]
ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
unfortunately, nlink_t may be smaller than 32 bits and ->i_nlink
on ocfs2 can grow up to 0xffffffff; storing it in nlink_t variable
will lose upper bits on such architectures. Needs to be made u32,
until we get kernel-side nlink_t uniformly 32bit...
Akinobu Mita [Tue, 15 Nov 2011 22:56:34 +0000 (14:56 -0800)]
ocfs2: avoid unaligned access to dqc_bitmap
The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
but not 64-bit aligned. The dqc_bitmap is accessed by ocfs2_set_bit(),
ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit(). These
are wrapper macros for ext2_*_bit() which need to take an unsigned long
aligned address (though some architectures are able to handle unaligned
address correctly)
So some 64bit architectures may not be able to access the dqc_bitmap
correctly.
This avoids such unaligned access by using another wrapper functions for
ext2_*_bit(). The code is taken from fs/ext4/mballoc.c which also need to
handle unaligned bitmap access.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit 939255798a468e1a92f03546de6e87be7b491e57)
Jan Kara [Mon, 7 Nov 2011 23:20:39 +0000 (00:20 +0100)]
ocfs2: Use filemap_write_and_wait() instead of write_inode_now()
Since ocfs2 has no ->write_inode method, there's no point in calling
write_inode_now() from ocfs2_cleanup_delete_inode(). Use
filemap_write_and_wait() instead. This helps us to cleanup inode writing
interfaces...
Mark Fasheh [Wed, 16 Nov 2011 20:03:10 +0000 (12:03 -0800)]
ocfs2: honor O_(D)SYNC flag in fallocate
We need to sync the transaction which updates i_size if the file is marked
as needing sync semantics.
Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit df295d4a4b3c98af1a2445a82aef169e7e5d96b8)
Xiaowei.Hu [Wed, 19 Oct 2011 01:34:19 +0000 (09:34 +0800)]
ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2
With indexed_dir enabled, ocfs2 maintains a list of dirblocks having
space.
The credit calculation in ocfs2_link_credits() did not correctly account
for adding an entry that exactly fills a dirblock that triggers removing
that dirblock by changing the pointer in the previous block in the list.
The credit calculation did not account for that previous block.
To expose, do:
mkfs.ocfs2 -b 512 -M local /dev/sdX
mount /dev/sdX /ocfs2
mkdir /ocfs2/linkdir
touch /ocfs2/linkdir/file1
for i in `seq 1 29` ; do link /ocfs2/linkdir/file1
/ocfs2/linkdir/linklinklinklinklinklink$i; done
rm -f /ocfs2/linkdir/linklinklinklinklinklink10
sleep 8
link /ocfs2/linkdir/file1
/ocfs2/linkdir/linklinklinklinklinklinkaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Note:
The link names have been crafted for a 512 byte blocksize. Reproducing
with a larger blocksize will require longer (or more) links. The sleep
is important. We want jbd2 to commit the transaction so that the missing
block does not piggy back on account of the previous transaction.
Signed-off-by: XiaoweiHu <xiaowei.hu at oracle.com> Reviewed-by: WengangWang <wen.gang.wang at oracle.com> Reviewed-by: Sunil.Mushran <sunil.mushran at oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit 0393afea31874947b1d149b82d17b7dccac4f210)
Wengang Wang [Wed, 12 Oct 2011 07:22:15 +0000 (15:22 +0800)]
ocfs2: Commit transactions in error cases -v2
There are three cases found that in error cases, journal transactions are not
committed nor aborted. We should take care of these case by committing the
transactions. Otherwise, there would left a journal handle which will lead to
, in same process context, the comming ocfs2_start_trans() gets wrong credits.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit b8a0ae579fb8d9b21008ac386be08b9428902455)
Wengang Wang [Tue, 12 Jul 2011 08:43:14 +0000 (16:43 +0800)]
ocfs2: make direntry invalid when deleting it
When we deleting a direntry from a directory, if it's the first in a block we
invalid it by setting inode to 0; otherwise, we merge the deleted one to the
prior and contiguous direntry. And we don't truncate directories.
There is a problem for the later case since inode is not set to 0.
This problem happens when the caller passes a file position as parameter to
ocfs2_dir_foreach_blk(). If the position happens to point to a stale(not
the first, deleted in betweens of ocfs2_dir_foreach_blk()s) direntry, we are
not able to recognize its staleness. So that we treat it as a live one wrongly.
The fix is to set inode to 0 in both cases indicating the direntry is stale.
This won't introduce additional IOs.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit 8298524803339a9a8df053ebdfebc2975ec55be9)
Julia Lawall [Sat, 9 Jul 2011 16:04:39 +0000 (18:04 +0200)]
fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free
Memory allocated using kmem_cache_zalloc should be freed using
kmem_cache_free, not kfree.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e,e1,e2;
@@
x = kmem_cache_zalloc(e1,e2)
... when != x = e
?-kfree(x)
+kmem_cache_free(e1,x)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit fc9f899483435935c1cd7005df29681929d1c99b)
Wengang Wang [Sun, 24 Jul 2011 17:36:54 +0000 (10:36 -0700)]
ocfs2: Fix ocfs2_page_mkwrite()
This patch address two shortcomings in ocfs2_page_mkwrite():
1. Makes the function return better VM_FAULT_* errors.
2. It handles a error that is triggered when a page is dropped from the mapping
due to memory pressure. This patch locks the page to prevent that.
[Patch was cleaned up by Sunil Mushran.]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
(cherry picked from commit 5cffff9e29866a3de98c2c25135b3199491f93b0)
Akinobu Mita [Mon, 30 May 2011 12:58:05 +0000 (21:58 +0900)]
ocfs2: use proper little-endian bitops
Using __test_and_{set,clear}_bit_le() with ignoring its return value
can be replaced with __{set,clear}_bit_le().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit 730e663bd82c1a10a85ff00728d34152a5a67ec8)
Dan Carpenter [Sun, 29 May 2011 19:56:31 +0000 (22:56 +0300)]
ocfs2: checking the wrong variable in ocfs2_move_extent()
"new_phys_cpos" is always a valid pointer here.
ocfs2_probe_alloc_group() allocates "*new_phys_cpos".
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
(cherry picked from commit 3d75be7c4771c7e4d5b5fa586a599af8473de32c)
Joe Jin [Tue, 28 Aug 2012 13:05:09 +0000 (21:05 +0800)]
e1000e: disable rxhash when try to enable jumbo frame also rxhash and rxcsum have enabled
commit ffd3d6 check if both rxhash and rxcsum enabled when enable jumbo
frames and disallowed all of them enabled at the same time.
Since jumbo frame widely be used in real world, and el5 did not supported
enable/disable rxhash, so we changed default behavior to disable rxhash
when try to enable jumbo frames also rxhash and rxcsum have enabled.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com> Acked-by: Adnan Misherfi <adnan.misherfi@oracle.com>
(cherry picked from commit 82e316efbd1c68946c8760f930b81d73e9c4425a) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Wed, 11 Jul 2012 12:31:56 +0000 (20:31 +0800)]
r8169: remove rtl_ocpdr_cond.
It is not needed for mac_ocp_{write / read}. Actually bit 31 of OCPDR
does not change and r8168_mac_ocp_read always returns ~0.
(cherry picked from commit 3a83ad12b850c3c5b89fa9008bdd0c0782f0cf68) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Tested-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Tue, 10 Jul 2012 06:47:05 +0000 (08:47 +0200)]
r8169: fix argument in rtl_hw_init_8168g.
(cherry picked from commit 5f8bcce99e83b1155954b1ae7291dc754ad9025e) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Tue, 28 Aug 2012 00:28:17 +0000 (08:28 +0800)]
r8169: support RTL8168G
For RTL8111G, the settings of phy and firmware are replaced with
ocp functions. r8168g_mdio_{write / read} redirects the relative
settings to suitable ocp functions. A per-device variable is needed
to evaluate the real address of ocp functions.
rtl_writephy(tp, 0x1f, xxxx) is dedicated to keeping said variable
up-to-date.
(backported from upstream commit c558386b836ee97762e12495101c6e373f20e69d) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Twelve functions can fail silently. Now they have a chance to complain.
Macro and pasting abuse has been kept at a level where tags and
friends should not be hurt.
(cherry picked from commit ffc46952b313ff037debca1b4e3da9472ff4b441) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
r8169: ephy, eri and efuse functions signature changes.
(cherry picked from commit fdf6fc067aaeb13aba89d1b56aa39d3bf06fde43) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
(cherry picked from commit 52989f0e429a97bc9075245e2e14ece2a4ebca5c) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Further changes need more context down in the call stack.
(cherry picked from commit 24192210a57a24a45b29dc3519dc42e073ea7b0a) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Mon, 2 Jul 2012 09:23:21 +0000 (17:23 +0800)]
r8169: add RTL8106E support.
(cherry picked from commit 5598bfe5191d09cdd622aeac39badc42508b227f) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
françois romieu [Wed, 20 Jun 2012 12:09:18 +0000 (12:09 +0000)]
r8169: RxConfig hack for the 8168evl.
The 8168evl (RTL_GIGA_MAC_VER_34) based Gigabyte GA-990FXA motherboards
are very prone to NETDEV watchdog problems without this change. See
https://bugzilla.kernel.org/show_bug.cgi?id=42899 for instance.
I don't know why it *works*. It's depressingly effective though.
For the record:
- the problem may go along IOMMU (AMD-Vi) errors but it really looks
like a red herring.
- the patch sets the RX_MULTI_EN bit. If the 8168c doc is any guide,
the chipset now fetches several Rx descriptors at a time.
- long ago the driver ignored the RX_MULTI_EN bit. e542a2269f232d61270ceddd42b73a4348dee2bb changed the RxConfig
settings. Whatever the problem it's now labeled a regression.
- Realtek's own driver can identify two different 8168evl devices
(CFG_METHOD_16 and CFG_METHOD_17) where the r8169 driver only
sees one. It sucks.
(cherry picked from commit eb2dc35d99028b698cdedba4f5522bc43e576bd2) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
françois romieu [Sat, 9 Jun 2012 10:53:16 +0000 (10:53 +0000)]
r8169: avoid NAPI scheduling delay.
While reworking the r8169 driver a few months ago to perform the
smallest amount of work in the irq handler, I took care of avoiding
any irq mask register operation in the slow work dedicated user
context thread. The slow work thread scheduled an extra round of NAPI
work which would ultimately set the irq mask register as required,
thus keeping such irq mask operations in the NAPI handler.
It would eventually race with the irq handler and delay NAPI execution
for - assuming no further irq - a whole ksoftirqd period. Mildly a
problem for rare link changes or corner case PCI events.
The race was always lost after the last bh disabling lock had been
removed from the work thread and people started wondering where those
pesky "NOHZ: local_softirq_pending 08" messages came from.
Actually the irq mask register _can_ be set up directly in the slow
work thread.
(cherry picked from commit 7dbb491878a2c51d372a8890fa45a8ff80358af1) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Dave Jones <davej@redhat.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Devendra Naga [Thu, 31 May 2012 01:51:20 +0000 (01:51 +0000)]
r8169: call netif_napi_del at errpaths and at driver unload
when register_netdev fails, the init'ed NAPIs by netif_napi_add must be
deleted with netif_napi_del, and also when driver unloads, it should
delete the NAPI before unregistering netdevice using unregister_netdev.
(cherry picked from commit ad1be8d345416a794dea39761a374032aa471a76) Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Jason Wang [Thu, 31 May 2012 18:19:48 +0000 (18:19 +0000)]
8139cp/8139too: terminate the eeprom access with the right opmode
Currently, we terminate the eeprom access through clearing the CS by:
RTL_W8 (Cfg9346, ~EE_CS); or writeb (~EE_CS, ee_addr);
This would left the eeprom into "Config. Register Write Enable:"
state which is not expcted as the highest two bits were set to
0x11 ( expected is the "Normal" mode (0x00)). Solving this by write
0x0 instead of ~EE_CS when terminating the eeprom access.
(cherry picked from commit 0bc777bca480357941418952cf228484f5485daf) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Jason Wang [Thu, 31 May 2012 18:19:39 +0000 (18:19 +0000)]
8139cp: set ring address before enabling receiver
Currently, we enable the receiver before setting the ring address which could
lead the card DMA into unexpected areas. Solving this by set the ring address
before enabling the receiver.
btw. I find and test this in qemu as I didn't have a 8139cp card in hand. please
review it carefully.
(cherry picked from commit b01af4579ec41f48e9b9c774e70bd6474ad210db) Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Fri, 30 Mar 2012 06:48:06 +0000 (14:48 +0800)]
r8169: support the new RTL8411 chip.
Compared with previous chipsets, it needs no special action trough the
jumbo{enable/disable} helpers to operate with jumbo frames.
(cherry picked from commit b3d7b2f2f07ff0ab87442f2d499f2860ef59bfaa) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Fri, 30 Mar 2012 06:33:03 +0000 (14:33 +0800)]
r8169: adjust some functions of 8111f
Put some settings of 8111f into one function which may be reused.
(cherry picked from commit 5f886e08901adaaaa1c79d1f964035aee6a29370) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Fri, 30 Mar 2012 06:33:02 +0000 (14:33 +0800)]
r8169: support the new RTL8402 chip.
(cherry picked from commit 7e18dca16246b2891239cfc3c6e2dfcea715d353) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Fri, 30 Mar 2012 06:33:01 +0000 (14:33 +0800)]
r8169: add device specific CSI access helpers.
New chipsets need it.
(cherry picked from commit beb1fe184f673fae83ddd9beca3fe662019ef876) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Hayes Wang [Fri, 30 Mar 2012 06:33:00 +0000 (14:33 +0800)]
r8169: modify pll power function
Adjust r810x_pll_power_down, r810x_pll_power_up, and r8168_pll_power_up.
Always power up device during rtl_open. For r810x, turn off more power
when the WOL is disabled.
(cherry picked from commit 0004299ad41885a0a1fd321715fe7396be17ce35) Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
r8169: 8168c and later require bit 0x20 to be set in Config2 for PME signaling.
The new 84xx stopped flying below the radars.
(cherry picked from commit d387b427c973974dd619a33549c070ac5d0e089f) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
(cherry picked from commit 851e60221926a53344b4227879858bef841b0477) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Tue, 28 Aug 2012 00:21:47 +0000 (08:21 +0800)]
8139too: dev->{base_addr, irq} removal.
(backported from upstream commit 65712ec016788538d27c0b0452e57b751776914e) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Tue, 28 Aug 2012 00:20:25 +0000 (08:20 +0800)]
8139cp: stop using net_device.{base_addr, irq}.
(backported from upstream commit a69afe3263717ba9384cf18d05722c598f6820af) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Justin P. Mattock [Sat, 24 Mar 2012 16:00:04 +0000 (09:00 -0700)]
r8169.c: fix comment typo
The below patch fixes a typo that I found while reading the code.
(cherry picked from commit a9d7e794ea66902a255be6e87f633286d04c2b39) Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 8 Mar 2012 09:09:40 +0000 (10:09 +0100)]
r8169: move rtl_cfg_info closer to its caller.
(cherry picked from commit 31fa8b1855cb1f1fd99e2f2f9b8f2c8f113e9f2e) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 8 Mar 2012 09:06:18 +0000 (10:06 +0100)]
r8169: move the netpoll handler after the irq handler.
(cherry picked from commit dc1c00ce70da5d3bb3fc97707e04f598ff72e7ba) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Tue, 28 Aug 2012 00:18:28 +0000 (08:18 +0800)]
r8169: move rtl8169_open after rtl_task it depends on.
(backported from upstream commit df43ac7831a0e321b6b183b7eb48ae4577207453) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>