John Stultz [Mon, 2 Jul 2012 00:25:12 +0000 (20:25 -0400)]
Fix clock_was_set so it is safe to call from atomic
Backport for 3.0.36
NOTE: This is a prerequisite patch that's required to
address the widely observed leap-second related futex/hrtimer
issues.
Currently clock_was_set() is unsafe to be called from atomic
context, as it calls on_each_cpu(). This causes problems when
we need to adjust the time from update_wall_time().
To fix this, introduce a work_struct so if we're in_atomic,
we can schedule work to do the necessary update after we're
out of the atomic section.
CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: John Stultz <johnstul@us.ibm.com>
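The deferral described in the clock_was_set entry above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct clock_notifier, do_clock_update(), clock_was_set_sketch() and run_pending_work() are hypothetical names, the in_atomic field stands in for the kernel's atomic-context check, and the work_pending flag stands in for a scheduled work_struct. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for kernel state; not a real kernel structure. */
struct clock_notifier {
    bool in_atomic;     /* models "we are in atomic context" */
    bool work_pending;  /* models a scheduled work item */
    int  updates_done;  /* counts completed cross-CPU updates */
};

/* Models the update that must not run in atomic context. */
static void do_clock_update(struct clock_notifier *n)
{
    assert(!n->in_atomic);
    n->updates_done++;
}

/* In atomic context, only record that work is needed; otherwise update now. */
static void clock_was_set_sketch(struct clock_notifier *n)
{
    if (n->in_atomic) {
        n->work_pending = true;
        return;
    }
    do_clock_update(n);
}

/* Models the worker that runs once the atomic section has ended. */
static void run_pending_work(struct clock_notifier *n)
{
    if (n->work_pending && !n->in_atomic) {
        n->work_pending = false;
        do_clock_update(n);
    }
}
```

The point of the split is that the caller never blocks or cross-calls while atomic; the update happens exactly once, later.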
Joe Jin [Mon, 4 Jun 2012 05:45:02 +0000 (13:45 +0800)]
dm-nfs: force random mode for the backend file
Orabug: 14092678
Without this flag page_cache_sync_readahead() might take some seconds to
complete.
Since dm-nfs is used for ovm and as a vdisk, random access is expected, so
force this flag on when opening the backend file.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Adnan Misherfi <adnan.misherfi@oracle.com> Cc: Kurt C Hackel <kurt.hackel@oracle.com> Cc: Andrew Thomas <andrew.thomas@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andy Adamson [Wed, 7 Dec 2011 16:55:27 +0000 (11:55 -0500)]
NFSv4: include bitmap in nfsv4 get acl data
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
This fixes: CVE-2011-4131
Cc: stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andrea Arcangeli [Thu, 7 Jun 2012 19:45:30 +0000 (21:45 +0200)]
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
contents under Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid
having to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part
of the pmd was found null, the high part will be null too (so the pmd
will be considered unstable). And if the low part of the pmd is found
"stable" later, then it means the whole pmd was read atomically
(because after a pmd is stable, neither MADV_DONTNEED nor page faults
can alter it anymore, and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the
pmdval atomically to declare the pmd as "stable" and that's true for
THP and no THP, furthermore in the THP case we also have a barrier()
that will prevent any inconsistent pmdvals to be cached by a later
re-read of the *pmd.
(cherry picked from commit cdc7a76d4903387391fba3284be3b0b5c364f3d2)
Orabug: 14217003 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
The aio-dio changes for the loop device driver broke ocfs2 and btrfs's
handling of rlimit. generic_write_checks() adjusts the IO byte count to
account for the rlimit, but the updated count was not being reflected in
the iov_iter data structure.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Avi Kivity [Sun, 22 Apr 2012 14:02:11 +0000 (17:02 +0300)]
KVM: Fix buffer overflow in kvm_set_irq()
Bugdb: 13966
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or one MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.
Fix by ensuring that an MSI entry is added to an empty list.
This fixes: CVE-2012-2137 Signed-off-by: Avi Kivity <avi@redhat.com>
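The rule stated in the kvm_set_irq() entry above (at most three routes per GSI, and an MSI route only on an empty list) can be sketched as a bounded insert. Everything here is illustrative: struct gsi_routes, add_route() and MAX_ROUTES_PER_GSI are hypothetical names, and rejecting any route added after an MSI route is an extra assumption of this sketch rather than something the entry states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ROUTES_PER_GSI 3  /* size of the internal buffer described above */

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

/* Hypothetical per-GSI routing list. */
struct gsi_routes {
    size_t count;
    enum route_type types[MAX_ROUTES_PER_GSI];
};

/* Returns 0 on success, -EINVAL if the route would break the invariants. */
static int add_route(struct gsi_routes *gsi, enum route_type type)
{
    assert(gsi->count <= MAX_ROUTES_PER_GSI);

    /* An MSI route is only accepted on an empty list, and (assumption)
     * nothing may follow an MSI route. */
    if (gsi->count > 0 &&
        (type == ROUTE_MSI || gsi->types[0] == ROUTE_MSI))
        return -EINVAL;

    /* Never write past the end of the fixed-size buffer. */
    if (gsi->count >= MAX_ROUTES_PER_GSI)
        return -EINVAL;

    gsi->types[gsi->count++] = type;
    return 0;
}
```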
Jason Wang [Wed, 30 May 2012 21:18:10 +0000 (21:18 +0000)]
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
Bugdb: 13966
We need to validate the number of pages consumed by data_len, otherwise the
frags array could be overflowed by userspace. So this patch validates data_len
and returns -EMSGSIZE when data_len may occupy more frags than MAX_SKB_FRAGS.
This fixes: CVE-2012-2136 Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
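The data_len check described in the sock_alloc_send_pskb() entry above amounts to rounding data_len up to whole pages and comparing against the fragment limit. A minimal sketch follows; SKETCH_PAGE_SIZE and SKETCH_MAX_SKB_FRAGS are assumed example values (the real constants depend on the kernel configuration), and check_data_len() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_PAGE_SIZE     4096UL  /* assumed page size for illustration */
#define SKETCH_MAX_SKB_FRAGS 17UL    /* assumed fragment limit for illustration */

/* Returns 0 if data_len fits in the frags array, -EMSGSIZE otherwise. */
static int check_data_len(unsigned long data_len)
{
    /* Round up to whole pages without risking overflow in data_len + PAGE_SIZE - 1. */
    unsigned long npages = data_len / SKETCH_PAGE_SIZE +
                           (data_len % SKETCH_PAGE_SIZE != 0);

    if (npages > SKETCH_MAX_SKB_FRAGS)
        return -EMSGSIZE;

    /* From here on, npages is a safe bound for indexing the frags array. */
    assert(npages <= SKETCH_MAX_SKB_FRAGS);
    return 0;
}
```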
Andrea Arcangeli [Tue, 29 May 2012 22:06:49 +0000 (15:06 -0700)]
mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
Bugdb: 13966
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.
This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.
With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problems arises from the fact that the pmd could then
transition freely between any of the none, pmd_trans_huge or pmd_trans_stable
states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't a good idea and it would be a flakey solution.
This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.
Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.
NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.
This bug was discovered and fully debugged by Ulrich, quote:
----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.
static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
{
    /* depend on compiler for an atomic pmd read */
    pmd_t pmdval = *pmd;
Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.
- The PMD entry {high|low} is 0x0000000000000000.
The "mov" at 0xc0507a84 loads 0x00000000 into edx.
- A page fault (on another CPU) sneaks in between the two "mov"
instructions and instantiates the PMD.
- The PMD entry {high|low} is now 0x00000003fda38067.
The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
This fixes: CVE-2012-2373
Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
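The race in the quote above, and the "read the pmd in order" idea used with THP disabled, can be illustrated in userspace C. This is a sketch under assumptions: struct pmd_halves, torn_high_then_low() and read_low_then_high() are hypothetical names, and the C11 fence only stands in for whatever barrier the kernel uses. The first function reproduces the torn value from the quote; the second reads the low half first and only then the high half.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit entry stored as two 32-bit halves, as on 32-bit PAE. */
struct pmd_halves {
    volatile uint32_t low;
    volatile uint32_t high;
};

/* Models the racy read: the high half is sampled before the entry is
 * instantiated, the low half after. */
static uint64_t torn_high_then_low(uint64_t before, uint64_t after)
{
    uint32_t high = (uint32_t)(before >> 32);
    uint32_t low = (uint32_t)after;
    return ((uint64_t)high << 32) | low;
}

/* Ordered read: if the low half is zero the entry is treated as none;
 * otherwise the high half is read after the low half. */
static uint64_t read_low_then_high(const struct pmd_halves *p)
{
    assert(p);
    uint64_t val = p->low;
    if (val != 0) {
        atomic_thread_fence(memory_order_acquire);
        val |= (uint64_t)p->high << 32;
    }
    return val;
}
```

With the values from the quote, the torn read yields 0x00000000fda38067 instead of 0x00000003fda38067, which is the truncated entry that leads to the crash.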
Alex Williamson [Wed, 18 Apr 2012 03:46:44 +0000 (21:46 -0600)]
KVM: lock slots_lock around device assignment
Bugdb: 13966
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes: CVE-2012-2121
Conflicts:
Alex Williamson [Wed, 11 Apr 2012 15:51:49 +0000 (09:51 -0600)]
KVM: unmap pages from the iommu when slots are removed
Bugdb: 13966
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.
Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
This fixes: CVE-2012-2121
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eric Paris [Tue, 17 Apr 2012 20:26:54 +0000 (16:26 -0400)]
fcaps: clear the same personality flags as suid when fcaps are used
Bugdb: 13966
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given privilege with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
This fixes: CVE-2012-2123
Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Martin K. Petersen [Fri, 15 Jun 2012 15:30:06 +0000 (11:30 -0400)]
Fix system hang due to bad protection module parameters (CR 130769)
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simon Graham [Thu, 24 May 2012 06:26:07 +0000 (06:26 +0000)]
xen/netback: Calculate the number of SKB slots required correctly
When calculating the number of slots required for a packet header, the code
was reserving too many slots if the header crossed a page boundary. Since
netbk_gop_skb copies the header to the start of the page, the count of
slots required for the header should be based solely on the header size.
This problem is easy to reproduce if a VIF is bridged to a USB 3G modem
device as the skb->data value always starts near the end of the first page.
Signed-off-by: Simon Graham <simon.graham@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e26b203ede31fffd52571a5ba607a26c79dc5c0d)
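The difference between the two slot counts described in the netback entry above can be shown with a small calculation. This is a sketch, not the driver code: SKETCH_PAGE_SIZE is an assumed 4096-byte page, and header_slots_by_offset() and header_slots_by_size() are hypothetical helpers modelling "count every page the header touches where it currently sits" versus "count pages needed once the header is copied to the start of a page".

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for illustration */

/* Over-counting variant: pages touched by the header at its current offset. */
static unsigned int header_slots_by_offset(unsigned int offset, unsigned int len)
{
    assert(offset < SKETCH_PAGE_SIZE);  /* offset is within the first page */
    return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Size-only variant: pages needed when the header starts at offset 0. */
static unsigned int header_slots_by_size(unsigned int len)
{
    return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For a 64-byte header that starts 6 bytes before the end of a page, the first variant reserves two slots while the second reserves one.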
Martin K. Petersen [Wed, 13 Jun 2012 04:05:30 +0000 (00:05 -0400)]
sd: Avoid remapping bad reference tags
It does not make sense to translate ref tags with unexpected values.
Instead we simply ignore them and let the upper layers catch the
problem. Ref tags that contain the expected value are still remapped.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Joe Jin [Thu, 7 Jun 2012 23:35:58 +0000 (07:35 +0800)]
e1000e: disable rxhash when try to enable jumbo frame also rxhash and rxcsum have enabled
Commit ffd3d6 checks whether both rxhash and rxcsum are enabled when jumbo
frames are enabled, and disallows having all of them enabled at the same time.
Since jumbo frames are widely used in the real world, and el5 did not support
enabling/disabling rxhash, we change the default behavior to disable rxhash
when jumbo frames are being enabled while rxhash and rxcsum are both enabled.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com> Acked-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Mel Gorman [Tue, 10 Jan 2012 23:07:14 +0000 (15:07 -0800)]
mm: reduce the amount of work done when updating min_free_kbytes
Orabug: 14073214
When min_free_kbytes is updated, some pageblocks are marked
MIGRATE_RESERVE. Ordinarily, this work is unnoticeable as it happens early
in boot but on large machines with 1TB of memory, this has been reported
to delay boot times, probably due to the NUMA distances involved.
The bulk of the work is due to calling pageblock_is_reserved() an
unnecessary number of times and accessing far more struct page metadata
than is necessary. This patch significantly reduces the amount of work
done by setup_zone_migrate_reserve() improving boot times on 1TB machines.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 938929f14cb595f43cd1a4e63e22d36cab1e4a1f)
Junxiao Bi [Thu, 31 May 2012 01:57:17 +0000 (09:57 +0800)]
ocfs2: clear unaligned io flag when dio fails
Orabug: 14063941
The unaligned io flag is set in the kiocb when an unaligned
dio is issued. It should be cleared even when the dio fails,
or it may affect subsequent io that uses the same
kiocb.
Junxiao Bi [Thu, 31 May 2012 01:29:01 +0000 (09:29 +0800)]
aio: make kiocb->private NUll in init_sync_kiocb()
Orabug: 14063941
Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
io flag is involved in it to serialize the unaligned aio. As
*private is not initialized in init_sync_kiocb() of do_sync_write(),
this unaligned io flag may be unexpectedly set in an aligned dio.
This will cause OCFS2_I(inode)->ip_unaligned_aio to be decreased
to -1 in ocfs2_dio_end_io(), so the following unaligned dio
will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
We can't initialize this flag in ocfs2_file_write_iter() since
it may be invoked several times by do_sync_write(). So we initialize
it in init_sync_kiocb(); this is also useful for other similar uses of
it in the future.
Neil Horman [Thu, 16 Feb 2012 01:48:56 +0000 (01:48 +0000)]
vmxnet3: cap copy length at size of skb to prevent dropped frames on tx
Orabug: 14159701
I was recently shown that vmxnet3 devices on transmit, will drop very small udp
frames consistently. This is due to a regression introduced by commit 39d4a96fd7d2926e46151adbd18b810aeeea8ec0. This commit attempts to introduce an
optimization to the tx path, indicating that the underlying hardware behaves
optimally when at least 54 bytes of header data are available for direct access.
This causes problems however, if the entire frame is less than 54 bytes long.
The subsequent pskb_may_pull in vmxnet3_parse_and_copy_hdr fails, causing an
error return code, which leads to vmxnet3_tq_xmit dropping the frame.
Fix it by placing a cap on the copy length. For frames longer than 54 bytes, we
do the pull as we normally would. If the frame is shorter than that, copy the
whole frame, but no more. This ensures that we still get the optimization for
qualifying frames, but don't do any damage for frames that are too short.
Also, since I'm unable to do this, it would be great if vmware could follow up
this patch with some additional code commentary as to why 54 bytes is an optimal
pull length for a virtual NIC driver. The comment that introduced this was
vague on that. Thanks!
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Max Matveev <mmatveev@redhat.com> CC: Max Matveev <mmatveev@redhat.com> CC: "David S. Miller" <davem@davemloft.net> CC: Shreyas Bhatewara <sbhatewara@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
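The cap described in the vmxnet3 entry above is a simple minimum of the frame length and the 54-byte target. A minimal sketch, with HDR_COPY_TARGET and capped_copy_len() as hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#define HDR_COPY_TARGET 54u  /* header bytes the tx path would like to pull */

/* Never ask for more bytes than the frame actually contains. */
static size_t capped_copy_len(size_t frame_len)
{
    size_t len = frame_len < HDR_COPY_TARGET ? frame_len : HDR_COPY_TARGET;
    assert(len <= frame_len);
    return len;
}
```

Frames of 54 bytes or more still get the full pull; shorter frames are copied whole, so the later pull cannot fail for lack of data.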
Orabug: 14149364
commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") didn't really
fix the mbind vma merge problem due to wrong pgoff value passing to
vma_merge(), which made vma_merge() always return NULL.
Before the patch applied, we are getting a result like:
Zhigang Wang [Mon, 7 May 2012 20:51:10 +0000 (16:51 -0400)]
xen: expose host uuid via sysfs.
When 'expose_host_uuid = 1' is specified in vm.cfg, xen will write the physical
host uuid to xenstore. This patch exposes the host uuid to userspace via sysfs:
Kevin Lyons [Wed, 30 May 2012 05:56:07 +0000 (09:56 +0400)]
SPEC: upgrade preserve rhck as a boot kernel
Orabug: 14065209
OL 6.0 to OL 6.1 and OL 6.1 to OL 6.1 upgrades forcibly
install UEK, which becomes the default kernel. If the customer
runs RHCK, it should be left as the default boot kernel.
On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
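Both i915 entries above describe the same pattern: a user-supplied element count multiplied by an element size wraps around on 32-bit systems, so the allocation is smaller than the data later written into it. A common guard is to compare the count against the maximum divided by the element size before multiplying. The sketch below uses explicit 32-bit types to model a 32-bit size; array_bytes_u32() is a hypothetical helper, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Computes count * elem_size into *bytes, or returns -EINVAL if the
 * product does not fit in 32 bits. */
static int array_bytes_u32(uint32_t count, uint32_t elem_size, uint32_t *bytes)
{
    assert(bytes);

    /* Division cannot overflow, so check before multiplying. */
    if (elem_size != 0 && count > UINT32_MAX / elem_size)
        return -EINVAL;

    *bytes = count * elem_size;
    return 0;
}
```

Without the check, a count of 0x20000000 with 16-byte elements gives a product of 2^33, which wraps to 0 in 32 bits.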
Konrad Rzeszutek Wilk [Wed, 23 May 2012 16:56:59 +0000 (12:56 -0400)]
xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
We need to make sure that those parameters are set up correctly.
As such the value of 0 is deemed invalid and we find that we
bail out. The hypervisor sets by default all of them to be zero
and when the hypercall is done does a simple:
a.value = d->arch.hvm_domain.params[a.index];
Which means that if the Xen toolstack forgot to set up the proper
HVM_PARAM_CONSOLE_EVTCHN, we would get the default value of 0
and use that.
CC: stable@kernel.org
Fixes-Oracle-Bug: 14091238 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
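The check described in the xen/hvc entry above treats a returned value of 0 as "never set" rather than as a usable event channel or PFN. A minimal sketch, assuming the hypercall result is 0 or a negative errno; check_console_param() is a hypothetical helper and the choice of -ENODEV is an assumption of this sketch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Returns 0 if the parameter is usable, a negative errno otherwise. */
static int check_console_param(int hypercall_rc, uint64_t value)
{
    assert(hypercall_rc <= 0);  /* 0 on success, negative errno on failure */

    if (hypercall_rc < 0)
        return hypercall_rc;

    /* The hypervisor defaults every parameter to 0, so 0 means the
     * toolstack never set it up. */
    if (value == 0)
        return -ENODEV;

    return 0;
}
```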
Yaniv Rosner [Tue, 22 May 2012 22:30:59 +0000 (15:30 -0700)]
bnx2x: PFC fix
Fix a problem in which PFC frames are not honored, due to incorrect link
attributes synchronization following PMF migration, and verify PFC XON is not
stuck from previous link change.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 22 May 2012 22:21:12 +0000 (15:21 -0700)]
cnic: Fix parity error code conflict
The recently added parity error handling used an error code that was
already defined for a different error. This could lead to bnx2x
firmware assert. We need to fix this with new error codes that are
defined for parity error only.
Signed-off-by: Michael Chan <mchan@broadcom.com> Reviewed-by: Eddie Wai <eddie.wai@broadcom.com> Reviewed-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Tue, 22 May 2012 22:04:17 +0000 (15:04 -0700)]
bnx2x: Clear MDC/MDIO warning message
This patch clears a warning message of "MDC/MDIO access timeout" which may
appear when interface is loaded due to missing clock setting before resetting
the LED, and starting periodic function too early.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Tue, 22 May 2012 21:59:44 +0000 (14:59 -0700)]
bnx2x: Fix BCM578x0-SFI pre-emphasis settings
Fix 578x0-SFI pre-emphasis settings per HW recommendations to achieve better
link strength.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Tue, 22 May 2012 21:20:50 +0000 (14:20 -0700)]
bnx2x: Fix BCM57810-KR AN speed transition
BCM57810-KR link may not come up in 1G after running loopback test, so set
the relevant registers to their default values before starting KR autoneg.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 4 Jan 2012 12:12:27 +0000 (12:12 +0000)]
cnic: Re-init dev->stats_addr after chip reset
because bnx2x frees the old and allocates new memory during chip reset.
(cherry picked from commit a9e0a4f2ca5e97ae2cff0bda72b9645e047c1a3d) Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Mon, 21 May 2012 01:30:57 +0000 (09:30 +0800)]
ixgbe: Don't set ip checksum if did not enable tso.
After applying commit c108d12 - "ixgbe: Modify setup of descriptor flags to
avoid conditional jumps", X540-AT2 did not work properly. This was caused by
an inconsistent olinfo_status; when enabling the IP checksum, make sure TSO is
supported.
Junchang Wang [Mon, 5 Mar 2012 17:13:05 +0000 (17:13 +0000)]
8139too: Add 64bit statistics
Switch to use ndo_get_stats64 to get 64bit statistics.
Two sync entries are used (one for Rx and one for Tx).
(cherry picked from commit 9184a22701ed257974e7950be11da4cbd3116c63) Signed-off-by: Junchang Wang <junchangwang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Eric Dumazet [Mon, 5 Mar 2012 04:50:09 +0000 (04:50 +0000)]
net: export netdev_stats_to_stats64
Some drivers use internal netdev stats member to store part of their
stats, yet advertize ndo_get_stats64() to implement some 64bit fields.
Allow them to use netdev_stats_to_stats64() helper to make the copy of
netdev stats before they compute their 64bit counters.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 77a1abf54f4b003ad6e59c535045b2ad89fedfeb)
Joe Jin [Fri, 18 May 2012 03:35:23 +0000 (11:35 +0800)]
r8169: enable transmit time stamping.
This patch has been tested on a machine with the Realtek
RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 05).
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Cc: Francois Romieu <romieu@fr.zoreil.com>
(backported from commit 5047fb5d1dfcc92cf2133f246c1fe7b447ec4e5f) Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Sat, 10 Mar 2012 09:42:12 +0000 (10:42 +0100)]
r8169: stop using net_device.{base_addr, irq}.
The driver does not need this leftover of the ISA drivers era.
(cherry picked from commit 92a7c4e7183bcd29e2366f1ee784ad395c291134) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 8 Mar 2012 08:54:01 +0000 (09:54 +0100)]
r8169: move the driver removal method to the end of the driver file.
(cherry picked from commit e27566ed370da09e3b812d3d76dce002915a5bdd) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Julien Ducourthial [Tue, 8 May 2012 22:00:06 +0000 (00:00 +0200)]
r8169: fix unsigned int wraparound with TSO
The r8169 may get stuck or show bad behaviour after activating TSO:
the net_device is not stopped when it has no more TX descriptors.
This problem comes from TX_BUFS_AVAIL which may reach -1 when all
transmit descriptors are in use. The patch simply tries to keep positive
values.
Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
Zotac 890GXITX.
(cherry picked from commit 477206a018f902895bfcd069dd820bfe94c187b1) Signed-off-by: Julien Ducourthial <jducourt@free.fr> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
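The wraparound mentioned in the TSO entry above is easy to reproduce with unsigned arithmetic. The two formulas below are assumptions made for illustration (a ring of NUM_TX_DESC descriptors tracked by free-running dirty and cur counters); they are not copied from the driver. The first subtracts one extra slot and wraps to a huge value when the ring is full, the second stays at zero.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_DESC 64u  /* assumed ring size for illustration */

/* Assumed "before" formula: reaches -1, i.e. UINT32_MAX, on a full ring. */
static uint32_t tx_avail_minus_one(uint32_t dirty, uint32_t cur)
{
    return dirty + NUM_TX_DESC - cur - 1u;
}

/* Assumed "after" formula: a full ring reports 0 free descriptors. */
static uint32_t tx_avail(uint32_t dirty, uint32_t cur)
{
    assert(cur - dirty <= NUM_TX_DESC);  /* in-flight count never exceeds the ring */
    return dirty + NUM_TX_DESC - cur;
}
```

A caller that compares the first result against "descriptors needed" sees billions of free slots on a full ring and never stops the queue, which matches the symptom described in the entry.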
Jason Wang [Wed, 11 Apr 2012 22:10:54 +0000 (22:10 +0000)]
8139cp: set intr mask after its handler is registered
We set intr mask before its handler is registered, this does not work well when
8139cp is sharing irq line with other devices. As the irq could be enabled by
the device before 8139cp's handler is registered, which may lead to an unhandled
irq. Fix this by introducing a helper cp_irq_enable() and call it after
request_irq().
(cherry picked from commit a8c9cb106fe79c28d6b7f1397652cadd228715ff) Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
NAPI is disabled during suspend and needs to be enabled on resume. Without
this the driver locks up during resume in rtl_reset_work() trying to disable
NAPI again.
(cherry picked from commit cff4c16296754888b6fd8c886bc860a888e20257) Signed-off-by: Artem Savkov <artem.savkov@gmail.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
françois romieu [Tue, 6 Mar 2012 01:14:12 +0000 (01:14 +0000)]
r8169: runtime resume before shutdown.
With runtime PM, if the ethernet cable is disconnected, the device is
transitioned to D3 state to conserve energy. If the system is shutdown
in this state, any register accesses in rtl_shutdown are dropped on
the floor. As the device was programmed by .runtime_suspend() to wake
on link changes, it is thus brought back up as soon as the link recovers.
Resuming every suspended device through the driver core would slow things
down and it is not clear how many devices really need it now.
Original report and D0 transition patch by Sameer Nanda. Patch has been
changed to comply with advice from Rafael J. Wysocki and the PM folks.
Reported-by: Sameer Nanda <snanda@chromium.org>
(cherry picked from commit 2a15cd2ff488a9fdb55e5e34060f499853b27c77) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Hayes Wang <hayeswang@realtek.com> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Junchang Wang [Sun, 4 Mar 2012 22:30:32 +0000 (23:30 +0100)]
r8169: add 64bit statistics.
Switch to use ndo_get_stats64 to get 64bit statistics.
Two sync entries are used (one for Rx and one for Tx).
(cherry picked from commit 8027aa245bbd125350f6a78c5a78771d143aba55) Signed-off-by: Junchang Wang <junchangwang@gmail.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
françois romieu [Fri, 2 Mar 2012 04:43:14 +0000 (04:43 +0000)]
r8169: corrupted IP fragments fix for large mtu.
Noticed with the 8168d (-vb-gr, aka RTL_GIGA_MAC_VER_26).
ConfigX registers should only be written while the Config9346 lock
is held.
(cherry picked from commit 9c5028e9da1255dd2b99762d8627b88b29f68cce) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Nick Bowler <nbowler@elliptictech.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Tue, 31 Jan 2012 10:20:34 +0000 (11:20 +0100)]
r8169: spinlock redux.
rtl8169_get_regs operates under RTNL and rtl task mutex whereas
rtl_set_rx_mode is either called under RTNL or rtl task mutex protection.
(cherry picked from commit 6c05d25267ebb371c4311de6904f740342e82f7c) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Tue, 31 Jan 2012 10:09:21 +0000 (11:09 +0100)]
r8169: avoid a useless work scheduling.
(cherry picked from commit 934714d088f35b81edafdce89397969baf77fb8a) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Suggested-by: Michał Mirosław <mirqus@gmail.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Tue, 31 Jan 2012 09:56:44 +0000 (10:56 +0100)]
r8169: move task enable boolean to bitfield.
Simpler, more consistent, with negligible cost in non-critical paths.
(cherry picked from commit 6c4a70c5f286077e78b294b3f3a93dc45c40db89) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Suggested-by: Michał Mirosław <mirqus@gmail.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Tue, 31 Jan 2012 09:47:34 +0000 (10:47 +0100)]
r8169: bh locking redux and task scheduling.
- atomic bit operations are globally visible
- pending status is always cleared before execution
- scheduled works are either idempotent or only required to happen once
after a series of originating events, say link events for instance
(cherry picked from commit 98ddf986fca17840e46e070354b7e2cd2169da15) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Suggested-by: Michał Mirosław <mirqus@gmail.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
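The three properties listed in the "bh locking redux" entry above (atomic bit operations, pending status cleared before execution, work that only needs to happen once per burst of events) can be sketched with C11 atomics. struct task_state, schedule_task() and run_if_pending() are hypothetical names; the kernel uses its own bitops and workqueue APIs rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { TASK_RESET = 0, TASK_LINK = 1, TASK_MAX = 2 };

/* Hypothetical per-device task state. */
struct task_state {
    atomic_uint pending;          /* one bit per task */
    unsigned int runs[TASK_MAX];  /* how often each task body ran */
};

/* Mark a task as pending; repeated events collapse into one bit. */
static void schedule_task(struct task_state *s, unsigned int bit)
{
    assert(bit < TASK_MAX);
    atomic_fetch_or(&s->pending, 1u << bit);
}

/* Clear the pending bit first, then run the task body if it was set.
 * An event arriving while the body runs sets the bit again, so it is
 * not lost. */
static bool run_if_pending(struct task_state *s, unsigned int bit)
{
    assert(bit < TASK_MAX);
    unsigned int mask = 1u << bit;
    unsigned int old = atomic_fetch_and(&s->pending, ~mask);

    if (!(old & mask))
        return false;

    s->runs[bit]++;  /* stands in for the real task body */
    return true;
}
```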
Francois Romieu [Mon, 30 Jan 2012 23:00:19 +0000 (00:00 +0100)]
r8169: fix early queue wake-up.
With infinite gratitude to Eric Dumazet for allowing me to identify
the error.
(cherry picked from commit ae1f23fb433ac0aaff8aeaa5a7b14348e9aa8277) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Joe Jin [Fri, 18 May 2012 03:27:31 +0000 (11:27 +0800)]
r8169: remove work from irq handler.
The irq handler was a mess.
See 7ab87ff4c770eed71e3777936299292739fcd0fe ("via-rhine: move work from
irq handler to softirq and beyond") for similar changes. One can notice:
- all non-napi tasks are explicitly scheduled through a single work queue.
- hiding software tx queue start behind the rtl_hw_start method is mildly
natural. Move it in the caller where needed.
- as can be seen from the heavy use of bh disabling locks, the driver is
not safe for irq context messages with netconsole. It is still quite
usable for general messaging though. Tested ok with concurrent registers
dump (ethtool -d) + background traffic + "echo t > /proc/sysrq-trigger".
As a side note, the comments in f11a377b3f4e897d11f0e8d1fc688667e2f19708
("r8169: avoid losing MSI interrupts") do not seem completely clear: if
I hack the driver further to stop acking the irq link event bit, MSI
interrupts keep being delivered (RTL8168b/8111b, XID 18000000).
(backported from commit da78dbff2e05630921c551dbbc70a4b7981a8fff) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Fri, 27 Jan 2012 14:05:38 +0000 (15:05 +0100)]
r8169: missing barriers.
(cherry picked from commit 1e874e041fc7c222cbd85b20c4406070be1f687a) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 26 Jan 2012 11:59:08 +0000 (12:59 +0100)]
r8169: irq mask helpers.
(cherry picked from commit 9085cdfa2f9f04d8678465748e2cced6e3f02e26) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 26 Jan 2012 11:50:01 +0000 (12:50 +0100)]
r8169: factor out IntrMask writes.
(cherry picked from commit 3e990ff5f119c2f9b142f3e2548dc90ca9b7dfa1) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 26 Jan 2012 10:23:32 +0000 (11:23 +0100)]
r8169: stop delaying workqueue.
Though motivated by the move of the driver to a single work queue of
sequential events and removal of hard irq processing, it looks safe as
a standalone change.
(cherry picked from commit 4422bcd4907d1bbb9f63e049e3c3819132c047a1) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 26 Jan 2012 08:59:50 +0000 (09:59 +0100)]
r8169: remove rtl8169_reinit_task.
I see no good reason to keep both rtl8169_reinit_task and rtl8169_reset_task:
- rtl8169_reinit_task adds a software failure point which does not relate to
any hardware state
- they handle hardware the same. Remember that rtl8169_reinit_task was
introduced in the 8169 only era to handle PCI errors way before the 8168
asked for pll and firmware ops and compare :
(cherry picked from commit 209e5ac83b4d038ffb52cabc793f75031602a031) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>
Francois Romieu [Thu, 22 Dec 2011 17:59:37 +0000 (18:59 +0100)]
r8169: remove hardcoded PCIe registers accesses.
(cherry picked from commit 4512ff9f361a2786a18cb805d1f64b8d8719f121) Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Joe Jin <joe.jin@oracle.com>