Joe Jin [Mon, 4 Jun 2012 05:45:02 +0000 (13:45 +0800)]
dm-nfs: force random mode for the backend file
Orabug: 14092678
Without this flag page_cache_sync_readahead() might take some seconds to
complete.
Since dm-nfs used for ovm and as vdisk, random access is expect, so force
set this flag when open the backend file.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Adnan Misherfi <adnan.misherfi@oracle.com> Cc: Kurt C Hackel <kurt.hackel@oracle.com> Cc: Andrew Thomas <andrew.thomas@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andy Adamson [Wed, 7 Dec 2011 16:55:27 +0000 (11:55 -0500)]
NFSv4: include bitmap in nfsv4 get acl data
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
This fixes: CVE-2011-4131
Cc: stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andrea Arcangeli [Thu, 7 Jun 2012 19:45:30 +0000 (21:45 +0200)]
thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
contents under Xen.
So instead of dealing only with "consistent" pmdvals in
pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
where the low 32bit and high 32bit could be inconsistent (to avoid
having to use cmpxchg8b).
The only guarantee we get from pmd_read_atomic is that if the low part
of the pmd was found null, the high part will be null too (so the pmd
will be considered unstable). And if the low part of the pmd is found
"stable" later, then it means the whole pmd was read atomically
(because after a pmd is stable, neither MADV_DONTNEED nor page faults
can alter it anymore, and we read the high part after the low part).
In the 32bit PAE x86 case, it is enough to read the low part of the
pmdval atomically to declare the pmd as "stable" and that's true for
THP and no THP, furthermore in the THP case we also have a barrier()
that will prevent any inconsistent pmdvals to be cached by a later
re-read of the *pmd.
(cherry picked from commit cdc7a76d4903387391fba3284be3b0b5c364f3d2)
Orabug: 14217003 Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
The aio-dio changes for the loop device driver broke ocfs2 and btrfs's
handling of rlimit. generic_write_checks() adjusts the IO byte count to
account for the rlimit, but the updated count was not being reflected in
the iov_iter data structure.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Liu, Jinsong [Fri, 15 Jun 2012 01:03:39 +0000 (09:03 +0800)]
xen/mce: add .poll method for mcelog device driver
If a driver leaves its poll method NULL, the device is assumed to
be both readable and writable without blocking.
This patch add .poll method to xen mcelog device driver, so that
when mcelog use system calls like ppoll or select, it would be
blocked when no data available, and avoid spinning at CPU.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Avi Kivity [Sun, 22 Apr 2012 14:02:11 +0000 (17:02 +0300)]
KVM: Fix buffer overflow in kvm_set_irq()
Bugdb: 13966
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.
Fix by ensuring that an MSI entry is added to an empty list.
This fixes: CVE-2012-2137 Signed-off-by: Avi Kivity <avi@redhat.com>
Jason Wang [Wed, 30 May 2012 21:18:10 +0000 (21:18 +0000)]
net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
Bugdb: 13966
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
This fixes: CVE-2012-2136 Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Arcangeli [Tue, 29 May 2012 22:06:49 +0000 (15:06 -0700)]
mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
Bugdb: 13966
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.
This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.
With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.
This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.
Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.
NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.
This bug was discovered and fully debugged by Ulrich, quote:
----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.
496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
*pmd)
497 {
498 /* depend on compiler for an atomic pmd read */
499 pmd_t pmdval = *pmd;
Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.
- The PMD entry {high|low} is 0x0000000000000000.
The "mov" at 0xc0507a84 loads 0x00000000 into edx.
- A page fault (on another CPU) sneaks in between the two "mov"
instructions and instantiates the PMD.
- The PMD entry {high|low} is now 0x00000003fda38067.
The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
This fixes: CVE-2012-2373
Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Williamson [Wed, 18 Apr 2012 03:46:44 +0000 (21:46 -0600)]
KVM: lock slots_lock around device assignment
Bugdb: 13966
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This fixes: CVE-2012-2121
Conflicts:
Alex Williamson [Wed, 11 Apr 2012 15:51:49 +0000 (09:51 -0600)]
KVM: unmap pages from the iommu when slots are removed
Bugdb: 13966
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.
Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
This fixes: CVE-2012-2121
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Eric Paris [Tue, 17 Apr 2012 20:26:54 +0000 (16:26 -0400)]
fcaps: clear the same personality flags as suid when fcaps are used
Bugdb: 13966
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
This fixes: CVE-2012-2123
Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
x86, MCE, AMD: Make APIC LVT thresholding interrupt optional
Currently, the APIC LVT interrupt for error thresholding is implicitly
enabled. However, there are models in the F15h range which do not enable
it. Make the code machinery which sets up the APIC interrupt support
an optional setting and add an ->interrupt_capable member to the bank
representation mirroring that capability and enable the interrupt offset
programming only if it is true.
Simplify code and fixup comment style while at it.
Andre Przywara [Fri, 23 Mar 2012 09:02:17 +0000 (10:02 +0100)]
hwmon: (fam15h_power) Increase output resolution
On high CPU load the accumulating values in the running_avg_cap
register are very low (below 10), so averaging them too early leads
to unnecessary poor output resolution. Since we pretend to output
micro-Watt we better keep all the bits we have as long as possible.
Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Conflicts:
MCA details seldom change inbetween the models of a family so don't
be too conservative and enable decoding on everything starting from
K8 onwards. Minor adjustments can come in later but most importantly,
we have some decoding infrastructure in place for upcoming models by
default.
Ashish Shenoy [Thu, 23 Feb 2012 01:20:38 +0000 (17:20 -0800)]
amd64_edac: Fix missing csrows sysfs nodes
While initializing the array of csrow attribute instances, a few csrows
were uninitialized. This happened because the module only performed a
check for DRAM base ctl register0's and not DRAM base ctl register1's
chip select enable bit. There could be systems with DIMMs populated
on only single memory channel whereas the module also assumed that a
dual channel dimm had double the memory size of a single memory channel
instead of checking the memory on each channel.
Dan Carpenter [Thu, 6 Oct 2011 06:30:25 +0000 (02:30 -0400)]
amd64_edac: Cleanup return type of amd64_determine_edac_cap()
Sparse complains that edac_cap was declared as dev_type and we are
returning edac_type. Historically, edac_type was correct but since
then we have changed it to return a bit field.
When accessing the scrub rate control register (F3x58) on F15h, the DRAM
controller selector (F1x10C[DctCfgSel]) has to point to DCT0 so that the
scrub rate configuration can take effect. See Erratum 505 in the AMD
F15h revision guide for more details.
Borislav Petkov [Wed, 24 Aug 2011 15:47:11 +0000 (17:47 +0200)]
EDAC, MCE, AMD: Drop local coreid reporting
MCE decoding code is reporting the core which encountered the error
unconditionally now so drop this piece. Besides, it reported the
coreid in the local processor package which is not that valuable as a
datapoint.
EDAC, MCE, AMD: Print valid addr when reporting an error
The MCi_STATUS bank has a AddrV bit which, when set, denotes that the
corresponding MCi_ADDR MSR contains a valid address belonging to the
MCE currently being reported. Dump it since it is definitely relevant
information.
Borislav Petkov [Thu, 4 Aug 2011 17:25:24 +0000 (19:25 +0200)]
EDAC, MCE, AMD: Print CPU number when reporting the error
Currently, correctable ECCs go through mcelog and do not print the scary
MCE banner. In that case, however, reporting the core where the CECC
happened is important information so dump it along with the decoded
string albeit at risk of having a minor redundancy.
x86, MCE, AMD: Disable error thresholding bank 4 on some models
Turn off MC4_MISC thresholding banks on models which have them but that
particular processor implementation does not supply applicable error
sources to be counted.
Depending on whether the box supports the APIC LVT interrupt for
thresholding, we want to show the 'interrupt_enable' sysfs node or not.
Make that the case by adding it to the default sysfs attributes only if
it is supported.
Andre Przywara [Fri, 30 Mar 2012 20:48:20 +0000 (16:48 -0400)]
hwmon: (k10temp) Add support for AMD Trinity CPUs
The on-chip northbridge's temperature sensor of the upcoming
AMD Trinity CPUs works the same as for the previous CPUs.
Since it has a different PCI-ID, we just add the new one to the list
supported by k10temp.
This allows to use the k10temp driver on those CPUs.
Andre Przywara [Mon, 9 Apr 2012 22:16:34 +0000 (18:16 -0400)]
hwmon: fam15h_power: fix bogus values with current BIOSes
Newer BKDG[1] versions recommend a different initialization value for
the running average range register in the northbridge. This improves
the power reading by avoiding counter saturations resulting in bogus
values for anything below about 80% of TDP power consumption.
Updated BIOSes will have this new value set up from the beginning,
but meanwhile we correct this value ourselves.
This needs to be done on all northbridges, even on those where the
driver itself does not register at.
This fixes the driver on all current machines to provide proper
values for idle load.
Andreas Herrmann [Tue, 3 Apr 2012 10:13:07 +0000 (12:13 +0200)]
x86/amd: Re-enable CPU topology extensions in case BIOS has disabled it
BIOS will switch off the corresponding feature flag on family
15h models 10h-1fh non-desktop CPUs.
The topology extension CPUID leafs are required to detect which
cores belong to the same compute unit. (thread siblings mask is
set accordingly and also correct information about L1i and L2
cache sharing depends on this).
W/o this patch we wouldn't see which cores belong to the same
compute unit and also cache sharing information for L1i and L2
would be incorrect on such systems.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Martin K. Petersen [Fri, 15 Jun 2012 15:30:06 +0000 (11:30 -0400)]
Fix system hang due to bad protection module parameters (CR 130769)
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simon Graham [Thu, 24 May 2012 06:26:07 +0000 (06:26 +0000)]
xen/netback: Calculate the number of SKB slots required correctly
When calculating the number of slots required for a packet header, the code
was reserving too many slots if the header crossed a page boundary. Since
netbk_gop_skb copies the header to the start of the page, the count of
slots required for the header should be based solely on the header size.
This problem is easy to reproduce if a VIF is bridged to a USB 3G modem
device as the skb->data value always starts near the end of the first page.
Signed-off-by: Simon Graham <simon.graham@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e26b203ede31fffd52571a5ba607a26c79dc5c0d)
Martin K. Petersen [Wed, 13 Jun 2012 04:05:30 +0000 (00:05 -0400)]
sd: Avoid remapping bad reference tags
It does not make sense to translate ref tags with unexpected values.
Instead we simply ignore them and let the upper layers catch the
problem. Ref tags that contain the expected value are still remapped.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Konrad Rzeszutek Wilk [Tue, 12 Jun 2012 18:33:31 +0000 (14:33 -0400)]
Merge branch 'stable/for-linus-3.6.rebased' into uek2-merge
* stable/for-linus-3.6.rebased:
xen/mce: schedule a workqueue to avoid sleep in atomic context
xen/mce: Register native mce handler as vMCE bounce back point
x86, MCE, AMD: Adjust initcall sequence for xen
xen/mce: Add mcelog support for Xen platform
Liu, Jinsong [Tue, 12 Jun 2012 15:11:16 +0000 (23:11 +0800)]
xen/mce: schedule a workqueue to avoid sleep in atomic context
copy_to_user might sleep and print a stack trace if it is executed
in an atomic spinlock context. Like this:
(XEN) CMCI: send CMCI to DOM0 through virq
BUG: sleeping function called from invalid context at /home/konradinux/kernel.h:199
in_atomic(): 1, irqs_disabled(): 0, pid: 4581, name: mcelog
Pid: 4581, comm: mcelog Tainted: G O 3.5.0-rc1upstream-00003-g149000b-dirty #1
[<ffffffff8109ad9a>] __might_sleep+0xda/0x100
[<ffffffff81329b0b>] xen_mce_chrdev_read+0xab/0x140
[<ffffffff81148945>] vfs_read+0xc5/0x190
[<ffffffff81148b0c>] sys_read+0x4c/0x90
[<ffffffff815bd039>] system_call_fastpath+0x16
This patch schedule a workqueue for IRQ handler to poll the data,
and use mutex instead of spinlock, so copy_to_user sleep in atomic
context would not occur.
[upstream git commit b3856ae3554ef4cea39126f52216182ee83e8ec8] Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Liu, Jinsong [Thu, 7 Jun 2012 11:58:50 +0000 (19:58 +0800)]
x86, MCE, AMD: Adjust initcall sequence for xen
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device
xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;
mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.
so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);
when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.
Liu, Jinsong [Thu, 7 Jun 2012 11:56:51 +0000 (19:56 +0800)]
xen/mce: Add mcelog support for Xen platform
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.
This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.
By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.
To test follow directions outlined in Documentation/acpi/apei/einj.txt
Joe Jin [Thu, 7 Jun 2012 23:35:58 +0000 (07:35 +0800)]
e1000e: disable rxhash when try to enable jumbo frame also rxhash and rxcsum have enabled
commit ffd3d6 check if both rxhash and rxcsum enabled when enable jumbo
frames and disallowed all of them enabled at the same time.
Since jumbo frame widely be used in real world, and el5 did not supported
enable/disable rxhash, so we changed default behavior to disable rxhash
when try to enable jumbo frames also rxhash and rxcsum have enabled.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Guru Anbalagane <guru.anbalagane@oracle.com> Acked-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Mel Gorman [Tue, 10 Jan 2012 23:07:14 +0000 (15:07 -0800)]
mm: reduce the amount of work done when updating min_free_kbytes
Orabug: 14073214
When min_free_kbytes is updated, some pageblocks are marked
MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early
in boot but on large machines with 1TB of memory, this has been reported
to delay boot times, probably due to the NUMA distances involved.
The bulk of the work is due to calling calling pageblock_is_reserved() an
unnecessary amount of times and accessing far more struct page metadata
than is necessary. This patch significantly reduces the amount of work
done by setup_zone_migrate_reserve() improving boot times on 1TB machines.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 938929f14cb595f43cd1a4e63e22d36cab1e4a1f)
Junxiao Bi [Thu, 31 May 2012 01:57:17 +0000 (09:57 +0800)]
ocfs2: clear unaligned io flag when dio fails
Orabug: 14063941
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.
Junxiao Bi [Thu, 31 May 2012 01:29:01 +0000 (09:29 +0800)]
aio: make kiocb->private NUll in init_sync_kiocb()
Orabug: 14063941
Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
io flag is involved in it to serialize the unaligned aio. As
*private is not initialized in init_sync_kiocb() of do_sync_write(),
this unaligned io flag may be unexpectly set in an aligned dio.
And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
We can't initialized this flag in ocfs2_file_write_iter() since
it may be invoked several times by do_sync_write(). So we initialize
it in init_sync_kiocb(), it's also useful for other similiar use of
it in the future.
Neil Horman [Thu, 16 Feb 2012 01:48:56 +0000 (01:48 +0000)]
vmxnet3: cap copy length at size of skb to prevent dropped frames on tx
Orabug: 14159701
I was recently shown that vmxnet3 devices on transmit, will drop very small udp
frames consistently. This is due to a regression introduced by commit 39d4a96fd7d2926e46151adbd18b810aeeea8ec0. This commit attempts to introduce an
optimization to the tx path, indicating that the underlying hardware behaves
optimally when at least 54 bytes of header data are available for direct access.
This causes problems however, if the entire frame is less than 54 bytes long.
The subsequent pskb_may_pull in vmxnet3_parse_and_copy_hdr fails, causing an
error return code, which leads to vmxnet3_tq_xmit dropping the frame.
Fix it by placing a cap on the copy length. For frames longer than 54 bytes, we
do the pull as we normally would. If the frame is shorter than that, copy the
whole frame, but no more. This ensures that we still get the optimization for
qualifying frames, but don't do any damange for frames that are too short.
Also, since I'm unable to do this, it wuold be great if vmware could follow up
this patch with some additional code commentary as to why 54 bytes is an optimal
pull length for a virtual NIC driver. The comment that introduced this was
vague on that. Thanks!
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Max Matveev <mmatveev@redhat.com> CC: Max Matveev <mmatveev@redhat.com> CC: "David S. Miller" <davem@davemloft.net> CC: Shreyas Bhatewara <sbhatewara@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
Orabug:14149364
commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") didn't really
fix the mbind vma merge problem due to wrong pgoff value passing to
vma_merge(), which made vma_merge() always return NULL.
Before the patch applied, we are getting a result like:
Zhigang Wang [Mon, 7 May 2012 20:51:10 +0000 (16:51 -0400)]
xen: expose host uuid via sysfs.
When 'expose_host_uuid = 1' is specified in vm.cfg, xen will write the physical
host uuid to xenstore. This patch expose the host uuid to userspace via sysfs:
Kevin Lyons [Wed, 30 May 2012 05:56:07 +0000 (09:56 +0400)]
SPEC: upgrade preserve rhck as a boot kernel
Orabug: 14065209
OL6.0 to OL 6.1 and OL 6.1 to OL 6.1 upgrade forcefully
installs UEK which becomes a default kernel. If customer
runs RHCK , it should be left as a default boot kernel.
Konrad Rzeszutek Wilk [Tue, 29 May 2012 18:25:53 +0000 (14:25 -0400)]
Merge branch 'stable/for-linus-3.5.rebased' into uek2-merge
* stable/for-linus-3.5.rebased:
xen/blkback: Copy id field when doing BLKIF_DISCARD.
xen/balloon: Subtract from xen_released_pages the count that is populated.
Konrad Rzeszutek Wilk [Fri, 25 May 2012 20:11:09 +0000 (16:11 -0400)]
xen/blkback: Copy id field when doing BLKIF_DISCARD.
We weren't copying the id field so when we sent the response
back to the frontend (especially with a 64-bit host and 32-bit
guest), we ended up using a random value. This lead to the
frontend crashing as it would try to pass to __blk_end_request_all
a NULL 'struct request' (b/c it would use the 'id' to find the
proper 'struct request' in its shadow array) and end up crashing:
BUG: unable to handle kernel NULL pointer dereference at 000000e4
IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40
.. snip..
EIP is at __blk_end_request_all+0xc/0x40
.. snip..
[<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront]
This fixes the bug by passing in the proper id for the response.
Konrad Rzeszutek Wilk [Tue, 29 May 2012 16:36:43 +0000 (12:36 -0400)]
xen/balloon: Subtract from xen_released_pages the count that is populated.
We did not take into account that xen_released_pages would be
used outside the initial E820 parsing code. As such we would
did not subtract from xen_released_pages the count of pages
that we had populated back (instead we just did a simple
extra_pages = released - populated).
The balloon driver uses xen_released_pages to set the initial
current_pages count. If this is wrong (too low) then when a new
(higher) target is set, the balloon driver will request too many pages
from Xen."
This fixes errors such as:
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (51 of 512)
during bootup and
free_memory : 0
where the free_memory should be 128.
Acked-by: David Vrabel <david.vrabel@citrix.com>
[v1: Per David's review made the git commit better] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").
Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Konrad Rzeszutek Wilk [Wed, 23 May 2012 16:56:59 +0000 (12:56 -0400)]
xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
We need to make sure that those parameters are setup to be correct.
As such the value of 0 is deemed invalid and we find that we
bail out. The hypervisor sets by default all of them to be zero
and when the hypercall is done does a simple:
a.value = d->arch.hvm_domain.params[a.index];
Which means that if the Xen toolstack forgot to setup the proper
HVM_PARAM_CONSOLE_EVTCHN, we would get the default value of 0
and use that.
CC: stable@kernel.org
Fixes-Oracle-Bug: 14091238 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>