Eric Dumazet [Mon, 8 Feb 2010 23:00:39 +0000 (15:00 -0800)]
dst: call cond_resched() in dst_gc_task()
Kernel bugzilla #15239
On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.
Fix is to call cond_resched(), as we run in process context.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 8 Feb 2010 19:18:07 +0000 (11:18 -0800)]
netfilter: nf_conntrack: fix hash resizing with namespaces
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.
Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.
Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Dumazet [Mon, 8 Feb 2010 19:16:56 +0000 (11:16 -0800)]
netfilter: nf_conntrack: per netns nf_conntrack_cachep
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.
We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).
If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation] Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Mon, 8 Feb 2010 19:16:26 +0000 (11:16 -0800)]
netfilter: nf_conntrack: fix memory corruption with multiple namespaces
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb because the reference count it re-initialized.
The best fix would be to use a seperate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.
Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Poole [Fri, 5 Feb 2010 17:23:43 +0000 (12:23 -0500)]
Bluetooth: Keep a copy of each HID device's report descriptor
The report descriptor is read by user space (via the Service
Discovery Protocol), so it is only available during the ioctl
to connect. However, the HID probe function that needs the
descriptor might not be called until a specific module is
loaded. Keep a copy of the descriptor so it is available for
later use.
Signed-off-by: Michael Poole <mdpoole@troilus.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 3 Feb 2010 21:59:51 +0000 (21:59 +0000)]
igb: make certain to reassign legacy interrupt vectors after reset
This change corrects an issue that will cause false hangs when using either
82575 or 82580 in legacy interrupt mode. The issue is caused when there is
a slow traffic flow and an "ethtool -r" is executed while using legacy or
MSI interrupts. MSI-X is not affected by this issue due to the fact that
we were already reconfiguring the vectors after reset.
If possible it would be best to push this for net-2.6 since it is resolving
a bug but if that is not possible then net-next-2.6 will be fine.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As there was no follow up on that bug, I am submitting this
patch assuming that the other return points will not return
invalid txq's, and also that this fixes the bug (not tested).
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Blanchard [Wed, 3 Feb 2010 13:12:51 +0000 (13:12 +0000)]
ixgbe: Fix ixgbe_tx_map error path
Commit e5a43549f7a58509a91b299a51337d386697b92c (ixgbe: remove
skb_dma_map/unmap calls from driver) looks to have introduced a bug in
ixgbe_tx_map. If we get an error from a PCI DMA call, we loop backwards
through count until it becomes -1 and return that.
The caller of ixgbe_tx_map expects 0 on error, so return that instead.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Kumar Salecha [Tue, 2 Feb 2010 04:16:21 +0000 (04:16 +0000)]
netxen: protect resource cleanup by rtnl lock
o context resources can be in used, while resource cleanup is in progress,
during fw recover.
o Null pointer execption can occur in send_cmd_desc, if fw recovery
module frees tx ring without rtnl lock.
o Same applies to ethtool register dump.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Kumar Salecha [Tue, 2 Feb 2010 04:16:20 +0000 (04:16 +0000)]
netxen: fix tx timeout recovery for NX2031 chip
For NX2031, first try to scrub interrupt before requesting firmware
reset. Return statement was missing after scrubbbing interrupt.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Pelly [Fri, 13 Nov 2009 22:16:32 +0000 (14:16 -0800)]
Bluetooth: Enter active mode before establishing a SCO link.
When in sniff mode with a long interval time (1.28s) it can take 4+ seconds
to establish a SCO link. Fix by requesting active mode before requesting
SCO connection. This improves SCO setup time to ~500ms.
Bluetooth headsets that use a long interval time, and exhibit the long
SCO connection time include Motorola H790, HX1 and H17. They have a
CSR 2.1 chipset.
Verified this behavior and fix with host Bluetooth chipsets: BCM4329 and
TI1271.
It fixes the construction of the first argument of try_then_request_module(),
where only valid return codes from the first argument should be returned.
What we do now is assign the result of register_jprobe() to ret, without
the side effect of the comparison.
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 1 Feb 2010 02:12:19 +0000 (02:12 +0000)]
dccp: fix bug in cache allocation
This fixes a bug introduced in commit de4ef86cfce60d2250111f34f8a084e769f23b16
("dccp: fix dccp rmmod when kernel configured to use slub", 17 Jan): the
vsnprintf used sizeof(slab_name_fmt), which became truncated to 4 bytes, since
slab_name_fmt is now a 4-byte pointer and no longer a 32-character array.
This lead to error messages such as
FATAL: Error inserting dccp: No buffer space available
>> kernel: [ 1456.341501] kmem_cache_create: duplicate cache cci
generated due to the truncation after the 3rd character.
Fixed for the moment by introducing a symbolic constant. Tested to fix the bug.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 1 Feb 2010 13:41:47 +0000 (13:41 +0000)]
sky2: fix transmit DMA map leakage
The book keeping structure for transmit always had the flags value
cleared so transmit DMA maps were never released correctly.
Based on patch by Jarek Poplawski, problem observed by Michael Breuer.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Sat, 30 Jan 2010 10:05:05 +0000 (10:05 +0000)]
netlink: fix for too early rmmod
Netlink code does module autoload if protocol userspace is asking for is
not ready. However, module can dissapear right after it was autoloaded.
Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM.
netlink_create() in such situation _will_ create userspace socket and
_will_not_ pin module. Now if module was removed and we're going to call
->netlink_rcv into nothing:
BUG: unable to handle kernel paging request at ffffffffa02f842a
^^^^^^^^^^^^^^^^
modules are loaded near these addresses here
Alexey Dobriyan [Sat, 30 Jan 2010 02:53:27 +0000 (02:53 +0000)]
af_key: fix netns ops ordering on module load/unload
1. After sock_register() returns, it's possible to create sockets,
even if module still not initialized fully (blame generic module code
for that!)
2. Consequently, pfkey_create() can be called with pfkey_net_id still not
initialized which will BUG_ON in net_generic():
kernel BUG at include/net/netns/generic.h:43!
3. During netns shutdown, netns ops should be unregistered after
key manager unregistered because key manager calls can be triggered
from xfrm_user module:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
pfkey_broadcast+0x111/0x210 [af_key]
pfkey_send_notify+0x16a/0x300 [af_key]
km_state_notify+0x41/0x70
xfrm_flush_sa+0x75/0x90 [xfrm_user]
4. Unregister netns ops after socket ops just in case and for symmetry.
Reported by Luca Tettamanti.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Tested-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Pelly [Thu, 4 Feb 2010 00:18:36 +0000 (16:18 -0800)]
Bluetooth: Do not call rfcomm_session_put() for RFCOMM UA on closed socket
When processing a RFCOMM UA frame when the socket is closed and we were
not the RFCOMM initiator would cause rfcomm_session_put() to be called
twice during rfcomm_process_rx(). This would cause a kernel panic in
rfcomm_session_close() then.
This could be easily reproduced during disconnect with devices such as
Motorola H270 that send RFCOMM UA followed quickly by L2CAP disconnect
request. This trace for this looks like:
To fix this, the rfcomm_session_put() needs to be moved out of
rfcomm_session_timeout() into rfcomm_process_sessions(). In that
context it is perfectly fine to sleep and disconnect the socket.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Tested-by: David John <davidjon@xenontk.org>
Evgeniy Polyakov [Tue, 2 Feb 2010 23:58:48 +0000 (15:58 -0800)]
connector: Delete buggy notification code.
On Tue, Feb 02, 2010 at 02:57:14PM -0800, Greg KH (gregkh@suse.de) wrote:
> > There are at least two ways to fix it: using a big cannon and a small
> > one. The former way is to disable notification registration, since it is
> > not used by anyone at all. Second way is to check whether calling
> > process is root and its destination group is -1 (kind of priveledged
> > one) before command is dispatched to workqueue.
>
> Well if no one is using it, removing it makes the most sense, right?
>
> No objection from me, care to make up a patch either way for this?
Getting it is not used, let's drop support for notifications about
(un)registered events from connector.
Another option was to check credentials on receiving, but we can always
restore it without bugs if needed, but genetlink has a wider code base
and none complained, that userspace can not get notification when some
other clients were (un)registered.
Kudos for Sebastian Krahmer <krahmer@suse.de>, who found a bug in the
code.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Tue, 2 Feb 2010 15:48:40 +0000 (07:48 -0800)]
be2net: use eq-id to calculate cev-isr reg offset
cev-isr reg offset for each function is better calculated using (any) eq-id
alloted to that function instead of using pci-func number(which
does not work in some configurations...)
Signed-off-by: Sathya Perla <sathyap@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bastien Nocera [Wed, 20 Jan 2010 12:00:42 +0000 (12:00 +0000)]
Bluetooth: Use the control channel for raw HID reports
In commit 2da31939a42f7a676a0bc5155d6a0a39ed8451f2, support
for Bluetooth hid_output_raw_report was added, but it pushes
the data to the interrupt channel instead of the contol one.
This patch makes hid_output_raw_report use the control channel
instead. Using the interrupt channel was a mistake.
Mike Frysinger [Mon, 14 Sep 2009 17:43:49 +0000 (13:43 -0400)]
Bluetooth: Redo checks in IRQ handler for shared IRQ support
Commit ac019360fe3 changed the irq handler logic to BUG_ON rather than
returning IRQ_NONE when the incoming argument is invalid. While this
works in most cases, it doesn't work when the IRQ is shared with other
devices (or when DEBUG_SHIRQ is enabled).
So revert the previous change and replace the warning message with a
comment explaining that we want this behavior.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Ben Hutchings [Fri, 29 Jan 2010 05:37:18 +0000 (21:37 -0800)]
cdc_ether: Partially revert "usbnet: Set link down initially ..."
Commit 37e8273cd30592d3a82bcb70cbb1bdc4eaeb6b71 ("usbnet: Set link down
initially for drivers that update link state") changed the initial link
state in cdc_ether and other drivers based on the understanding that the
devices they support generate link change interrupts. However, this is
optional in the CDC Ethernet protocol, and two users have reported in
<http://bugzilla.kernel.org/show_bug.cgi?id=14791> that the link state
for their devices remains down. Therefore, revert the change in
cdc_ether.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Avi Rozen <avi.rozen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 25 Jan 2010 23:34:15 +0000 (23:34 +0000)]
bonding: bond_open error return value
The convention for API functions in kernel is to return errno value;
bond_open would return -1 if alb setup failed. The only reason that
could happen is if kmalloc() failed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 27 Jan 2010 16:38:06 +0000 (16:38 +0000)]
ixgbe: if ixgbe_copy_dcb_cfg is going to fail learn about it early
Call ixgbe_copy_dcb_cfg() earlier in the ixgbe_dcbnl_set_all() so that
we can learn if this is going to fail as early as possible. Previously,
ixgbe_down or ixgbe_close were being called before this check and the
IXGBE_RESETTING bit was being set and cleared. Worse if this failed
the corresponding ixgbe_up/ndo_open would not called.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 27 Jan 2010 16:37:44 +0000 (16:37 +0000)]
ixgbe: set the correct DCB bit for pg tx settings
Set the correct bit BIT_PG_TX when tx PG settings are set.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 27 Jan 2010 15:30:39 +0000 (15:30 +0000)]
igbvf: fix issue w/ mapped_as_page being left set after unmap
This change fixes an issue in igbvf with mapped_as_page being left set
after a page is unmapped which results in buffers which are mapped via map
single being unmapped as page.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 26 Jan 2010 18:27:09 +0000 (18:27 +0000)]
starfire: clean up properly if firmware loading fails
netdev_open() will return without cleaning up net device or hardware state
if firmware loading fails. This results in a BUG() on a second attempt to
bring the interface up, reported in
<http://bugzilla.kernel.org/show_bug.cgi?id=15091>, and probably has even
worse effects if the driver is removed afterwards.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Reported-by: Michael Moffatt <michael@moffatt.org.nz> Tested-by: Michael Moffatt <michael@moffatt.org.nz> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yi [Tue, 26 Jan 2010 07:58:57 +0000 (15:58 +0800)]
mac80211: fix NULL pointer dereference when ftrace is enabled
I got below kernel oops when I try to bring down the network interface if
ftrace is enabled. The root cause is drv_ampdu_action() is passed with a
NULL ssn pointer in the BA session tear down case. We need to check and
avoid dereferencing it in trace entry assignment.
Cc: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Shan Wei [Tue, 26 Jan 2010 02:40:38 +0000 (02:40 +0000)]
ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure
The commit 0b5ccb2(title:ipv6: reassembly: use seperate reassembly queues for
conntrack and local delivery) has broken the saddr&&daddr member of
nf_ct_frag6_queue when creating new queue. And then hash value
generated by nf_hashfn() was not equal with that generated by fq_find().
So, a new received fragment can't be inserted to right queue.
The patch fixes the bug with adding member of user to nf_ct_frag6_queue structure.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 25 Jan 2010 23:51:01 +0000 (15:51 -0800)]
virtio_net: Make delayed refill more reliable
I have seen RX stalls on a machine that experienced a suspected
OOM. After the stall, the RX buffer is empty on the guest side
and there are exactly 16 entries available on the host side. As
the number of entries is less than that required by a maximal
skb, the host cannot proceed.
The guest did not have a refill job scheduled.
My diagnosis is that an OOM had occured, with the delayed refill
job scheduled. The job was able to allocate at least one skb, but
not enough to overcome the minimum required by the host to proceed.
As the refill job would only reschedule itself if it failed completely
to allocate any skbs, this would lead to an RX stall.
The following patch removes this stall possibility by always
rescheduling the refill job until the ring is totally refilled.
Testing has shown that the RX stall no longer occurs whereas
previously it would occur within a day.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 25 Jan 2010 23:49:59 +0000 (15:49 -0800)]
sfc: Use fixed-size buffers for MCDI NVRAM requests
The low-level MCDI code always uses 32-bit MMIO operations, and
callers must pad input and output buffers to multiples of 4 bytes.
The MCDI NVRAM functions are not doing this. Also, their buffers are
declared as variable-length arrays with no explicit maximum length.
Switch to a fixed buffer size based on the chunk size used by the
MTD driver (which is a multiple of 4).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guido Barzini [Mon, 25 Jan 2010 23:49:19 +0000 (15:49 -0800)]
sfc: Add workspace for GMAC bug workaround to MCDI MAC_STATS buffer
Due to a hardware bug in the SFC9000 family, the firmware must
transfer raw GMAC statistics to host memory before aggregating them
into the cooked (speed-independent) MAC statistics. Extend the stats
buffer to support this.
The length of the buffer is explicit in the MAC_STATS command, so this
change is backward-compatible on both sides.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Mon, 25 Jan 2010 23:47:50 +0000 (15:47 -0800)]
tcp_probe: avoid modulus operation and wrap fix
By rounding up the buffer size to power of 2, several expensive
modulus operations can be avoided. This patch also solves a bug where
the gap need when ring gets full was not being accounted for.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
changed the hw attach code to fix up initialization values only for
dual band devices, however the commit message did not give a reason as
to why this would be useful or necessary.
According to tests by Jorge Boncompte, this breaks at least some
2GHz-only cards, so the code should be changed back to the
unconditional INI fixup.
Signed-off-by: Felix Fietkau <nbd@openwrt.org> Reported-by: Jorge Boncompte <jorge@dti2.net> Cc: stable@kernel.org Tested-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 22 Jan 2010 22:22:34 +0000 (14:22 -0800)]
iwlwifi: fix pointer signedness warning
There are a few station addresses that are
char *, instead of the normal u8 *; gcc
gives pointer signedness warnings for some
of those, so use u8 * consistently.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Alexey Dobriyan [Mon, 25 Jan 2010 06:47:53 +0000 (22:47 -0800)]
netns xfrm: deal with dst entries in netns
GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.
Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.
Reorder GC threshold initialization so it'd be done before registering
XFRM policies.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Sun, 24 Jan 2010 18:46:06 +0000 (18:46 +0000)]
sky2: revert config space change
Obviously, this register had some other impact that is causing
the regression. Either it is masking some other access or needs
to be reset in some path.
Either, way it is best to just revert the change for 2.6.33
"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike McCormack [Sat, 23 Jan 2010 10:09:26 +0000 (02:09 -0800)]
sky2: Enable/disable WOL per hardware device
Y2_HW_WOL_ON/Y2_HW_WOL_OFF should be set and cleared per chip,
not per port. On dual port cards, Y2_HW_WOL_ON should be
enabled if either sky2 port has WOL enabled.
Found while reviewing code for a WOL regression, though this is
probably not the cause of the regression.
Signed-off-by: Mike McCormack <mikem@ring3k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sridhar Samudrala [Sat, 23 Jan 2010 10:02:21 +0000 (02:02 -0800)]
net: Fix IPv6 GSO type checks in Intel ethernet drivers
Found this problem when testing IPv6 from a KVM guest to a remote
host via e1000e device on the host.
The following patch fixes the check for IPv6 GSO packet in Intel
ethernet drivers to use skb_is_gso_v6(). SKB_GSO_DODGY is also set
when packets are forwarded from a guest.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Sat, 23 Jan 2010 09:35:00 +0000 (01:35 -0800)]
igb/igbvf: cleanup exception handling in tx_map_adv
After removing the skb_dma_map/unmap calls the exception handling in
igb_tx_map_adv is not correct. The issue is that the count value was not
being correctly handled so as a result we were not rewinding the ring as
back as we should have been.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 22 Jan 2010 22:56:16 +0000 (22:56 +0000)]
e1000/e1000e: don't use small hardware rx buffers
When testing the "e1000: enhance frame fragment detection" (and e1000e)
patches we found some bugs with reducing the MTU size. The 1024 byte
descriptor used with the 1000 mtu test also (re) introduced the
(originally) reported bug, and causes us to need the e1000_clean_tx_irq
"enhance frame fragment detection" fix.
So what has occured here is that 2.6.32 is only vulnerable for mtu <
1500 due to the jumbo specific routines in both e1000 and e1000e.
So, 2.6.32 needs the 2kB buffer len fix for those smaller MTUs, but
is not vulnerable to the original issue reported. It has been pointed
out that this vulnerability needs to be patched in older kernels that
don't have the e1000 jumbo routine. Without the jumbo routines, we
need the "enhance frame fragment detection" fix the e1000, old
e1000e is only vulnerable for < 1500 mtu, and needs a similar
fix. We split the patches up to provide easy backport paths.
There is only a slight bit of extra code when this fix and the
original "enhance frame fragment detection" fixes are applied, so
please apply both, even though it is a bit of overkill.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 21 Jan 2010 22:51:36 +0000 (22:51 +0000)]
be2net: swap only first 2 fields of mcc_wrb
Only the first two fields of mcc wrb - embedded, payload_len
need to be cpu_to_le32() swapped while issuing a cmd to the hw.
The fields tag0, tag1 are opaque and returned back to cpu as is...
Signed-off-by: Sathya Perla <sathyap@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ron Murray [Tue, 19 Jan 2010 08:02:48 +0000 (08:02 +0000)]
Please add support for Microsoft MN-120 PCMCIA network card
Please add support for Microsoft MN-120 PCMCIA network card. It's an
old card, I know, but adding support is very easy. You just need to
get tulip_core.c to recognise its vendor/device ID.
Patch for kernel 2.6.32.4 (and many previous) attached.
.....Ron Murray
Signed-off-by: Ron Murray <rjmx@rjmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Fri, 22 Jan 2010 06:52:08 +0000 (22:52 -0800)]
be2net: fix bug in rx page posting
Pages are posted to the rxq in such a way that more than one frag
can share the page. The last frag that uses the page unmaps the
page. In the case when a page is not fully used (due to lack of space in rxq)
the last frag that uses the page is not being set as a "last_page_user";
instead, the next frag in the rxq is incorrectly being set.
The fix has also been tested on ppc64 with 64k pages...
Signed-off-by: Sathya Perla <sathyap@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Inaky Perez-Gonzalez [Wed, 20 Jan 2010 20:41:13 +0000 (12:41 -0800)]
wimax/i2400m: Add support for more i6x50 SKUs
The Intel WiMax Wireless Link 6050 can show under more than one USB
ID. Add support for all, introducing a generic flag (i2400mu->i6050)
that denotes a 6x50 based device.
Linus Torvalds [Thu, 21 Jan 2010 16:50:04 +0000 (08:50 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: x86: Add support for the ANY bit
perf: Change the is_software_event() definition
perf: Honour event state for aux stream data
perf: Fix perf_event_do_pending() fallback callsite
perf kmem: Print usage help for unknown commands
perf kmem: Increase "Hit" column length
hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change
perf timechart: Use tid not pid for COMM change
Linus Torvalds [Thu, 21 Jan 2010 16:49:52 +0000 (08:49 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
sched: Fix vmark regression on big machines
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
tty: fix race in tty_fasync
serial: serial_cs: oxsemi quirk breaks resume
serial: imx: bit &/| confusion
serial: Fix crash if the minimum rate of the device is > 9600 baud
serial-core: resume serial hardware with no_console_suspend
serial: 8250_pnp: use wildcard for serial Wacom tablets
nozomi: quick fix for the close/close bug
compat_ioctl: Supress "unknown cmd" message on serial /dev/console
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: isp1362: fix build failure on ARM systems via irq_flags cleanup
USB: isp1362: better 64bit printf warning fixes
USB: fix usbstorage for 2770:915d delivers no FAT
USB: Fix level of isp1760 Reloading ptd error message
USB: FHCI: avoid NULL pointer dereference
USB: Fix duplicate sysfs problem after device reset.
USB: add speed values for USB 3.0 and wireless controllers
USB: add missing delay during remote wakeup
USB: EHCI & UHCI: fix race between root-hub suspend and port resume
USB: EHCI: fix handling of unusual interrupt intervals
USB: Don't use GFP_KERNEL while we cannot reset a storage device
USB: fix bitmask merge error
usb: serial: fix memory leak in generic driver
USB: serial: fix USB serial fix kfifo_len locking
Linus Torvalds [Thu, 21 Jan 2010 15:32:11 +0000 (07:32 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
fs/bio.c: fix shadows sparse warning
drbd: The kernel code is now equivalent to out of tree release 8.3.7
drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
drbd: Don't go into StandAlone mode when authentification failes because of network error
drivers/block/drbd/drbd_receiver.c: correct NULL test
cfq-iosched: Respect ioprio_class when preempting
genhd: overlapping variable definition
block: removed unused as_io_context
DM: Fix device mapper topology stacking
block: bdev_stack_limits wrapper
block: Fix discard alignment calculation and printing
block: Correct handling of bottom device misaligment
drbd: check on CONFIG_LBDAF, not LBD
drivers/block/drbd: Correct NULL test
drbd: Silenced an assert that could triggered after changing write ordering method
drbd: Kconfig fix
drbd: Fix for a race between IO and a detach operation [Bugz 262]
drbd: Use drbd_crypto_is_hash() instead of an open coded check
Linus Torvalds [Thu, 21 Jan 2010 15:29:36 +0000 (07:29 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
ACPI: delete acpi_processor_power_verify_c2()
ACPI: allow C3 > 1000usec
ACPI: enable C2 and Turbo-mode on Nehalem notebooks on A/C
ACPI: power_meter: remove double kfree()
ACPI: processor: restrict early _PDC to opt-in platforms
ACPI: Fix unused variable warning in sbs.c
acpi: make ACPI device id constant
sony-laptop - fix using of uninitialized variable
ACPI: Fix section mismatch error for acpi_early_processor_set_pdc()
eeepc-laptop: disable wireless hotplug for 1201N
eeepc-laptop: add hotplug_disable parameter
eeepc-laptop: switch to using sparse keymap library
eeepc-laptop: dmi blacklist to disable pci hotplug code
eeepc-laptop: disable cpu speed control on EeePC 701
ACPI: don't cond_resched if irq is disabled
ACPI: Remove unnecessary cast.
ACPI: Advertise to BIOS in _OSC: _OST on _PPC changes
ACPI: EC: Add wait for irq storm
ACPI: SBS: Move SBS HC callback to faster Notify queue
x86, ACPI: delete acpi_boot_table_init() return value
...
Linus Torvalds [Thu, 21 Jan 2010 15:28:54 +0000 (07:28 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
ecryptfs: use after free
ecryptfs: Eliminate useless code
ecryptfs: fix interpose/interpolate typos in comments
ecryptfs: pass matching flags to interpose as defined and used there
ecryptfs: remove unnecessary d_drop calls in ecryptfs_link
ecryptfs: don't ignore return value from lock_rename
ecryptfs: initialize private persistent file before dereferencing pointer
eCryptfs: Remove mmap from directory operations
eCryptfs: Add getattr function
eCryptfs: Use notify_change for truncating lower inodes
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix possible panic on unmount
Btrfs: deal with NULL acl sent to btrfs_set_acl
Btrfs: fix regression in orphan cleanup
Btrfs: Fix race in btrfs_mark_extent_written
Btrfs, fix memory leaks in error paths
Btrfs: align offsets for btrfs_ordered_update_i_size
btrfs: fix missing last-entry in readdir(3)
Yongseok Koh [Tue, 19 Jan 2010 08:33:49 +0000 (17:33 +0900)]
vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE
In free_unmap_area_noflush(), va->flags is marked as VM_LAZY_FREE first, and
then vmap_lazy_nr is increased atomically.
But, in __purge_vmap_area_lazy(), while traversing of vmap_are_list, nr
is counted by checking VM_LAZY_FREE is set to va->flags. After counting
the variable nr, kernel reads vmap_lazy_nr atomically and checks a
BUG_ON condition whether nr is greater than vmap_lazy_nr to prevent
vmap_lazy_nr from being negative.
The problem is that, if interrupted right after marking VM_LAZY_FREE,
increment of vmap_lazy_nr can be delayed. Consequently, BUG_ON
condition can be met because nr is counted more than vmap_lazy_nr.
It is highly probable when vmalloc/vfree are called frequently. This
scenario have been verified by adding delay between marking VM_LAZY_FREE
and increasing vmap_lazy_nr in free_unmap_area_noflush().
Even the vmap_lazy_nr is for checking high watermark, it never be the
strict watermark. Although the BUG_ON condition is to prevent
vmap_lazy_nr from being negative, vmap_lazy_nr is signed variable. So,
it could go down to negative value temporarily.
Consequently, removing the BUG_ON condition is proper.
Linus Torvalds [Thu, 21 Jan 2010 15:15:10 +0000 (07:15 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 5888/1: arm: Update comments in cacheflush.h and remove unnecessary V6 and V7 comments
ARM: 5886/1: arm: Fix cpu_proc_fin() for proc-v7.S and make kexec work
ARM: 5885/1: arm: Flush TLB entries in setup_mm_for_reboot()
ARM: 5884/1: arm: Fix DCC console for v7
ARM: 5883/1: Revert "disable NX support for OABI-supporting kernels"
ARM: 5882/1: ARM: Fix uncompress code compile for different defines of flush(void)
ARM: fix badly placed mach/plat entries in Kconfig & Makefile
Peter Zijlstra [Mon, 18 Jan 2010 08:12:32 +0000 (09:12 +0100)]
perf: Honour event state for aux stream data
Anton reported that perf record kept receiving events even after calling
ioctl(PERF_EVENT_IOC_DISABLE). It turns out that FORK,COMM and MMAP
events didn't respect the disabled state and kept flowing in.
Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Anton Blanchard <anton@samba.org>
LKML-Reference: <1263459187.4244.265.camel@laptop> CC: stable@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul questioned the context in which we should call
perf_event_do_pending(). After looking at that I found that it should be
called from IRQ context these days, however the fallback call-site is
placed in softirq context. Ammend this by placing the callback in the IRQ
timer path.
Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263374859.4244.192.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yong Zhang [Mon, 11 Jan 2010 06:21:25 +0000 (14:21 +0800)]
sched: Reassign prev and switch_count when reacquire_kernel_lock() fail
Assume A->B schedule is processing, if B have acquired BKL before and it
need reschedule this time. Then on B's context, it will go to
need_resched_nonpreemptible for reschedule. But at this time, prev and
switch_count are related to A. It's wrong and will lead to incorrect
scheduler statistics.
Mike Galbraith [Mon, 4 Jan 2010 13:44:56 +0000 (14:44 +0100)]
sched: Fix vmark regression on big machines
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to. Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.
Reported-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jesse Brandeburg [Tue, 19 Jan 2010 14:15:59 +0000 (14:15 +0000)]
e1000e: enhance frame fragment detection
Originally patched by Neil Horman <nhorman@tuxdriver.com>
e1000e could with a jumbo frame enabled interface, and packet split disabled,
receive a packet that would overflow a single rx buffer. While in practice
very hard to craft a packet that could abuse this, it is possible.
this is related to CVE-2009-4538
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 19 Jan 2010 14:15:38 +0000 (14:15 +0000)]
e1000: enhance frame fragment detection
Originally From: Neil Horman <nhorman@tuxdriver.com>
Modified by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Hey all-
A security discussion was recently given:
http://events.ccc.de/congress/2009/Fahrplan//events/3596.en.html
And a patch that I submitted awhile back was brought up. Apparently some of
their testing revealed that they were able to force a buffer fragment in e1000
in which the trailing fragment was greater than 4 bytes. As a result the
fragment check I introduced failed to detect the fragement and a partial
invalid frame was passed up into the network stack. I've written this patch
to correct it. I'm in the process of testing it now, but it makes good
logical sense to me. Effectively it maintains a per-adapter state variable
which detects a non-EOP frame, and discards it and subsequent non-EOP frames
leading up to _and_ _including_ the next positive-EOP frame (as it is by
definition the last fragment). This should prevent any and all partial frames
from entering the network stack from e1000.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roel Kluin [Tue, 19 Jan 2010 14:21:45 +0000 (14:21 +0000)]
e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()
The variable count and i are unsigned so the (<|>=)0 tests do not work.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>