Dave Kleikamp [Thu, 12 Jan 2012 23:32:13 +0000 (17:32 -0600)]
ocfs2: add support for read_iter, write_iter, and direct_IO_bvec
ocfs2's .aio_read and .aio_write methods are changed to take
iov_iter and pass it to generic functions. Wrappers are made to pack
the iovecs into iters and call these new functions.
ocfs2_direct_IO() is trivial enough that a new function is made which
passes the bvec down to the generic direct path.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Zach Brown [Fri, 22 Oct 2010 19:24:24 +0000 (12:24 -0700)]
aio: add aio support for iov_iter arguments
This adds iocb cmds which specify that memory is held in iov_iter
structures. This lets kernel callers specify memory that can be
expressed in an iov_iter, which includes pages in bio_vec arrays.
Only kernel callers can provide an iov_iter so it doesn't make a lot of
sense to expose the IOCB_CMD values for this as part of the user space
ABI.
But kernel callers should also be able to perform the usual aio
operations, which suggests using the existing operation namespace and
support code.
Dave Kleikamp [Thu, 12 Jan 2012 21:13:40 +0000 (15:13 -0600)]
aio: add aio_kernel_() interface
This adds an interface that lets kernel callers submit aio iocbs without
going through the user space syscalls. This lets kernel callers avoid
the management limits and overhead of the context. It will also let us
integrate aio operations with other kernel apis that the user space
interface doesn't have access to.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Thu, 12 Jan 2012 20:55:02 +0000 (14:55 -0600)]
fs: pull iov_iter use higher up the stack
Right now only callers of generic_perform_write() pack their iovec
arguments into an iov_iter structure. All the callers higher up in the
stack work on raw iovec arguments.
This patch introduces the use of the iov_iter abstraction higher up the
stack. Private generic path functions are changed to operate on
iov_iter instead of on raw iovecs. Exported interfaces that take iovecs
immediately pack their arguments into an iov_iter and call into the
shared functions.
File operation struct functions are added with iov_iter as an argument
so that callers to the generic file system functions can specify
abstract memory rather than iovec arrays only.
Almost all of this patch only transforms arguments and shouldn't change
functionality. The buffered read path is the exception. We add a
read_actor function which uses the iov_iter helper functions instead of
operating on each individual iovec element. This may improve
performance as the iov_iter helper can copy multiple iovec elements from
one mapped page cache page.
As always, the direct IO path is special. Sadly, it may still be
cleanest to have it work on the underlying memory structures directly
instead of working through the iov_iter abstraction.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Thu, 12 Jan 2012 22:45:44 +0000 (16:45 -0600)]
dio: add __blockdev_direct_IO_bdev()
Previous patches refactored __blockdev_direct_IO() to call helper
functions while iterating over the user's iovec. This adds a variant of
__blockdev_direct_IO() which is the same except that it iterates over
the pages in a bio_vec instead of user addresses in an iovec.
The trick here is to initialize the dio state so that do_direct_IO()
consumes the pages we provide and never tries to map user pages. This
is done by making sure that final_block_in_request covers the page that
we set in the dio. do_direct_IO() will return before running out of
pages.
The caller is responsible for dirtying these pages, if needed. We add
an option to the dio struct that makes sure we only dirty pages when
we're operating on iovecs of user addresses.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Wed, 11 Jan 2012 22:45:15 +0000 (16:45 -0600)]
dio: add dio_post_submission() helper function
This creates a function that contains all the code that is executed
after IO is submitted. It takes code from the end of
do_blockdev_direct_IO(). This will be called by another entry point
that will be added in an upcoming patch.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Wed, 11 Jan 2012 21:25:26 +0000 (15:25 -0600)]
dio: add dio_lock_and_flush() helper
This creates a helper function which performs locking based on DIO_LOCKING
and flushes dirty pages. This will be called by another entry point
like __blockdev_direct_IO() in an upcoming patch.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Dave Kleikamp [Wed, 11 Jan 2012 21:11:29 +0000 (15:11 -0600)]
dio: add dio_alloc_init() helper function
This adds a helper function which allocates and initializes the dio
structure. We'll be calling this from another entry point like
__blockdev_direct_IO() in an upcoming patch.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Zach Brown [Fri, 3 Sep 2010 21:17:00 +0000 (14:17 -0700)]
dio: create a dio_aligned() helper function
__blockdev_direct_IO() had two instances of the same code to determine
if a given offset wasn't aligned first to the inode's blkbits and then
to the underlying device's blkbits. This was confusing enough but
we're about to add code that performs the same check on offsets in bvec
arrays. Rather than add yet more copies of this code let's have
everyone call a helper.
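
The two-step check described above can be sketched in user space as follows. This is only an illustration under stated assumptions: the names is_aligned() and check_alignment() and their signatures are hypothetical and are not taken from the patch, which works on kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when both values are multiples of
 * (1 << blkbits). */
bool is_aligned(uint64_t offset, uint64_t len, unsigned blkbits)
{
    uint64_t mask = ((uint64_t)1 << blkbits) - 1;

    return ((offset | len) & mask) == 0;
}

/*
 * Try the inode's block size first; if that fails, fall back to the
 * block size of the underlying device. On success, *blkbits holds the
 * granularity that was accepted.
 */
bool check_alignment(uint64_t offset, uint64_t len,
                     unsigned inode_blkbits, unsigned bdev_blkbits,
                     unsigned *blkbits)
{
    assert(inode_blkbits < 64 && bdev_blkbits < 64);

    *blkbits = inode_blkbits;
    if (is_aligned(offset, len, *blkbits))
        return true;

    *blkbits = bdev_blkbits;
    return is_aligned(offset, len, *blkbits);
}
```

With one helper, a later caller that walks a bvec array could run the same check on each element's offset and length instead of open-coding it again.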
Zach Brown [Fri, 3 Sep 2010 21:12:09 +0000 (14:12 -0700)]
iov_iter: let callers extract iovecs and bio_vecs
direct IO treats memory from user iovecs and memory from arrays of
kernel pages very differently. User memory is pinned and worked with in
batches while kernel pages are always pinned and don't require
additional processing.
Rather than try and provide an abstraction that includes these different
behaviours we let direct IO extract the memory structs and hand them to the
existing code.
Zach Brown [Fri, 3 Sep 2010 18:37:58 +0000 (11:37 -0700)]
iov_iter: add a shorten call
The generic direct write path wants to shorten its memory vector. It
does this when it finds that it has to perform a partial write due to
LIMIT_FSIZE. .direct_IO() always performs IO on all of the referenced
memory because it doesn't have an argument to specify the length of the
IO.
We add an iov_iter operation for this so that the generic path can ask
to shorten the memory vector without having to know what kind it is.
We're happy to shorten the kernel copy of the iovec array, but we refuse
to shorten the bio_vec array and return an error in this case.
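
A rough user-space sketch of the shorten behaviour, assuming a hypothetical mem_kind tag in place of the real ops dispatch. The message only says that an error is returned for bio_vec arrays; the choice of -EINVAL here is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical tags for the two kinds of memory vector. */
enum mem_kind { MEM_IOVEC, MEM_BVEC };

/* Trim an iovec array so that it describes at most `count` bytes.
 * Returns the number of segments still in use. */
size_t shorten_iovec(struct iovec *iov, size_t nr_segs, size_t count)
{
    size_t seg = 0;

    while (seg < nr_segs && count > 0) {
        if (iov[seg].iov_len > count)
            iov[seg].iov_len = count;
        count -= iov[seg].iov_len;
        seg++;
    }
    return seg;
}

/* Generic entry point: the caller does not need to know how the
 * memory is described. Returns 0 on success or a negative errno
 * (assumed to be -EINVAL) when the vector cannot be shortened. */
int shorten(enum mem_kind kind, struct iovec *iov, size_t *nr_segs,
            size_t count)
{
    assert(iov != NULL && nr_segs != NULL);

    if (kind != MEM_IOVEC)
        return -EINVAL;
    *nr_segs = shorten_iovec(iov, *nr_segs, count);
    return 0;
}
```

Shortening three 8-byte segments to 12 bytes leaves two segments of 8 and 4 bytes; the same request against the bvec kind fails without touching the array.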
Zach Brown [Fri, 3 Sep 2010 18:37:58 +0000 (11:37 -0700)]
iov_iter: add bvec support
This adds a set of iov_iter_ops calls which work with memory which is
specified by an array of bio_vec structs instead of an array of iovec structs.
The big difference is that the pages referenced by the bio_vec elements are
pinned. They don't need to be faulted in and we can always use kmap_atomic()
to map them one at a time.
Zach Brown [Fri, 3 Sep 2010 20:52:28 +0000 (13:52 -0700)]
iov_iter: hide iovec details behind ops function pointers
This moves the current iov_iter functions behind an ops struct of function
pointers. The current iov_iter functions all work with memory which is
specified by iovec arrays of user space pointers.
This patch is part of a series that lets us specify memory with bio_vec arrays
of page pointers. By moving to an iov_iter operation struct we can add that
support in later patches in this series by adding another set of function
pointers.
I only came to this after having initially tried to teach the current iov_iter
functions about bio_vecs by introducing conditional branches that dealt with
bio_vecs in all the functions. It wasn't pretty. This approach seems to be
the lesser evil.
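
The shape of the ops approach can be sketched in user space as below. Every name here (mem_iter, mem_iter_ops, page_seg, iovec_ops, pages_ops) is hypothetical and only stands in for the kernel structures; the point is that generic code calls through the ops pointer and never branches on the kind of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

struct mem_iter;

/* One set of function pointers per kind of memory. */
struct mem_iter_ops {
    /* Copy up to len bytes from src into the described memory. */
    size_t (*copy_to)(struct mem_iter *it, const char *src, size_t len);
};

struct mem_iter {
    const struct mem_iter_ops *ops;
    void *segs;        /* backend-specific segment array */
    size_t nr_segs;
};

/* Hypothetical stand-in for a bio_vec: a region inside a page. */
struct page_seg { char *page; size_t offset; size_t len; };

static size_t iovec_copy_to(struct mem_iter *it, const char *src, size_t len)
{
    struct iovec *iov = it->segs;
    size_t done = 0;

    for (size_t i = 0; i < it->nr_segs && done < len; i++) {
        size_t left = len - done;
        size_t n = iov[i].iov_len < left ? iov[i].iov_len : left;

        memcpy(iov[i].iov_base, src + done, n);
        done += n;
    }
    return done;
}

static size_t pages_copy_to(struct mem_iter *it, const char *src, size_t len)
{
    struct page_seg *pv = it->segs;
    size_t done = 0;

    for (size_t i = 0; i < it->nr_segs && done < len; i++) {
        size_t left = len - done;
        size_t n = pv[i].len < left ? pv[i].len : left;

        memcpy(pv[i].page + pv[i].offset, src + done, n);
        done += n;
    }
    return done;
}

const struct mem_iter_ops iovec_ops = { .copy_to = iovec_copy_to };
const struct mem_iter_ops pages_ops = { .copy_to = pages_copy_to };

/* Generic callers only ever see this function. */
size_t mem_iter_copy_to(struct mem_iter *it, const char *src, size_t len)
{
    assert(it != NULL && it->ops != NULL);
    return it->ops->copy_to(it, src, len);
}
```

Adding another kind of memory then means adding another ops table rather than another conditional branch in every function.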
Zach Brown [Fri, 3 Sep 2010 18:42:24 +0000 (11:42 -0700)]
iov_iter: move into its own file
This moves the iov_iter functions into their own file. We're going to be
working on them in upcoming patches. They become sufficiently large, and
remain self-contained, to justify separating them from the rest of the huge
mm/filemap.c.
Manish Rangankar [Fri, 2 Dec 2011 08:25:03 +0000 (13:55 +0530)]
qla4xxx: Fixed BFS with sendtargets as boot index.
If ql4xdisablesysfsboot = 0 and a sendtargets entry is the boot index, the
driver exports the sendtargets entries in sysfs but iscsistart does not
do discovery. In this case, let the driver do the discovery and
log in to the targets.
Nilesh Javali [Wed, 7 Dec 2011 08:22:31 +0000 (13:52 +0530)]
qla4xxx: Correct the default relogin timeout value
The ACB default timeout value is used to set the default
relogin timeout value. For ISP4022 adapters where
the ACB default value is set to 2560s, limit the relogin
timeout to 12s.
JIRA Key: IUEKR2ISCSI-8
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Nilesh Javali [Thu, 1 Dec 2011 09:06:04 +0000 (14:36 +0530)]
qla4xxx: Limit the ACB Default Timeout value to 12s
The ACB default timeout value is set to 2560s in the
ISP4022 firmware. This caused the driver to loop
for 2560s. Hence limit the default timeout at the driver
level to min 12s.
Also break out from the loop if the sendtargets list was empty.
JIRA Key: IUEKR2ISCSI-7
Signed-off-by: Nilesh Javali <nilesh.javali@qlogic.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
commit e67d668e147c3b4fec638c9e0ace04319f5ceccd upstream.
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.
This is needed for SLES11 SP2 and the latest upstream kernel as it appears
the NX Execute Disable has grown in its control.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Chuck Anderson [Sat, 7 Jan 2012 00:12:49 +0000 (16:12 -0800)]
Partial revert of mainline removal of deprecated sysfs interface for 13568528
Jan. 06, 2012
Oracle bug 13568528
Patch written by Andrew Thomas
Ported by Chuck Anderson
This patch partially reverts the removal in mainline of a deprecated sysfs
interface needed by the OVM3.0.4 UEK2 based dom0 kernel when it is used
to install OVM3.0.4
Comments from Andrew:
The problem is that in newer kernels, even with the
CONFIG_SYSFS_DEPRECATED[_V2] flags set, some nodes have been removed so to
tools looking in sysfs, pieces are missing. This breaks anaconda (actually
kudzu) for us. For OVM3 we use the dom0 kernel as the install kernel, so we
need UEK2 to provide the right "shape" sysfs. This isn't an issue for OL
because you use the old RHEL kernel to install UEK1/2. That said, this
issue affects more than us. As Joe Jin points out, bug 13100678, required
kudzu fixes for eth devices. Arguably the OVM3 anaconda issue can also be
fixed in kudzu, but what no one knows is if the missing sysfs nodes are
symptoms of a wider set of tools related problems and therefore whether the
correct fix is to revert sysfs changes in UEK2 so that the sysfs it presents
is isomorphic to what 2.6.18 based kernels provide.
A "better" set of tools would be from 6uX, but in order to get those
installed/upgraded on OVM3 is not a trivial task because the system customers
have already installed is 5u5 based. We have other tools in dom0 [eg our
agent] which "know" about the old flavour of sysfs and these would need
porting. You either change the kernel OR you change all the tools that rely
on sysfs... the problem is that customers can install their own tools on OL5.
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Nelson Elhage [Tue, 10 Jan 2012 23:04:08 +0000 (15:04 -0800)]
Let KERNEL_VERSION be 3.0.x, and override UTSNAME
This will let out-of-tree modules correctly detect the kernel version
when building against it, but it will still identify as 2.6.39
everywhere in userspace.
Signed-off-by: Nelson Elhage <nelson.elhage@oracle.com> Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
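
As a sketch of what an out-of-tree module sees at build time: the KERNEL_VERSION() encoding below matches the classic macro from <linux/version.h>, while the LINUX_VERSION_CODE value (3.0.16) and the function names are purely hypothetical, chosen for illustration.

```c
#include <assert.h>

/* Same packing as the classic <linux/version.h> macro. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical value that a tree numbered 3.0.x would hand to modules. */
#define LINUX_VERSION_CODE KERNEL_VERSION(3, 0, 16)

/* A module typically selects a code path at compile time like this. */
const char *pick_code_path(void)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 0)
    return "3.0+ API";
#else
    return "2.6 API";
#endif
}

/* The same packing at run time; minor and patch must fit in 8 bits. */
int version_code(int major, int minor, int patch)
{
    assert(minor >= 0 && minor < 256 && patch >= 0 && patch < 256);
    return KERNEL_VERSION(major, minor, patch);
}
```

Because the major number occupies the highest bits, any 3.0.x code compares greater than 2.6.39, so version checks written against 3.0 take the newer path even though userspace still sees 2.6.39.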
Vikas Chaudhary [Fri, 2 Dec 2011 06:42:12 +0000 (22:42 -0800)]
qla4xxx: Fix qla4xxx_dump_buffer to dump buffer correctly
KERN_INFO in printk adds a newline character that messes up
the dump print format. Remove KERN_INFO to fix the dump format.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Vikas Chaudhary [Fri, 2 Dec 2011 06:42:10 +0000 (22:42 -0800)]
qla4xxx: Wait for disable_acb before doing set_acb
In function qla4xxx_iface_set_param wait for disable_acb to
complete so that set_acb will not fail.
Jira Key: IUEKR2ISCSI-5
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Sarang Radke [Tue, 6 Dec 2011 10:34:10 +0000 (02:34 -0800)]
qla4xxx: fix call trace on rmmod with ql4xdontresethba=1
abort all active commands from eh_host_reset in-case
of ql4xdontresethba=1
Fix following call trace:-
Nov 21 14:50:47 172.17.140.111 qla4xxx 0000:13:00.4: qla4_8xxx_disable_msix: qla4xxx (rsp_q)
Nov 21 14:50:47 172.17.140.111 qla4xxx 0000:13:00.4: PCI INT A disabled
Nov 21 14:50:47 172.17.140.111 slab error in kmem_cache_destroy(): cache `qla4xxx_srbs': Can't free all objects
Nov 21 14:50:47 172.17.140.111 Pid: 9154, comm: rmmod Tainted: G O 3.2.0-rc2+ #2
Nov 21 14:50:47 172.17.140.111 Call Trace:
Nov 21 14:50:47 172.17.140.111 [<c051231a>] ? kmem_cache_destroy+0x9a/0xb0
Nov 21 14:50:47 172.17.140.111 [<c0489c4a>] ? sys_delete_module+0x14a/0x210
Nov 21 14:50:47 172.17.140.111 [<c04fd552>] ? do_munmap+0x202/0x280
Nov 21 14:50:47 172.17.140.111 [<c04a6d4e>] ? audit_syscall_entry+0x1ae/0x1d0
Nov 21 14:50:47 172.17.140.111 [<c083019f>] ? sysenter_do_call+0x12/0x28
Nov 21 14:51:50 172.17.140.111 SLAB: cache with size 64 has lost its name
Nov 21 14:51:50 172.17.140.111 iscsi: registered transport (qla4xxx)
Nov 21 14:51:50 172.17.140.111 qla4xxx 0000:13:00.4: PCI INT A -> GSI 28 (level, low) -> IRQ 28
Jira Key: IUEKR2ISCSI-3
Signed-off-by: Sarang Radke <sarang.radke@qlogic.com> Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Mike Hernandez [Fri, 2 Dec 2011 06:42:07 +0000 (22:42 -0800)]
qla4xxx: Fix CPU lockups when ql4xdontresethba set
Fix issue where CPU lockup is seen when ql4xdontresethba is set and
driver is "stuck" in NEED_RESET state handler.
Jira Key: IUEKR2ISCSI-2
Signed-off-by: Mike Hernandez <michael.hernandez@qlogic.com> Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Vikas Chaudhary [Fri, 2 Dec 2011 06:42:06 +0000 (22:42 -0800)]
qla4xxx: Perform context resets in case of context failures.
For 4032, context reset was the same as chip reset, and any firmware
issue was recovered by performing a chip reset.
For 82xx, the iSCSI firmware runs along with FCoE and the NIC
firmware contexts, and an error encountered does not necessarily mean
that a chip reset is necessary.
Perform Chip resets only in the following cases:
1. Mailbox system error.
2. Mailbox command timeout.
3. fw_heartbeat_counter stops incrementing.
For all other cases, only perform a context reset.
1. Command Completion with an invalid srb.
2. Other mailbox failures.
Jira Key: IUEKR2ISCSI-1
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Signed-off-by: Shyam Sunder <shyam.sunder@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Tej Parkash <tej.parkash@qlogic.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
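
The reset policy listed above can be summarised as a small decision function. This is a sketch only: the enum names and choose_reset() are hypothetical and do not come from the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event and reset names, mirroring the lists above. */
enum fw_event {
    MBOX_SYSTEM_ERROR,
    MBOX_CMD_TIMEOUT,
    HEARTBEAT_STALLED,
    INVALID_SRB_COMPLETION,
    MBOX_OTHER_FAILURE,
    FW_EVENT_COUNT
};

enum reset_kind { RESET_CONTEXT, RESET_CHIP };

enum reset_kind choose_reset(bool is_82xx, enum fw_event ev)
{
    assert(ev < FW_EVENT_COUNT);

    /* On 4032 a context reset is the same as a chip reset. */
    if (!is_82xx)
        return RESET_CHIP;

    switch (ev) {
    case MBOX_SYSTEM_ERROR:
    case MBOX_CMD_TIMEOUT:
    case HEARTBEAT_STALLED:
        return RESET_CHIP;
    default:
        /* Other contexts (FCoE, NIC) share the chip, so only the
         * iSCSI context is reset. */
        return RESET_CONTEXT;
    }
}
```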
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
This reverts commit ddacf5ef684a655abe2bb50c4b2a5b72ae0d5e05.
When booting the kernel under Amazon EC2 as an HVM guest, it ends up
hanging during startup. By reverting this we lose the fix for kexec
booting to the crash kernels.
Don't do aggregation-related work for 'AP mode client power save
handling' if aggregation is not enabled in the driver; otherwise it
will lead to a panic because those data structures will never be
initialized in 'ath_tx_node_init' if aggregation is disabled.
do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);
return 1;
}
It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().
The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.
This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.
I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
It causes failures on Toshiba laptops - instead of disabling the alarm,
it actually seems to enable it on the affected laptops, resulting in
(for example) the laptop powering on automatically five minutes after
shutdown.
There's a patch for it that appears to work for at least some people,
but it's too late to play around with this, so revert for now and try
again in the next merge window.
See for example
http://bugs.debian.org/652869
Reported-and-bisected-by: Andreas Friedrich <afrie@gmx.net> (Toshiba Tecra) Reported-by: Antonio-M. Corbi Bellot <antonio.corbi@ua.es> (Toshiba Portege R500) Reported-by: Marco Santos <marco.santos@waynext.com> (Toshiba Portege Z830) Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> (Toshiba Portege R830) Cc: Jonathan Nieder <jrnieder@gmail.com> Requested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Since Linux 2.6.36 the writeback code has introduced various measures for
live lock prevention during sync(). Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.
The older_than_this checks that are now more strictly enforced since
by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once. But on slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened. I have not reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.
Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty. This might log inodes that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the writeback code writes back an inode because it has expired we currently
use the non-blocking ->write_inode path. This means any inode that is pinned
is skipped. With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it. The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds. This means under certain scenarios time based
metadata writeback never happens.
Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the twl4030-madc device wasn't registered, and another device, such
as twl4030-madc-hwmon, calls twl4030_madc_conversion() a NULL pointer is
dereferenced.
Signed-off-by: Kyle Manna <kyle@kylemanna.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since we configure all the queues as CHAINABLE, we need to update the
byte count for all the queues, not only the AGGREGATABLE ones.
Not doing so can confuse the SCD and make the fw assert.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.
As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.
Reintroduce the garbage collection, since we'll have to wait until our
neighbour lookups become refcount-less to not depend on this stuff.
Reported-by: Robert Gladewitz <gladewitz@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After resetting ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush the route cache, or it will continue to receive packets with
a local source address, which should be dropped.
Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.
The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irrational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.
When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.
Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.
The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.
Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value. If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.
This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.
1) Avoid overflowing autoclose * HZ.
2) Keep the default autoclose bound consistent across 32- and 64-bit
platforms (INT_MAX / HZ in this patch).
3) Keep the autoclose value consistent between setsockopt() and
getsockopt() calls.
Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
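
A user-space sketch of the clamping idea, assuming a hypothetical HZ of 1000 and hypothetical function names; the real patch operates on socket state and a sysctl rather than free functions.

```c
#include <assert.h>
#include <limits.h>

#define HZ 1000 /* hypothetical tick rate, for illustration only */

/* Upper bound in seconds, chosen so that autoclose * HZ stays within
 * INT_MAX regardless of the platform's word size. */
unsigned long max_autoclose = INT_MAX / HZ;

/* Mimics the setsockopt() side: the clamped value is what gets
 * stored, so a later getsockopt() reports the value in effect. */
unsigned int set_autoclose(unsigned int requested)
{
    if (requested > max_autoclose)
        requested = (unsigned int)max_autoclose;
    return requested;
}

/* Timeout in ticks; it cannot overflow because the input was clamped. */
unsigned long autoclose_jiffies(unsigned int autoclose)
{
    assert(autoclose <= max_autoclose);
    return (unsigned long)autoclose * HZ;
}
```

Passing -1 from userspace arrives as 0xffffffff; after clamping it becomes INT_MAX / HZ seconds, and the multiplication by HZ stays below INT_MAX.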
gred_change_vq() is called under sch_tree_lock(sch).
This means a spinlock is held, and we are not allowed to sleep in this
context.
We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.
Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Holger Brunck <holger.brunck@keymile.com> Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Userspace may not provide TCA_OPTIONS; in fact tc currently does
not do so if no arguments are specified on the command line.
Return EINVAL instead of panicking.
Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Received non stream protocol packets were calling llc_cmsg_rcv that used a
skb after that skb was released by sk_eat_skb. This caused received STP
packets to generate kernel panics.
Signed-off-by: Alexandru Juncu <ajuncu@ixiacom.com> Signed-off-by: Kunjan Naik <knaik@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
x86 jump instruction size is 2 or 5 bytes (near/long jump), not 2 or 6
bytes.
In case a conditional jump is followed by a long jump, conditional jump
target is one byte past the start of target instruction.
Signed-off-by: Markus Kötter <nepenthesdev@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Although we provide a proper way for a debugger to control whether
syscall restart occurs, we run into problems because orig_i0 is not
saved and restored properly.
Luckily we can solve this problem without having to make debuggers
aware of the issue. Across system calls, several registers are
considered volatile and can be safely clobbered.
Therefore we use the pt_regs save area of one of those registers, %g6,
as a place to save and restore orig_i0.
Debuggers transparently will do the right thing because they save and
restore this register already.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The "(insn & 0x01800000) != 0x01800000" test matches 'restore'
but that is a legitimate place to see the %lo() part of a 32-bit
symbol relocation, particularly in tail calls.
Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This silently was working for many years and stopped working on
Niagara-T3 machines.
We need to set the MSIQ to VALID before we can set its state to IDLE.
On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
errors. The hypervisor documentation says, rather ambiguously, that
the MSIQ must be "initialized" before one can set the state.
I previously understood this to mean merely that a successful setconf()
operation has been performed on the MSIQ, which we have done at this
point. But it seems to also mean that it has been set VALID too.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We already do this for cayman, need to also do it for
BTC parts. The default memory and voltage setup is not
adequate for advanced operation. Continuing will
result in an unusable display.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
This change fixes a linking problem, which happens if oprofile
is selected to be compiled as built-in:
`oprofile_arch_exit' referenced in section `.init.text' of
arch/arm/oprofile/built-in.o: defined in discarded section
`.exit.text' of arch/arm/oprofile/built-in.o
The problem appeared after commit 87121ca504, which
introduced oprofile_arch_exit() calls from __init function. Note
that the aforementioned commit has been backported to stable
branches, and the problem is known to be reproduced at least
with 3.0.13 and 3.1.5 kernels.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@nokia.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: oprofile-list <oprofile-list@lists.sourceforge.net> Link: http://lkml.kernel.org/r/20111222151540.GB16765@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, the *_global_[un]lock_online() routines are not at all synchronized
with CPU hotplug. Soft-lockups detected as a consequence of this race were
reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
for finding out that the root-cause of this issue is the race condition
between br_write_[un]lock() and CPU hotplug, which results in the lock states
getting messed up).
Fixing this race by just adding {get,put}_online_cpus() at appropriate places
in *_global_[un]lock_online() is not a good option, because, then suddenly
br_write_[un]lock() would become blocking, whereas they have been kept as
non-blocking all this time, and we would want to keep them that way.
So, overall, we want to ensure 3 things:
1. br_write_lock() and br_write_unlock() must remain as non-blocking.
2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
for different sets of CPUs.
3. Either prevent any new CPU online operation in between this lock-unlock, or
ensure that the newly onlined CPU does not proceed with its corresponding
per-cpu spinlock unlocked.
To achieve all this:
(a) We introduce a new spinlock that is taken by the *_global_lock_online()
routine and released by the *_global_unlock_online() routine.
(b) We register a callback for CPU hotplug notifications, and this callback
takes the same spinlock as above.
(c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
initialized in the lock_init() code, all future updates to it are done in
the callback, under the above spinlock.
(d) The above bitmap is used (instead of cpu_online_mask) while locking and
unlocking the per-cpu locks.
The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
br_write_lock-unlock sequence is in progress, the callback keeps spinning,
thus preventing the CPU online operation till the lock-unlock sequence is
complete. This takes care of requirement (3).
The bitmap that we maintain remains unmodified throughout the lock-unlock
sequence, since all updates to it are managed by the callback, which takes
the same spinlock as the one taken by the lock code and released only by the
unlock routine. Combining this with (d) above satisfies requirement (2).
Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
operations from racing with br_write_lock-unlock, requirement (1) is also
taken care of.
By the way, it is to be noted that a CPU offline operation can actually run
in parallel with our lock-unlock sequence, because our callback doesn't react
to notifications earlier than CPU_DEAD (in order to maintain our bitmap
properly). And this means, since we use our own bitmap (which is stale, on
purpose) during the lock-unlock sequence, we could end up unlocking the
per-cpu lock of an offline CPU (because we had locked it earlier, when the
CPU was online), in order to satisfy requirement (2). But this is harmless,
though it looks a bit awkward.
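As an illustration only, steps (a) to (d) can be modelled in a few lines of single-threaded userspace C. This is a sketch, not the kernel patch: the struct, the function names, the MODEL_* event names and NR_MODEL_CPUS are all hypothetical, the spinlock is reduced to a flag, and "the callback keeps spinning" is modelled by the callback returning false without touching the bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 8	/* hypothetical, kept small for illustration */

enum model_event { MODEL_CPU_UP_PREPARE, MODEL_CPU_DEAD };

struct brlock_model {
	bool hotplug_lock;			/* stands in for the new spinlock, (a) */
	unsigned int cpus_mask;			/* private copy of the online mask, (c) */
	bool percpu_locked[NR_MODEL_CPUS];	/* state of each per-cpu lock */
};

static void model_init(struct brlock_model *m, unsigned int online_mask)
{
	m->hotplug_lock = false;
	m->cpus_mask = online_mask;
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		m->percpu_locked[cpu] = false;
}

/* (a) + (d): take the hotplug lock, then lock every CPU in the private mask. */
static void model_global_lock_online(struct brlock_model *m)
{
	assert(!m->hotplug_lock);	/* a real spinlock would spin here */
	m->hotplug_lock = true;
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (m->cpus_mask & (1u << cpu))
			m->percpu_locked[cpu] = true;
}

/* (d): unlock exactly the set that was locked, then drop the hotplug lock. */
static void model_global_unlock_online(struct brlock_model *m)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (m->cpus_mask & (1u << cpu)) {
			assert(m->percpu_locked[cpu]);
			m->percpu_locked[cpu] = false;
		}
	m->hotplug_lock = false;
}

/*
 * (b) + (c): the hotplug callback updates the private mask under the same
 * lock. Returns false when a lock-unlock sequence is in progress, which is
 * where the real callback would spin.
 */
static bool model_hotplug_callback(struct brlock_model *m,
				   enum model_event ev, int cpu)
{
	if (m->hotplug_lock)
		return false;
	if (ev == MODEL_CPU_UP_PREPARE)
		m->cpus_mask |= 1u << cpu;
	else
		m->cpus_mask &= ~(1u << cpu);
	return true;
}
```

In this model, a CPU_UP_PREPARE that arrives between lock and unlock leaves the mask untouched, so the unlock walks the same set of CPUs that the lock did. A CPU that goes offline in that window is still unlocked, because its bit stays in the stale mask until the CPU_DEAD callback can run.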
Debugged-by: Cong Meng <mc@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The earlier fix for HT40 after association on a specified AP breaks
association with some APs, leaving them unable to establish a connection.
We need to address HT40 both before and after association.
Reported-by: Andrej Gelenberg <andrej.gelenberg@udo.edu> Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Tested-by: Andrej Gelenberg <andrej.gelenberg@udo.edu> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Check the IEEE80211_TX_CTL_ASSIGN_SEQ flag from mac80211, then decide how to
set the TX_CMD_FLG_SEQ_CTL_MSK bit. Setting the wrong bit in a BAR frame will
make the firmware increment the sequence number, which is incorrect and
causes unknown behavior.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The station always chooses 1Mbps for all transmitted frames,
whenever the AP is configured to lock the supported rates.
The max phy rate is always set to the 4th-from-highest phy rate, and
this assumption is wrong when fewer rates than that are valid. Fix that.
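The kind of guard this calls for can be sketched as below. The helper name and the fallback to the highest valid rate are assumptions made for illustration; the message does not say which fallback the patch chooses.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the number of valid rates in a table sorted
 * from lowest to highest, return the slot to use as the max phy rate.
 * Prefers the 4th-from-highest slot, and falls back to the highest valid
 * slot when the table is too short for that (an assumed fallback).
 */
static size_t max_phy_rate_slot(size_t nvalid)
{
	if (nvalid == 0)
		return 0;	/* no valid rates; caller must handle this */
	if (nvalid > 4)
		return nvalid - 4;
	return nvalid - 1;
}
```

With eight valid rates this picks slot 4. With a single locked rate it picks slot 0 and never computes an index below the start of the table.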
Cc: Paul Stewart <pstew@google.com> Reported-by: Ajay Gummalla <agummalla@google.com> Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>