www.infradead.org Git - users/hch/misc.git/log
Eric Van Hensbergen [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: soften invalidation in loose_mode

Loose mode in 9p utilizes the page cache without respecting coherency with
the server.  Previously, any write invalidated the entire mapping for a file.
This patch softens the behavior to invalidate only the region covered by the
actual write.
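A minimal sketch of the range arithmetic behind "invalidate only the region of the write", assuming 4 KiB pages: a write of `len` bytes at byte offset `pos` touches page indices `pos >> PAGE_SHIFT` through `(pos + len - 1) >> PAGE_SHIFT`. The names `SKETCH_PAGE_SHIFT`, `struct page_range` and `write_page_range()` are hypothetical; in the kernel, indices like these would be handed to a page-range invalidation routine such as `invalidate_inode_pages2_range()`, which this userspace fragment does not call.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page geometry for the sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Inclusive range of page-cache indices touched by one write. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Hypothetical helper: compute which pages a write of `len` bytes at byte
 * offset `pos` covers.  Returns 0 and fills *r, or -1 for an empty write,
 * in which case nothing needs to be invalidated.
 */
static int write_page_range(uint64_t pos, size_t len, struct page_range *r)
{
	if (len == 0)
		return -1;
	r->first = pos >> SKETCH_PAGE_SHIFT;
	r->last = (pos + len - 1) >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

A 2-byte write at offset 4095 straddles a page boundary, so it yields pages 0 through 1, while every other page of the file stays cached.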

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: attach-per-user

The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.

Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in a multiuser environment unsafe, as it
depends on the client doing the permission checking.

This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):

- attach-per-user (access=user) (default mode for 9P2000.u)
 If a user tries to access a file served by v9fs for the first time, v9fs
 sends an attach command to the server (Tattach) specifying the user. If
 the attach succeeds, the user can access the v9fs tree.
 As there is no uname->uid (string->integer) mapping yet, this mode works
 only with the 9P2000.u dialect.

- allow only one user to access the tree (access=<uid>)
 Only the user with uid can access the v9fs tree. Other users that attempt
 to access it will get EPERM error.

- do all operations as a single user (access=any) (default for 9P2000)
 V9fs does a single attach and all operations are done as a single user.
 If this mode is selected, the v9fs behavior is identical to the current
 one.
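The three values of the new 'access' mount option can be modelled as a small parser plus a permission gate. This is a sketch under stated assumptions, not the v9fs code: `enum access_mode`, `struct access_opt`, `parse_access()` and `access_check()` are hypothetical names, and the per-user Tattach of access=user is only noted in a comment.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical names for the three access modes. */
enum access_mode {
	ACCESS_USER,   /* access=user: attach separately for each user */
	ACCESS_SINGLE, /* access=<uid>: only one uid may use the tree */
	ACCESS_ANY,    /* access=any: one attach, all I/O as one user */
};

struct access_opt {
	enum access_mode mode;
	uid_t uid; /* meaningful only for ACCESS_SINGLE */
};

/* Parse the value of the 'access' mount option; returns 0 or -EINVAL. */
static int parse_access(const char *val, struct access_opt *opt)
{
	char *end;
	unsigned long uid;

	if (strcmp(val, "user") == 0) {
		opt->mode = ACCESS_USER;
		return 0;
	}
	if (strcmp(val, "any") == 0) {
		opt->mode = ACCESS_ANY;
		return 0;
	}
	if (*val < '0' || *val > '9') /* rejects "", signs, words */
		return -EINVAL;
	errno = 0;
	uid = strtoul(val, &end, 10);
	if (errno != 0 || *end != '\0' || uid > (unsigned long)(uid_t)-1)
		return -EINVAL;
	opt->mode = ACCESS_SINGLE;
	opt->uid = (uid_t)uid;
	return 0;
}

/*
 * Gate applied before any I/O.  In ACCESS_USER mode the caller would also
 * need its own successful Tattach, which is not modelled here.
 */
static int access_check(const struct access_opt *opt, uid_t caller)
{
	if (opt->mode == ACCESS_SINGLE && caller != opt->uid)
		return -EPERM;
	return 0;
}
```

With access=1000, uid 1000 passes the gate and every other uid gets EPERM, matching the second mode described above.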

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: rename uid and gid parameters

Change the names of 'uid' and 'gid' parameters to the more appropriate
'dfltuid' and 'dfltgid'.  This also sets the default uid/gid to -2
(aka nfsnobody).

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Latchesar Ionkov [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: define session flags

Create a more general flags field in the v9fs_session_info struct and move
the 'extended' flag into it as a bit.
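Folding a boolean into a flags word typically looks like the following. The bit name `SESS_EXTENDED`, `struct session_info` and the accessor helpers are hypothetical stand-ins; the real field and bit names in v9fs_session_info may differ.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the real names may differ. */
enum session_flag {
	SESS_EXTENDED = 1u << 0, /* 9P2000.u extensions in use */
	/* further bits can be added here without new struct fields */
};

struct session_info {
	unsigned int flags;
	/* ... other session state ... */
};

static inline bool sess_extended(const struct session_info *s)
{
	return (s->flags & SESS_EXTENDED) != 0;
}

static inline void sess_set_extended(struct session_info *s, bool on)
{
	if (on)
		s->flags |= SESS_EXTENDED;
	else
		s->flags &= ~(unsigned int)SESS_EXTENDED;
}
```

Callers test the bit through the accessor, so later flags (for example an access mode) can share the same word.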

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Eric Van Hensbergen [Wed, 17 Oct 2007 19:31:07 +0000 (14:31 -0500)]
9p: Make transports dynamic

This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules.  This should also allow kernel
configuration of transports without ifdef-hell.
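One common way to make transports pluggable is a registry that each module adds itself to at load time, with the mount code looking transports up by name instead of compiling them in behind #ifdefs. The sketch below assumes that design; `struct transport`, `register_transport()`, `unregister_transport()` and `find_transport()` are hypothetical, and a kernel version would additionally need locking and module reference counting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport descriptor; real ones carry more operations. */
struct transport {
	const char *name;
	int (*create)(const char *addr);
	struct transport *next;
};

static struct transport *transports; /* singly linked registry */

/* Called by each transport module when it is loaded. */
static void register_transport(struct transport *t)
{
	t->next = transports;
	transports = t;
}

/* Called when the module is unloaded; unlinks t if present. */
static void unregister_transport(struct transport *t)
{
	struct transport **pp;

	for (pp = &transports; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			return;
		}
	}
}

/* Mount-time lookup by the transport name given as a mount option. */
static struct transport *find_transport(const char *name)
{
	struct transport *t;

	for (t = transports; t; t = t->next)
		if (strcmp(t->name, name) == 0)
			return t;
	return NULL;
}
```

A mount that names a transport nobody has registered simply gets NULL back, which the caller can turn into an error.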

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Thomas Bogendoerfer [Wed, 17 Oct 2007 17:15:17 +0000 (19:15 +0200)]
[MIPS] IP22: Fix hang due to messing with timer interrupt handler

As IP22 is now using do_IRQ for the timer interrupt, don't mess with the
interrupt handler any longer.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Atsushi Nemoto [Wed, 17 Oct 2007 15:57:07 +0000 (00:57 +0900)]
[MIPS] Sibyte: Fix typos in sibyte clockevent drivers

Fix some typos introduced during the clockevent conversion.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Oct 2007 14:38:30 +0000 (15:38 +0100)]
[MIPS] Alchemy: Renumber interrupts so irq_cpu can work.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Oct 2007 14:37:44 +0000 (15:37 +0100)]
[MIPS] Alchemy: replace last remaining instance of au_ffs with ffs.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Oct 2007 14:36:53 +0000 (15:36 +0100)]
[MIPS] Alchemy: Reformat PM code.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Oct 2007 09:58:43 +0000 (10:58 +0100)]
[MIPS] Alchemy: Fix build by conversion to irq_cpu.c.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Oct 2007 10:32:21 +0000 (11:32 +0100)]
[MIPS] MTX1: Enable CONFIG_CROSSCOMPILE in defconfig.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 16 Oct 2007 22:20:48 +0000 (23:20 +0100)]
[MIPS] Probe for usability of cp0 compare interrupt.

Some processors offer the option of using the interrupt on which the
count / compare interrupt would normally be signaled as an ordinary
interrupt pin.  Previously this required some ugly per-system hackery,
which is much more easily replaced by a quick and simple probe.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Maciej W. Rozycki [Tue, 16 Oct 2007 17:43:26 +0000 (18:43 +0100)]
[MIPS] SYNC emulation for MIPS I processors

Userland, including the C library and the dynamic linker, is keen to use
the SYNC instruction, even in "generic" MIPS I binaries these days, which
makes it less than useful on MIPS I processors, as they do not implement
SYNC.

This change adds the emulation, but as our do_ri() infrastructure was not
really prepared to take yet another instruction, I have rewritten it and
its callees slightly as follows.

Now there is only a single place from which a signal may be thrown: the
end of do_ri().  The instruction word is fetched in do_ri() and passed down
to the handlers.  The handlers are called in sequence and return a result
that lets the caller decide upon further processing.  If the result is
positive, the handler has claimed the instruction, but a signal should be
thrown and the result is the signal number.  If the result is zero, the
handler has successfully simulated the instruction.  If the result is
negative, the handler did not handle the instruction; to make it more
obvious that the calls do not follow the usual 0/-Exxx result convention,
they now return -1 instead of -EFAULT.

The calculation of the return EPC is now done at the beginning.  It is
easier to handle it there, because emulation callees may modify a register
and the instruction may be located in the delay slot of a branch whose
outcome depends on that register.  The calculation has to be undone if a
signal is to be raised, but that is not a problem as this is the slow-path
case, and both actions are now done in single places rather than the
former being scattered through the emulation handlers.
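The handler convention and the "compute EPC first, undo on signal" rule can be sketched in userspace C. This is a reduced model, not the kernel's do_ri(): `struct regs`, `simulate_sync()` and `do_ri_sketch()` are hypothetical, branch-delay-slot decoding is collapsed into "advance EPC by 4", and SYNC is emulated as a no-op, which assumes a uniprocessor MIPS I system. The match relies on SYNC being the SPECIAL major opcode (0) with function field 0x0f.

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the saved register state (hypothetical). */
struct regs {
	uint32_t epc;     /* address of the faulting instruction */
	uint32_t gpr[32];
};

#define OPCODE_MASK 0xfc000000u /* major opcode field */
#define FUNC_MASK   0x0000003fu /* SPECIAL function field */
#define FUNC_SYNC   0x0000000fu /* SYNC: SPECIAL opcode, funct 0x0f */

/* >0: raise this signal; 0: emulated; <0 (-1): not my instruction. */
typedef int (*ri_handler)(struct regs *regs, uint32_t insn);

/* On a uniprocessor MIPS I system, SYNC can be treated as a no-op. */
static int simulate_sync(struct regs *regs, uint32_t insn)
{
	(void)regs;
	if ((insn & OPCODE_MASK) == 0 && (insn & FUNC_MASK) == FUNC_SYNC)
		return 0;
	return -1;
}

static const ri_handler handlers[] = { simulate_sync };

/*
 * Returns the signal to deliver, or 0.  The return EPC is computed once
 * up front and restored only if a signal is going to be raised.
 */
static int do_ri_sketch(struct regs *regs, uint32_t insn)
{
	uint32_t old_epc = regs->epc;
	int status = -1;
	size_t i;

	regs->epc += 4; /* simplified: real code must decode delay slots */

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]) && status < 0; i++)
		status = handlers[i](regs, insn);

	if (status < 0)
		status = SIGILL; /* no handler claimed the instruction */
	if (status > 0)
		regs->epc = old_epc; /* undo before raising the signal */
	return status;
}
```

Adding another emulated instruction then only means appending one handler to the table; signal delivery and EPC rollback stay in a single place.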

The part of do_cpu() being covered follows the changes to do_ri().

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Tue, 16 Oct 2007 19:05:18 +0000 (20:05 +0100)]
[MIPS] Fix modpost warning in raw binary builds.

  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x478): Section mismatch: reference to .init.text:start_kernel (between '_stext' and 'run_init_process')

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Wed, 17 Oct 2007 16:11:18 +0000 (09:11 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  sched: fix new task startup crash
  sched: fix !SYSFS build breakage
  sched: fix improper load balance across sched domain
  sched: more robust sd-sysctl entry freeing

Linus Torvalds [Wed, 17 Oct 2007 16:08:13 +0000 (09:08 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [SCSI] Remove full sg table memset()
  [SCSI] ide-scsi: remove usage of sg_last()
  Fix loop terminating conditions in fill_sg().
  [BLOCK] Clear sg entry before filling in blk_rq_map_sg()
  IA64: iommu uses sg_next with an invalid sg element
  cciss: disable DMA refetch on Smart Array P600
  swiotlb: fix map_sg failure handling
  SPARC64: fix iommu sg chaining
  [SCSI] ide-scsi: use scsi_sg_count() instead of ->use_sg

Linus Torvalds [Wed, 17 Oct 2007 16:05:55 +0000 (09:05 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits)
  [POWERPC] Fix vmemmap warning in init_64.c
  [POWERPC] Fix 64 bits vDSO DWARF info for CR register
  [POWERPC] Add 1TB workaround for PA6T
  [POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs
  [POWERPC] Quieten cache information at boot
  [POWERPC] Quieten clockevent printk
  [POWERPC] Enable SLUB in *_defconfig
  [POWERPC] Fix 1TB segment detection
  [POWERPC] Fix iSeries_hpte_insert prototype
  [POWERPC] Fix copyright symbol
  [POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
  [POWERPC] ibmebus: Add device creation and bus probing based on of_device
  [POWERPC] ibmebus: Remove bus match/probe/remove functions
  [POWERPC] Move of_device allocation into of_device.[ch]
  [POWERPC] mpc52xx: device tree changes for FEC and MDIO
  [POWERPC] bestcomm: GenBD task support
  [POWERPC] bestcomm: FEC task support
  [POWERPC] bestcomm: ATA task support
  [POWERPC] bestcomm: core bestcomm support for Freescale MPC5200
  [POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
  ...

Linus Torvalds [Wed, 17 Oct 2007 16:04:11 +0000 (09:04 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (59 commits)
  [XFS] eagerly remove vmap mappings to avoid upsetting Xen
  [XFS] simplify validata_fields
  [XFS] no longer using io_vnode, as was remaining from 23 cherrypick
  [XFS] Remove STATIC which was missing from prior manual merge
  [XFS] Put back the QUEUE_ORDERED_NONE test in the barrier check.
  [XFS] Turn off XBF_ASYNC flag before re-reading superblock.
  [XFS] avoid race in sync_inodes() that can fail to write out all dirty data
  [XFS] This fix prevents bulkstat from spinning in an infinite loop.
  [XFS] simplify xfs_create/mknod/symlink prototype
  [XFS] avoid xfs_getattr in XFS_IOC_FSGETXATTR ioctl
  [XFS] get_bulkall() could return incorrect inode state
  [XFS] Kill unused IOMAP_EOF flag
  [XFS] fix when DMAPI mount option processing happens
  [XFS] ensure file size is logged on synchronous writes
  [XFS] growlock should be a mutex
  [XFS] replace some large xfs_log_priv.h macros by proper functions
  [XFS] kill struct bhv_vfs
  [XFS] move syncing related members from struct bhv_vfs to struct xfs_mount
  [XFS] kill the vfs_flags member in struct bhv_vfs
  [XFS] kill the vfs_fsid and vfs_altfsid members in struct bhv_vfs
  ...

Linus Torvalds [Wed, 17 Oct 2007 16:00:30 +0000 (09:00 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup:
  Remove magic macros for screen_info structure members
  [x86] remove uses of magic macros for boot_params access

Adrian Bunk [Wed, 17 Oct 2007 06:31:38 +0000 (23:31 -0700)]
security/ cleanups

This patch contains the following cleanups that are now possible:
- remove the unused security_operations->inode_xattr_getsuffix
- remove the no longer used security_operations->unregister_security
- remove some no longer required exit code
- remove a bunch of no longer used exports

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Serge E. Hallyn [Wed, 17 Oct 2007 06:31:36 +0000 (23:31 -0700)]
Implement file posix capabilities

Implement file posix capabilities.  This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php.  For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
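The core of file capabilities is how exec() combines the task's sets with the file's sets. The sketch below follows the rules described in the capabilities(7) man page, ignoring later additions such as ambient capabilities: new permitted = (task inheritable & file inheritable) | (file permitted & bounding set), and new effective is either the new permitted set or empty, depending on the file's effective flag. It also models the changelog's "no bits beyond the known max capability" check. All type and function names, and the `last_cap` parameter, are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability sets as bitmasks; one bit per capability number. */
typedef uint64_t capset_t;

struct task_caps {
	capset_t permitted, effective, inheritable, bounding;
};

struct file_caps {
	capset_t permitted, inheritable;
	bool effective; /* single "effective" flag stored with the file */
};

/* Reject on-disk sets that have bits above the highest known capability. */
static int file_caps_sane(const struct file_caps *f, unsigned int last_cap)
{
	capset_t valid = (last_cap >= 63) ? ~(capset_t)0
					  : (((capset_t)1 << (last_cap + 1)) - 1);

	if ((f->permitted | f->inheritable) & ~valid)
		return -EPERM;
	return 0;
}

/* Task capabilities after exec()ing a binary that carries file caps. */
static struct task_caps exec_with_file_caps(const struct task_caps *p,
					    const struct file_caps *f)
{
	struct task_caps n = *p; /* inheritable and bounding are unchanged */

	n.permitted = (p->inheritable & f->inheritable) |
		      (f->permitted & p->bounding);
	n.effective = f->effective ? n.permitted : 0;
	return n;
}
```

This is why a binary can receive a specific capability regardless of who runs it: the file's permitted set contributes directly, limited only by the bounding set, with no setuid-root step involved.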

Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.

Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.

Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.

Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.

Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.

Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().

Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.

Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.

Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.

Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.

One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?

It is a semantic change, as without fscaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check.  But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.

     task_setscheduler
 note: this also controls cpuset:attach_task.  Are we ok with
     CAP_SYS_NICE being used to confine to a cpuset?
     task_setioprio
     task_setnice
 sys_setpriority uses this (through set_one_prio) for another
 process.  Need same checks as setrlimit

Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.

Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.

Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.

[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Andrew Morgan <morgan@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 17 Oct 2007 06:31:35 +0000 (23:31 -0700)]
ifdef struct task_struct::security

For those who don't care about CONFIG_SECURITY.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Morris [Wed, 17 Oct 2007 06:31:32 +0000 (23:31 -0700)]
security: Convert LSM into a static interface

Convert LSM into a static interface, as the ability to unload a security
module is not required by in-tree users and potentially complicates the
overall security architecture.

Needlessly exported LSM symbols have been unexported, to help reduce API
abuse.

Parameters for the capability and root_plug modules are now specified
at boot.

The SECURITY_FRAMEWORK_VERSION macro has also been removed.

In a nutshell, there is no safe way to unload an LSM.  The modular interface
is thus unnecessary and broken infrastructure.  It is used only by out-of-tree
modules, which are often binary-only, illegal, abusive of the API and
dangerous, e.g. silently re-vectoring SELinux.

[akpm@linux-foundation.org: cleanups]
[akpm@linux-foundation.org: USB Kconfig fix]
[randy.dunlap@oracle.com: fix LSM kernel-doc]
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:31 +0000 (23:31 -0700)]
Add section IDs to Documentation/DocBook/filesystems.tmpl

Add recommended section IDs to Documentation/DocBook/filesystems.tmpl

Signed-off-by: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:31 +0000 (23:31 -0700)]
Fix "make htmldocs" build break.

Fix two htmldocs build breaks, introduced by moving include/linux/usb_gadget.h to
include/linux/usb/gadget.h and combining resume.c and suspend.c into main.c in
drivers/base/power.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:30 +0000 (23:31 -0700)]
Add Documentation/RCU/00-Index

Add Documentation/RCU/00-INDEX

Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:30 +0000 (23:31 -0700)]
Add recommended section IDs to deviceiobook.tmpl

Add recommended section ID tags to deviceiobook.tmpl

Otherwise the link #anchors in the HTML vary from build to build.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:31:29 +0000 (23:31 -0700)]
remap_file_pages: kernel-doc corrections

Fix kernel-doc for sys_remap_file_pages() and add info to the 'prot' NOTE.
Rename __prot parameter to prot.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Wed, 17 Oct 2007 06:31:29 +0000 (23:31 -0700)]
Documentation/vm/slabinfo.c: clean up this code

This patch does the following cleanups for Documentation/vm/slabinfo.c:

- Fix two memory leaks;
- Constify some char pointers;
- Use snprintf instead of sprintf to prevent buffer overflows;
- Fix some indentation;
- Other little improvements.
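The snprintf change works because snprintf never writes more than the buffer size and returns the length the full string would have had, so truncation can be detected. A minimal sketch; `build_path()` is a hypothetical helper and the path strings are illustrative, not taken from slabinfo.c.

```c
#include <stdio.h>

/*
 * Build "<dir>/<name>" into buf without overflowing it.  Returns 0 on
 * success, or -1 if the result did not fit (buf then holds a truncated
 * but still NUL-terminated string).
 */
static int build_path(char *buf, size_t size, const char *dir, const char *name)
{
	int n = snprintf(buf, size, "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

With a 16-byte buffer, "/sys/slab/vmap" (14 characters) fits, while "/sys/slab/kmalloc" (17 characters) is cut off safely instead of overrunning the stack as sprintf would.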

Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Machek [Wed, 17 Oct 2007 06:31:28 +0000 (23:31 -0700)]
vm.txt: document min_free_pages as critical for correctness

min_free_pages is critical for correctness, document it as such.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Machek [Wed, 17 Oct 2007 06:31:28 +0000 (23:31 -0700)]
kdump: documentation cleanups

This cleans up kdump documentation a bit. Plus I do not think we want
to mention the Linux trademark in _every_ file in documentation....

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Wed, 17 Oct 2007 06:31:27 +0000 (23:31 -0700)]
Update DMA-mapping documentation

A couple of updates haven't considered whether the documentation makes
sense as a whole any more.  Three changes here:

 - Remove the reference to the "DAC Addressing for Address Space Hungry
   Devices" section which was deleted by Jan Beulich.
 - Remove the comment about DMA_24BIT_MASK which became obsolete when
   Tobias Klauser changed the code to actually use DMA_24BIT_MASK.
 - Remove the section "64-bit DMA and DAC cycle support" since it's
   fully covered above, and contains a reference to the section deleted
   by Jan.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:26 +0000 (23:31 -0700)]
Add Documentation/power/00-INDEX

Add Documentation/power/00-INDEX

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:26 +0000 (23:31 -0700)]
Add entries to Documentation/powerpc

Add two missing entries to Documentation/powerpc/00-INDEX

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:25 +0000 (23:31 -0700)]
Add Documentation/{w1,w1/masters}/00-INDEX

Add two 00-INDEX files under Documentation/w1.

Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:25 +0000 (23:31 -0700)]
Add missing entries to top level Documentation/00-INDEX

Add missing entries to Documentation/00-INDEX

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Landley [Wed, 17 Oct 2007 06:31:24 +0000 (23:31 -0700)]
Tweak Documentation/SM501.txt

The existing Documentation/SM501.txt gives no clue what the chip is or does,
so copy the description from Kconfig help text.

Acked-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernhard Walle [Wed, 17 Oct 2007 06:31:23 +0000 (23:31 -0700)]
Add reset_devices to the recommended parameters

This patch adds the "reset_devices" option (that's used only by one device
driver for now) to the recommended list of command line parameters for kdump.

Meaning (Documentation/kernel-parameters.txt):
    reset_devices   [KNL] Force drivers to reset the underlying device
                    during initialization.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernhard Walle [Wed, 17 Oct 2007 06:31:22 +0000 (23:31 -0700)]
Express new ELF32 mechanisms in documentation

This patch carries the following kexec-tools-testing change over to the
kernel documentation:
http://git.kernel.org/?p=linux/kernel/git/horms/kexec-tools-testing.git;a=commit;h=b9c3648e690ad0dad12389659673206213a09760

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:31:22 +0000 (23:31 -0700)]
Update help text for CONFIG_CRASH_DUMP

Fix typos in CONFIG_RELOCATABLE.  Use tab + 2 spaces for indentation on all
lines.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Bernhard Walle <bwalle@suse.de>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernhard Walle [Wed, 17 Oct 2007 06:31:21 +0000 (23:31 -0700)]
Express relocatability of kernel on x86_64 in documentation

This patch adapts the Documentation/kdump/kdump.txt file to express the fact
that the x86_64 kernel is now also relocatable.  This makes i386 and x86_64
now behave the same, simplifying the documentation.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 17 Oct 2007 06:31:20 +0000 (23:31 -0700)]
Documentation: delete unreferenced xterm-linux.xpm file

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:31:20 +0000 (23:31 -0700)]
kernel-doc: fix doc blocks and html

Johannes Berg reports (Thanks!) that &struct names are not highlighted in
html output format when they are inside a DOC: block.

DOC: blocks were not escaped through xml_escape() like other kernel-doc
comments were.  Fixed that.

However, that left a problem with <p> ($blankline_html) being processed
through xml_escape(), converting it to &lt;p&gt;, which isn't good for the
generated html output (the <p> should remain unchanged), so this patch also
introduces the notion of "local" kernel-doc meta-characters
('\\\\mnemonic:'), which are converted to html just before writing the
stream to its output file.
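kernel-doc itself is a Perl script; the C sketch below only illustrates the two-pass idea: escape everything first, using a placeholder token that contains no XML-special characters, then expand the placeholder into real markup just before output. The token `PARA_TOKEN` (the four characters `\\p:`) and both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical local meta-token standing in for a literal "<p>". */
#define PARA_TOKEN "\\\\p:" /* the four characters \ \ p : */

/* Pass 1: escape XML-special characters.  Caller frees the result. */
static char *xml_escape(const char *s)
{
	char *out = malloc(strlen(s) * 5 + 1), *o = out;

	if (!out)
		return NULL;
	for (; *s; s++) {
		switch (*s) {
		case '&': memcpy(o, "&amp;", 5); o += 5; break;
		case '<': memcpy(o, "&lt;", 4); o += 4; break;
		case '>': memcpy(o, "&gt;", 4); o += 4; break;
		default:  *o++ = *s;
		}
	}
	*o = '\0';
	return out;
}

/* Pass 2: turn local meta-tokens into real markup right before output. */
static char *expand_local(const char *s)
{
	size_t tlen = strlen(PARA_TOKEN);
	char *out = malloc(strlen(s) + 1), *o = out; /* token shrinks 4 -> 3 */

	if (!out)
		return NULL;
	while (*s) {
		if (strncmp(s, PARA_TOKEN, tlen) == 0) {
			memcpy(o, "<p>", 3);
			o += 3;
			s += tlen;
		} else {
			*o++ = *s++;
		}
	}
	*o = '\0';
	return out;
}
```

Because the placeholder passes through the escaping step untouched, the paragraph marker survives as real `<p>` while user text such as `&struct` is still escaped.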

Please report any problems that you (anyone) see in "highlighting" in any
output mode (text, man, html, xml).

Also update copyright to include me.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Wed, 17 Oct 2007 06:31:19 +0000 (23:31 -0700)]
Add a 00-INDEX file to Documentation/telephony/

Add a 00-INDEX file to Documentation/telephony/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Rob Landley <rob@landley.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Wed, 17 Oct 2007 06:31:19 +0000 (23:31 -0700)]
Add a 00-INDEX file to Documentation/sysctl/

Add a 00-INDEX file to Documentation/sysctl/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Rob Landley <rob@landley.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Wed, 17 Oct 2007 06:31:18 +0000 (23:31 -0700)]
Add a 00-INDEX file to Documentation/mips/

Add a 00-INDEX file to Documentation/mips/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Rob Landley <rob@landley.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 17 Oct 2007 06:31:17 +0000 (23:31 -0700)]
doc: move vm/00-INDEX to Documentation/vm

Looks like the 00-INDEX file lost its parent directory in -rc6-mm1.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Wed, 17 Oct 2007 06:31:17 +0000 (23:31 -0700)]
Add a missing 00-INDEX file for Documentation/vm/

This patch adds a 00-INDEX file to Documentation/vm/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Rob Landley <rob@landley.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoDocumentation: add entries to filesystems/00-INDEX for several untracked files
Denis Cheng [Wed, 17 Oct 2007 06:31:16 +0000 (23:31 -0700)]
Documentation: add entries to filesystems/00-INDEX for several untracked files

Signed-off-by: Denis Cheng <crquan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoDocumentation/make/headers_install.txt
Rob Landley [Wed, 17 Oct 2007 06:31:16 +0000 (23:31 -0700)]
Documentation/make/headers_install.txt

Some documentation for "make headers_install".

Signed-off-by: Rob Landley <rob@landley.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoClean up duplicate includes in Documentation/
Jesper Juhl [Wed, 17 Oct 2007 06:31:15 +0000 (23:31 -0700)]
Clean up duplicate includes in Documentation/

This patch cleans up duplicate includes in
Documentation/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agor/o bind mounts: create cleanup helper svc_msnfs()
Dave Hansen [Wed, 17 Oct 2007 06:31:15 +0000 (23:31 -0700)]
r/o bind mounts: create cleanup helper svc_msnfs()

I'm going to be modifying nfsd_rename() shortly to support read-only bind
mounts.  This #ifdef is around the area I'm patching, and it starts to get
really ugly if I just try to add my new code by itself.  Using this little
helper makes things a lot cleaner.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agor/o bind mounts: give permission() a local 'mnt' variable
Dave Hansen [Wed, 17 Oct 2007 06:31:14 +0000 (23:31 -0700)]
r/o bind mounts: give permission() a local 'mnt' variable

First of all, this makes the structure jumping look a little bit cleaner.  So,
this stands alone as a tiny cleanup.  But, we also need 'mnt' by itself a few
more times later in this series, so this isn't _just_ a cleanup.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agor/o bind mounts: rearrange may_open() to be r/o friendly
Dave Hansen [Wed, 17 Oct 2007 06:31:14 +0000 (23:31 -0700)]
r/o bind mounts: rearrange may_open() to be r/o friendly

may_open() calls vfs_permission() before it does checks for IS_RDONLY(inode).
It checks _again_ inside of vfs_permission().

The check inside of vfs_permission() is going away eventually.  With the
mnt_want/drop_write() functions, all of the r/o checks (except for this one)
are consistently done before calling permission().  Because of this, I'd like
to use permission() to hold a debugging check to make sure that the
mnt_want/drop_write() calls are actually being made.

So, to do this:
1. remove the IS_RDONLY() check from permission()
2. enforce that you must mnt_want_write() before
   even calling permission()
3. actually add the debugging check to permission()

We need to rearrange may_open() to do r/o checks before calling permission().
Here's the patch.
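The ordering above can be modeled in user space. This is a minimal sketch only. `struct mount_model`, `model_want_write()`, `model_drop_write()`, `model_permission()` and `model_may_open()` are hypothetical stand-ins, not the kernel's `mnt_want_write()`/`permission()`/`may_open()`. The sketch shows the read-only check happening first, through taking a write reference, after which the permission check can assert that the reference is held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a mount; not the kernel's struct vfsmount. */
struct mount_model {
	bool readonly;
	int writers;		/* outstanding write references */
};

/* Take a write reference; refused on a read-only mount. */
static int model_want_write(struct mount_model *mnt)
{
	if (mnt->readonly)
		return -EROFS;
	mnt->writers++;
	return 0;
}

static void model_drop_write(struct mount_model *mnt)
{
	assert(mnt->writers > 0);
	mnt->writers--;
}

/*
 * Step 1: no IS_RDONLY()-style test in here.
 * Step 3: debugging check that the caller took a write reference first.
 */
static int model_permission(struct mount_model *mnt, bool want_write)
{
	if (want_write)
		assert(mnt->writers > 0);
	return 0;	/* mode/owner checks omitted in this sketch */
}

/* Step 2: r/o check first, then the permission check. */
static int model_may_open(struct mount_model *mnt, bool want_write)
{
	int err;

	if (want_write) {
		err = model_want_write(mnt);
		if (err)
			return err;
	}

	err = model_permission(mnt, want_write);
	if (err && want_write)
		model_drop_write(mnt);	/* undo on failure */
	return err;	/* on success the reference is kept until "close" */
}
```

A write open on a read-only mount now fails with EROFS before `model_permission()` is ever reached, which is what makes the assertion inside it safe to add.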

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agor/o bind mounts: filesystem helpers for custom 'struct file's
Dave Hansen [Wed, 17 Oct 2007 06:31:13 +0000 (23:31 -0700)]
r/o bind mounts: filesystem helpers for custom 'struct file's

Why do we need r/o bind mounts?

This feature allows a read-only view into a read-write filesystem.  In the
process of doing that, it also provides infrastructure for keeping track of
the number of writers to any given mount.

This has a number of uses.  It allows chroots to have parts of filesystems
writable.  It will be useful for containers in the future because users may
have root inside a container, but should not be allowed to write to
some filesystems.  This also replaces patches that vserver has had out of the
tree for several years.

It enhances security by making sure that parts of your filesystem are
read-only (such as when you don't trust your FTP server), when you don't want
to have entire new filesystems mounted, or when you want atime selectively
updated.  I've been using the following script to test that the feature is
working as desired.  It takes a directory and makes a regular bind and a r/o
bind mount of it.  It then performs some normal filesystem operations on the
three directories, including ones that are expected to fail, like creating a
file on the r/o mount.
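To make the "read-only view into a read-write filesystem" idea concrete, here is a small user-space model. All names (`struct fs_model`, `struct mount_view`, `view_write()`, `view_read()`) are hypothetical, not kernel API. Two mounts share one filesystem and the read-only flag lives on the mount, so data written through the regular bind is visible through the r/o bind, while writes through the r/o bind are refused with EROFS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical read-write filesystem holding one small "file". */
struct fs_model {
	char data[64];
};

/* Hypothetical mount: the r/o flag is per mount, not per filesystem. */
struct mount_view {
	struct fs_model *fs;	/* shared underlying filesystem */
	bool readonly;
};

/* Write through a mount; a read-only view refuses the write. */
static int view_write(struct mount_view *m, const char *text)
{
	if (m->readonly)
		return -EROFS;
	strncpy(m->fs->data, text, sizeof(m->fs->data) - 1);
	m->fs->data[sizeof(m->fs->data) - 1] = '\0';
	return 0;
}

/* Reads go to the same underlying data whatever the mount flags are. */
static const char *view_read(const struct mount_view *m)
{
	return m->fs->data;
}
```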

This patch:

Some filesystems forgo the VFS and may_open() and create their own 'struct
file's.

This patch creates a couple of helper functions which can be used by these
filesystems, and will provide a unified place which the r/o bind mount code
may patch.

Also, rename an existing, static-scope init_file() to a less generic name.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: add debug message for adding new device
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:12 +0000 (23:31 -0700)]
PNP: add debug message for adding new device

Add PNP debug message when adding a device, remove similar PNPACPI message
with less information.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: simplify PNPBIOS insert_device
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:11 +0000 (23:31 -0700)]
PNP: simplify PNPBIOS insert_device

Hoist the struct pnp_dev alloc up into the function where it's used.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: use dev_info() in system driver
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:11 +0000 (23:31 -0700)]
PNP: use dev_info() in system driver

Use dev_info() for a little consistency.  Changes this:

    pnp: 00:01: ioport range 0xf50-0xf58 has been reserved
    pnp: 00:01: ioport range 0x408-0x40f has been reserved
    pnp: 00:01: ioport range 0x900-0x903 has been reserved

to this:

    system 00:01: ioport range 0xf50-0xf58 has been reserved
    system 00:01: ioport range 0x408-0x40f has been reserved
    system 00:01: ioport range 0x900-0x903 has been reserved

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: use dev_info(), dev_err(), etc in core
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:10 +0000 (23:31 -0700)]
PNP: use dev_info(), dev_err(), etc in core

If we have the struct pnp_dev available, we can use dev_info(), dev_err(),
etc., to give a little more information and consistency.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: simplify PNP card error handling
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:09 +0000 (23:31 -0700)]
PNP: simplify PNP card error handling

No functional change; just return errors early instead of putting the main
part of the function inside an "if" statement.
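The "return errors early" restructuring can be illustrated with a toy function. `struct card_model`, `card_add_nested()` and `card_add_early()` are hypothetical and do not reproduce the actual PNP card code; they only show that both shapes behave identically while the second keeps the main path unindented.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical card object, for illustration only. */
struct card_model {
	int ndevs;		/* devices found on the card */
	int registered;		/* already added? */
};

/* Before: the main part of the function sits inside an "if". */
static int card_add_nested(struct card_model *card)
{
	int err;

	if (card->ndevs > 0 && !card->registered) {
		card->registered = 1;
		err = 0;
	} else {
		err = card->ndevs <= 0 ? -ENODEV : -EBUSY;
	}
	return err;
}

/* After: errors are returned early; same results, flatter code. */
static int card_add_early(struct card_model *card)
{
	if (card->ndevs <= 0)
		return -ENODEV;
	if (card->registered)
		return -EBUSY;

	card->registered = 1;
	return 0;
}
```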

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: remove null pointer checks
Bjorn Helgaas [Wed, 17 Oct 2007 06:31:09 +0000 (23:31 -0700)]
PNP: remove null pointer checks

Remove some null pointer checks.  Null pointers in these areas indicate
programming errors, and I think it's better to oops immediately rather than
return an error that is easily ignored.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoFix very high interrupt rate for IRQ8 (rtc) unless pnpacpi=off
Krzysztof Oledzki [Wed, 17 Oct 2007 06:31:08 +0000 (23:31 -0700)]
Fix very high interrupt rate for IRQ8 (rtc) unless pnpacpi=off

Work around broken systems whose BIOS makes the RTC interrupt
level-triggered and/or active-low.

See http://bugzilla.kernel.org/show_bug.cgi?id=5243

Based on the patch from Shaohua Li <shaohua.li@intel.com>

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: don't fail device init if no DMA channel available
Jan Beulich [Wed, 17 Oct 2007 06:31:07 +0000 (23:31 -0700)]
PNP: don't fail device init if no DMA channel available

Most drivers for devices supporting ISA DMA can operate without DMA as well
(falling back to PIO).  Thus it seems inappropriate for PNP to fail device
initialization in case none of the possible DMA channels are available.
Instead, it should be left to the driver to decide what to do if
request_dma() fails.

The patch also adjusts the code to account for the fact that
pnp_assign_dma() no longer needs to report failure.
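On the driver side, "leave it to the driver" could look like the sketch below. `fake_request_dma()`, `struct ctl` and `ctl_probe()` are hypothetical stand-ins. The real `request_dma()` is a kernel function that returns 0 on success or a negative errno. Here channel 4 is arbitrarily treated as busy, and a probe without a usable channel still succeeds in PIO mode.

```c
#include <assert.h>
#include <errno.h>

enum xfer_mode { XFER_PIO, XFER_DMA };

/* Hypothetical driver state. */
struct ctl {
	int dma_chan;		/* -1 when PNP assigned no DMA channel */
	enum xfer_mode mode;
};

/* Hypothetical stand-in for request_dma(): channel 4 is "busy". */
static int fake_request_dma(int chan)
{
	if (chan == 4)
		return -EBUSY;
	return 0;
}

/* Probing never fails just because DMA is unavailable. */
static int ctl_probe(struct ctl *c)
{
	if (c->dma_chan >= 0 && fake_request_dma(c->dma_chan) == 0)
		c->mode = XFER_DMA;
	else
		c->mode = XFER_PIO;	/* fall back to programmed I/O */
	return 0;
}
```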

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoPNP: make pnpacpi_suspend handle errors
Rafael J. Wysocki [Wed, 17 Oct 2007 06:31:06 +0000 (23:31 -0700)]
PNP: make pnpacpi_suspend handle errors

pnpacpi_suspend() doesn't check the result returned by
acpi_pm_device_sleep_state() before passing it to acpi_bus_set_power(),
which may not be desirable.  Make it select the target power state of the
device using its second argument if acpi_pm_device_sleep_state() fails.
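The fallback can be sketched as follows. The names are hypothetical: `model_sleep_state()` stands in for the platform query and returns a negative errno when it cannot decide. The mapping "on event gives D0, otherwise D3" is an assumed policy for illustration. The point is that an error code is never passed on as if it were a power state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device power states, D0 (on) .. D3 (off). */
enum dstate { D0 = 0, D1, D2, D3 };
/* Hypothetical PM events carried by the suspend callback's argument. */
enum pm_event { EV_ON, EV_SUSPEND };

/* Stand-in for the platform query: negative errno on failure. */
static int model_sleep_state(int supported)
{
	return supported ? D2 : -ENODEV;
}

static int choose_power_state(int supported, enum pm_event ev)
{
	int state = model_sleep_state(supported);

	/* Fall back to a state derived from the event on failure. */
	if (state < 0)
		state = (ev == EV_ON) ? D0 : D3;
	return state;
}
```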

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Looks-ok-to: Pavel Machek <pavel@ucw.cz>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: clean up execute permission checking
Miklos Szeredi [Wed, 17 Oct 2007 06:31:06 +0000 (23:31 -0700)]
fuse: clean up execute permission checking

Define a new function fuse_refresh_attributes() that conditionally refreshes
the attributes based on the validity timeout.

In fuse_permission() only refresh the attributes for checking the execute bits
if necessary.
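A conditional refresh based on a validity timeout might look like this sketch. `struct attr_cache`, `do_getattr()` and `refresh_if_stale()` are hypothetical. The counter stands in for a GETATTR round trip to the userspace filesystem, and the current time (in milliseconds) is passed explicitly so the behavior is deterministic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cached attributes with a validity deadline. */
struct attr_cache {
	uint64_t valid_until;	/* time (ms) at which the cache expires */
	uint64_t ttl;		/* validity timeout granted by the server */
	unsigned int mode;
	unsigned int getattr_calls;	/* simulated server round trips */
};

/* Stand-in for a GETATTR request to the userspace filesystem. */
static void do_getattr(struct attr_cache *a, uint64_t now,
		       unsigned int server_mode)
{
	a->mode = server_mode;
	a->valid_until = now + a->ttl;
	a->getattr_calls++;
}

/* Refresh only when the cached attributes have expired. */
static void refresh_if_stale(struct attr_cache *a, uint64_t now,
			     unsigned int server_mode)
{
	if (now >= a->valid_until)
		do_getattr(a, now, server_mode);
}
```

A permission check that needs fresh execute bits would call `refresh_if_stale()` first and otherwise trust the cache.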

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: no ENOENT from fuse device read
Miklos Szeredi [Wed, 17 Oct 2007 06:31:05 +0000 (23:31 -0700)]
fuse: no ENOENT from fuse device read

Don't return -ENOENT for a read() on the fuse device when the request was
aborted.  Instead return -ENODEV, meaning the filesystem has been
force-umounted or aborted.

Previously ENOENT meant that the request was interrupted, but now the
'aborted' flag is not set in case of interrupts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: no abort on interrupt
Miklos Szeredi [Wed, 17 Oct 2007 06:31:04 +0000 (23:31 -0700)]
fuse: no abort on interrupt

Don't set 'aborted' flag on a request if it's interrupted.  We have to wait
for the answer anyway, and setting the flag would only matter for the very
short time while the reply is being copied.

This means that write() on the fuse device will not return -ENOENT during
normal operation, only if the filesystem is aborted by a forced umount or
through the fusectl interface.

This could simplify userspace code somewhat when backward compatibility with
earlier kernel versions is not required.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: cleanup in release
Miklos Szeredi [Wed, 17 Oct 2007 06:31:04 +0000 (23:31 -0700)]
fuse: cleanup in release

Move dput/mntput pair from request_end() to fuse_release_end(), because
there's no other place they are used.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: fix permission checking on sticky directories
Miklos Szeredi [Wed, 17 Oct 2007 06:31:03 +0000 (23:31 -0700)]
fuse: fix permission checking on sticky directories

The VFS checks sticky bits on the parent directory even if the filesystem
defines its own ->permission().  In some situations (sshfs, mountlo, etc.) the
user does have permission to delete a file even if the attribute based
checking would not allow it.

So work around this by storing the permission bits separately and returning
them in stat(), but cutting the permission bits off from inode->i_mode.
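The split between `i_mode` and separately stored permission bits can be sketched as below. `struct inode_model`, `perm_bits`, `set_mode()` and `stat_mode()` are hypothetical names, not the fuse implementation. The VFS-visible mode keeps only the file type, so a sticky-bit check there sees no S_ISVTX, while stat() still reports the full mode.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical inode: VFS-visible mode plus bits kept on the side. */
struct inode_model {
	mode_t i_mode;		/* what the VFS sees: file type only */
	mode_t perm_bits;	/* permission, setuid/setgid, sticky bits */
};

static void set_mode(struct inode_model *ino, mode_t mode)
{
	ino->perm_bits = mode & 07777;
	/* The VFS sticky-bit check no longer sees S_ISVTX here. */
	ino->i_mode = mode & S_IFMT;
}

/* stat() reports the full mode again. */
static mode_t stat_mode(const struct inode_model *ino)
{
	return ino->i_mode | ino->perm_bits;
}
```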

This is slightly hackish, but it's probably not worth it to add new
infrastructure in VFS and a slight performance penalty for all filesystems,
just for the sake of fuse.

[Jan Engelhardt] cosmetic fixes
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: refresh stale attributes in fuse_permission()
Miklos Szeredi [Wed, 17 Oct 2007 06:31:02 +0000 (23:31 -0700)]
fuse: refresh stale attributes in fuse_permission()

fuse_permission() didn't refresh inode attributes before using them, even if
the validity has already expired.

Thanks to Junjiro Okajima for spotting this.

Also remove some old code to unconditionally refresh the attributes on the
root inode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: set i_nlink to sane value after mount
Miklos Szeredi [Wed, 17 Oct 2007 06:31:02 +0000 (23:31 -0700)]
fuse: set i_nlink to sane value after mount

Aufs seems to depend on a positive i_nlink value.  So fill in a dummy but sane
value for the root inode at mount time.

The inode attributes are refreshed with the correct values at the first
opportunity.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: fix page invalidation
Miklos Szeredi [Wed, 17 Oct 2007 06:31:01 +0000 (23:31 -0700)]
fuse: fix page invalidation

Other than truncate, there are two cases when fuse tries to get rid
of cached pages:

 a) in open, if KEEP_CACHE flag is not set
 b) in getattr, if file size changed spontaneously

Until now invalidate_mapping_pages() was used, which didn't get rid
of mapped pages.  This is wrong, and becomes more wrong as dirty pages
are introduced.  So instead properly invalidate all pages with
invalidate_inode_pages2().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: truncate on spontaneous size change
Miklos Szeredi [Wed, 17 Oct 2007 06:31:01 +0000 (23:31 -0700)]
fuse: truncate on spontaneous size change

Memory mappings were only truncated on an explicit truncate, but not when the
file size was changed externally.

Fix this by moving the truncation code from fuse_setattr to
fuse_change_attributes.

Yes, there are races between write and external truncation, but we can't
really do anything about them.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: add reference counting to fuse_file
Miklos Szeredi [Wed, 17 Oct 2007 06:31:00 +0000 (23:31 -0700)]
fuse: add reference counting to fuse_file

Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a
reference counter and destructor.

This will enable asynchronous page writeback, where it cannot be guaranteed
that the file is not released while a request with this file handle is being
served.

The actual RELEASE request is only sent when there are no more references to
the fuse_file.
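Reference counting with a destructor that sends RELEASE on the last put can be sketched like this. `struct ff_model`, `ff_alloc()`, `ff_get()` and `ff_put()` are hypothetical. The `release_sent` flag stands in for queueing the RELEASE request. A plain `int` is used because the sketch is single-threaded; concurrent users would need an atomic counter or locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-open-file object with its own lifetime. */
struct ff_model {
	int count;		/* open file plus in-flight requests */
	int *release_sent;	/* where the destructor records RELEASE */
};

static struct ff_model *ff_alloc(int *release_flag)
{
	struct ff_model *ff = malloc(sizeof(*ff));

	if (ff) {
		ff->count = 1;	/* reference held by the open file */
		ff->release_sent = release_flag;
	}
	return ff;
}

static struct ff_model *ff_get(struct ff_model *ff)
{
	ff->count++;
	return ff;
}

/* Drop a reference; the last one "sends" RELEASE and frees. */
static void ff_put(struct ff_model *ff)
{
	assert(ff->count > 0);
	if (--ff->count == 0) {
		*ff->release_sent = 1;	/* stand-in for the RELEASE request */
		free(ff);
	}
}
```

Closing the file drops only one reference, so a writeback request that still holds `ff` keeps it alive until it completes.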

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: fix reserved request wake up
Miklos Szeredi [Wed, 17 Oct 2007 06:31:00 +0000 (23:31 -0700)]
fuse: fix reserved request wake up

Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is
possible that the right task is not woken up.

Also create a separate reserved_req_waitq in addition to the blocked_waitq,
since they fulfill totally separate functions.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofuse: update backing_dev_info congestion state
Miklos Szeredi [Wed, 17 Oct 2007 06:30:59 +0000 (23:30 -0700)]
fuse: update backing_dev_info congestion state

Set the read and write congestion state if the request queue is close to
blocking, and clear it when it's not.

This prevents unnecessary blocking in readahead and (when writable mmaps are
allowed) writeback.
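Setting and clearing a congestion flag from the queue length might look like the sketch below. `struct conn_model`, `update_congestion()`, `queue_request()` and `end_request()` are hypothetical. The threshold value (9 out of a maximum of 12 background requests in the usage below) is an assumption chosen for illustration. The flag is recomputed whenever a request is queued or completed, so it is cleared as soon as the queue drains below the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection state. */
struct conn_model {
	unsigned int num_background;	/* queued background requests */
	unsigned int max_background;	/* queue blocks at this point */
	unsigned int congestion_threshold;	/* "close to blocking" */
	bool congested;			/* stand-in for the bdi state */
};

static void update_congestion(struct conn_model *fc)
{
	fc->congested = fc->num_background >= fc->congestion_threshold;
}

static void queue_request(struct conn_model *fc)
{
	fc->num_background++;
	update_congestion(fc);
}

static void end_request(struct conn_model *fc)
{
	assert(fc->num_background > 0);
	fc->num_background--;
	update_congestion(fc);
}
```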

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofloppy: remove register keyword use from floppy driver
Jesper Juhl [Wed, 17 Oct 2007 06:30:58 +0000 (23:30 -0700)]
floppy: remove register keyword use from floppy driver

The floppy drive is slow.  These days I see absolutely no good reason why the
floppy driver should try to gain a tiny bit of speed by telling gcc to
optimize access to some variables via the register keyword.  Better to just
leave gcc free to do whatever optimizations it deduces to be sane and not
hamper it by telling it that some variables in the floppy driver are special
and need to be fast (they don't).

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofloppy: remove dead/commented out code from floppy driver
Jesper Juhl [Wed, 17 Oct 2007 06:30:58 +0000 (23:30 -0700)]
floppy: remove dead/commented out code from floppy driver

A good initial step for a cleanup seems to me to be getting rid of old dead
code.  This stuff is either commented out or inside '#if 0', so it is not
currently in use at all; let's just get rid of it once and for all.  That's a
few lines less to deal with.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agofloppy: do a very minimal style cleanup of the floppy driver
Jesper Juhl [Wed, 17 Oct 2007 06:30:57 +0000 (23:30 -0700)]
floppy: do a very minimal style cleanup of the floppy driver

Yes, some of this will likely be replaced in later patches, but I do not see
anyone else coming out of the woodwork with any patches for this driver, so
I'll ignore comments about churn.  I want to get this driver cleaned up, and
if I'm going to do so I want to start with this basic style cleanup to reduce
the reading pain a bit.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomigration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()
Oleg Nesterov [Wed, 17 Oct 2007 06:30:56 +0000 (23:30 -0700)]
migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()

Change migration_call(CPU_DEAD) to use direct spin_lock_irq() instead of
task_rq_lock(rq->idle), since rq->idle can't change its task_rq().

This makes the code a bit more symmetrical with migrate_dead_tasks()'s path
which uses spin_lock_irq/spin_unlock_irq.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agodo CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
Oleg Nesterov [Wed, 17 Oct 2007 06:30:56 +0000 (23:30 -0700)]
do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)

Currently move_task_off_dead_cpu() is called under
write_lock_irq(tasklist).  This means it can't use task_lock(), which is
needed to make migration take the task's ->cpuset into account.

Change the code to call move_task_off_dead_cpu() with irqs enabled, and
change migrate_live_tasks() to use read_lock(tasklist).

This is all preparation for the further changes proposed by Cliff Wickman; see
http://marc.info/?t=117327786100003

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomd: make sure read errors are auto-corrected during a 'check' resync in raid1
NeilBrown [Wed, 17 Oct 2007 06:30:55 +0000 (23:30 -0700)]
md: make sure read errors are auto-corrected during a 'check' resync in raid1

Whenever a read error is found, we should attempt to overwrite with correct
data to 'fix' it.

However, when doing a 'check' pass (which compares data blocks that are
successfully read, but doesn't normally overwrite), we don't do that.  We
should.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomd: expose the degraded status of an assembled array through sysfs
Iustin Pop [Wed, 17 Oct 2007 06:30:54 +0000 (23:30 -0700)]
md: expose the degraded status of an assembled array through sysfs

The 'degraded' attribute is useful to quickly determine if the array is
degraded, instead of parsing 'mdadm -D' output or relying on the other
techniques (number of working devices against number of defined devices,
etc.).  The md code already keeps track of this attribute, so it's useful to
export it.
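A monitoring tool could read the attribute directly instead of parsing `mdadm -D`. This sketch assumes the attribute appears as `/sys/block/md0/md/degraded` (`md0` is only an example device) and holds a single integer, with non-zero meaning the array is degraded. `parse_degraded()` and `read_degraded()` are hypothetical helpers, and parsing is split out so it can be exercised without a real array.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the single integer held by a sysfs attribute file. */
static int parse_degraded(FILE *f)
{
	int n;

	if (fscanf(f, "%d", &n) != 1)
		return -1;
	return n;
}

/* e.g. read_degraded("/sys/block/md0/md/degraded"); -1 on error */
static int read_degraded(const char *path)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = parse_degraded(f);
	fclose(f);
	return n;
}
```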

Signed-off-by: Iustin Pop <iusty@k1024.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomd: 'sync_action' in sysfs returns wrong value for readonly arrays
NeilBrown [Wed, 17 Oct 2007 06:30:53 +0000 (23:30 -0700)]
md: 'sync_action' in sysfs returns wrong value for readonly arrays

When an array is started read-only, MD_RECOVERY_NEEDED can be set but no
recovery will be running.  This causes 'sync_action' to report the wrong
value.

We could remove the test for MD_RECOVERY_NEEDED, but doing so would leave a
small gap after requesting a sync action, where 'sync_action' would still
report the old value.

So make sure that for a read-only array, 'sync_action' always returns 'idle'.
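The rule can be modeled with a few flags. `struct md_model` and `sync_action()` are hypothetical and simplified: only "idle", "check" and "resync" are modeled, and the booleans loosely mirror the MD_RECOVERY_* bits mentioned above. The read-only test comes first, so a pending "needed" flag on a read-only array no longer produces a misleading answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified array state. */
struct md_model {
	bool readonly;
	bool recovery_running;
	bool recovery_needed;	/* set right after a sync is requested */
	bool check_mode;	/* a 'check' pass rather than a resync */
};

static const char *sync_action(const struct md_model *md)
{
	if (md->readonly)
		return "idle";	/* nothing can run on a read-only array */
	if (md->recovery_running || md->recovery_needed)
		return md->check_mode ? "check" : "resync";
	return "idle";
}
```

Keeping the `recovery_needed` test for writable arrays preserves the behavior the text wants: right after a sync is requested, the new action is reported even before it starts running.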

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomd: fix a bug in some never-used code.
NeilBrown [Wed, 17 Oct 2007 06:30:53 +0000 (23:30 -0700)]
md: fix a bug in some never-used code.

http://bugzilla.kernel.org/show_bug.cgi?id=3277

There is a seq_printf here that isn't being passed a 'seq'.  However, as the
code is inside #ifdef MD_DEBUG, nobody noticed.

Also remove some extra spaces.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agobitmap.h: remove dead artifacts
Adrian Bunk [Wed, 17 Oct 2007 06:30:52 +0000 (23:30 -0700)]
bitmap.h: remove dead artifacts

bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agomd: software Raid autodetect dev list not array
Michael J. Evans [Wed, 17 Oct 2007 06:30:52 +0000 (23:30 -0700)]
md: software Raid autodetect dev list not array

In current release kernels the md module (Software RAID) uses a static
array (dev_t[128]) to store partition/device info temporarily for
autostart.

I discovered this (and that the devices are added as disks/partitions are
discovered at boot) while I was debugging why only one of my MD arrays would
come up whole, while all the others were short a disk.

I eventually discovered that it was enumerating through all of 9 of my 11 hds
(2 had only 4 partitions apiece) while the other 9 have 15 partitions (I
wanted 64 per drive...).  The last partition of the 8th drive in my 9 drive
raid 5 sets wasn't added, thus making the final md array short both a parity
and data disk, and it was started later, elsewhere.

This patch replaces that static array with a list.
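Going by the counts above (2 disks with 4 partitions and 9 with 15), autodetection needs roughly 2×4 + 9×15 = 143 entries, which is more than the 128 slots of the static array. The sketch below shows a growable replacement. `struct detected_dev`, `md_autodetect_dev()`, `count_detected()` and `free_detected()` are hypothetical names. The kernel itself would use `struct list_head` rather than this hand-rolled singly linked list. Entries are appended in discovery order and there is no fixed upper bound.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical node; kernel code would embed a struct list_head. */
struct detected_dev {
	dev_t dev;
	struct detected_dev *next;
};

static struct detected_dev *detected_head;
static struct detected_dev **detected_tail = &detected_head;

/* Append in discovery order; no fixed limit such as 128. */
static int md_autodetect_dev(dev_t dev)
{
	struct detected_dev *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->dev = dev;
	node->next = NULL;
	*detected_tail = node;
	detected_tail = &node->next;
	return 0;
}

static size_t count_detected(void)
{
	size_t n = 0;

	for (struct detected_dev *p = detected_head; p; p = p->next)
		n++;
	return n;
}

/* Release the list once autostart has consumed it. */
static void free_detected(void)
{
	while (detected_head) {
		struct detected_dev *next = detected_head->next;

		free(detected_head);
		detected_head = next;
	}
	detected_tail = &detected_head;
}
```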

[akpm@linux-foundation.org: removed unused var]
Signed-off-by: Michael J. Evans <mjevans1983@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agoext2 reservations
Martin J. Bligh [Wed, 17 Oct 2007 06:30:46 +0000 (23:30 -0700)]
ext2 reservations

Val's cross-port of the ext3 reservations code into ext2.

[mbligh@mbligh.org: Small type error for printk]
[akpm@linux-foundation.org: fix types, sync with ext3]
[mbligh@mbligh.org: Bring ext2 reservations code in line with latest ext3]
[akpm@linux-foundation.org: kill noisy printk]
[akpm@linux-foundation.org: remember to dirty the gdp's block]
[akpm@linux-foundation.org: cross-port the missed 5dea5176e5c32ef9f0d1a41d28427b3bf6881b3a]
[akpm@linux-foundation.org: cross-port e6022603b9aa7d61d20b392e69edcdbbc1789969]
[akpm@linux-foundation.org: Port the omitted 08fb306fe63d98eb86e3b16f4cc21816fa47f18e]
[akpm@linux-foundation.org: Backport the missed 20acaa18d0c002fec180956f87adeb3f11f635a6]
[akpm@linux-foundation.org: fixes]
[cmm@us.ibm.com: fix reservation extension]
[bunk@stusta.de: make ext2_get_blocks() static]
[hugh@veritas.com: fix hang]
[hugh@veritas.com: ext2_new_blocks should reset the reservation window size]
[hugh@veritas.com: ext2 balloc: fix off-by-one against rsv_end]
[hugh@veritas.com: grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such]
[hugh@veritas.com: rbtree usage cleanup]
[pbadari@us.ibm.com: Fix for ext2 reservation]
[bunk@kernel.org: remove fs/ext2/balloc.c:reserve_blocks()]
[hugh@veritas.com: ext2 balloc: use io_error label]
Cc: "Martin J. Bligh" <mbligh@mbligh.org>
Cc: Valerie Henson <val_henson@linux.intel.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: remove unnecessary wait in throttle_vm_writeout()
Fengguang Wu [Wed, 17 Oct 2007 06:30:45 +0000 (23:30 -0700)]
writeback: remove unnecessary wait in throttle_vm_writeout()

We don't want to introduce pointless delays in throttle_vm_writeout() when
the writeback limits are not yet exceeded, do we?

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Greg KH <greg@kroah.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agointroduce I_SYNC
Joern Engel [Wed, 17 Oct 2007 06:30:44 +0000 (23:30 -0700)]
introduce I_SYNC

I_LOCK was used for several unrelated purposes, which caused deadlock
situations in certain filesystems as a side effect.  One of the purposes
now uses the new I_SYNC bit.

Also document the various bits and change their order from historical to
logical.

[bunk@stusta.de: make fs/inode.c:wake_up_inode() static]
Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: introduce writeback_control.more_io to indicate more io
Fengguang Wu [Wed, 17 Oct 2007 06:30:43 +0000 (23:30 -0700)]
writeback: introduce writeback_control.more_io to indicate more io

After dirtying a 100M file, the normal behavior is to start writeback
for all data after a 30s delay.  But sometimes the following happens instead:

- after 30s:    ~4M
- after 5s:     ~4M
- after 5s:     all remaining 92M

Some analysis shows that the internal I/O dispatch queues evolve like this:

s_io            s_more_io
-------------------------
1) 100M,1K         0
2) 1K              96M
3) 0               96M

1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes (BUG)

nr_to_write > 0 in (3) fools the upper layer into thinking that all data has
been written out.  The big dirty file is actually still sitting in s_more_io.
We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty,
and let the loop in generic_sync_sb_inodes() continue: this may starve newly
expired inodes in s_dirty.  It is also not an option to draw inodes from both
s_more_io and s_dirty, and let the loop go on: this might lead to livelocks,
and might also starve other superblocks at sync time (well, kupdate may still
starve some superblocks, but that's another bug).

We have to return when a full scan of s_io completes.  So nr_to_write > 0 does
not necessarily mean that "all data are written".  This patch introduces a
flag writeback_control.more_io to indicate this situation.  With it the big
dirty file no longer has to wait for the next kupdate invocation 5s later.
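A minimal user-space model of the scan, assuming 4K pages (so the 100M file is 25600 pages and one 4M chunk is a 1024-page budget) and array-based stand-ins for the s_io and s_more_io lists. scan_s_io() is a hypothetical helper; it only shows how nr_to_write > 0 can coexist with pending work and how a more_io flag records that.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INODES 8

/* Array-based stand-in for a per-superblock inode list. */
struct queue {
	long pages[MAX_INODES];	/* dirty pages left on each queued inode */
	int n;
};

/* Only the two writeback_control fields this sketch needs. */
struct wb_control {
	long nr_to_write;	/* page budget for this pass */
	bool more_io;		/* s_more_io still holds work after the scan */
};

static void push(struct queue *q, long pages)
{
	assert(q->n < MAX_INODES);
	q->pages[q->n++] = pages;
}

/*
 * Scan s_io from the front until it is exhausted or the budget runs
 * out.  An inode that still has dirty pages is parked on s_more_io.
 */
static void scan_s_io(struct queue *s_io, struct queue *s_more_io,
		      struct wb_control *wbc)
{
	int done = 0;

	while (done < s_io->n && wbc->nr_to_write > 0) {
		long *left = &s_io->pages[done];
		long chunk = *left < wbc->nr_to_write ? *left : wbc->nr_to_write;

		*left -= chunk;
		wbc->nr_to_write -= chunk;
		if (*left > 0)
			push(s_more_io, *left);
		done++;
	}
	/* Drop the scanned inodes from the front of s_io. */
	for (int i = done; i < s_io->n; i++)
		s_io->pages[i - done] = s_io->pages[i];
	s_io->n -= done;

	/* nr_to_write > 0 alone no longer means "all data written". */
	wbc->more_io = s_more_io->n > 0;
}
```

Replaying the table above: the first pass exhausts the budget and parks 24576 pages (96M) on s_more_io; the second pass writes the 1K file, leaves nr_to_write at 1023 and sets more_io. A caller that keeps going while more_io is set, rather than treating nr_to_write > 0 as "done", would not leave the big file waiting for the next kupdate run.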

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: remove pages_skipped accounting in __block_write_full_page()
Fengguang Wu [Wed, 17 Oct 2007 06:30:42 +0000 (23:30 -0700)]
writeback: remove pages_skipped accounting in __block_write_full_page()

Miklos Szeredi <miklos@szeredi.hu> and I identified a writeback bug:

> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.

It can be reproduced with the following test scheme:

# cat bin/test-writeback.sh
grep nr_dirty /proc/vmstat
echo 1 > /proc/sys/fs/inode_debug
dd if=/dev/zero of=/var/x bs=1K count=204800&
while true; do grep nr_dirty /proc/vmstat; sleep 1; done

# bin/test-writeback.sh
nr_dirty 19207
nr_dirty 19207
nr_dirty 30924
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
nr_dirty 47150
nr_dirty 47141
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47205
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47215
nr_dirty 47216
nr_dirty 47216
nr_dirty 47216
nr_dirty 47154
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47134
nr_dirty 47134
nr_dirty 47135
nr_dirty 47135
nr_dirty 47135
nr_dirty 46097 <== -1038
nr_dirty 46098
nr_dirty 46098
nr_dirty 46098
[...]
nr_dirty 46091
nr_dirty 46092
nr_dirty 46092
nr_dirty 45069 <== -1023
nr_dirty 45056
nr_dirty 45056
nr_dirty 45056
[...]
nr_dirty 37822
nr_dirty 36799 <== -1023
[...]
nr_dirty 36781
nr_dirty 35758 <== -1023
[...]
nr_dirty 34708
nr_dirty 33672 <== -1024
[...]
nr_dirty 33692
nr_dirty 32669 <== -1023

% ls -li /var/x
847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x

% dmesg|grep 847824  # generated by a debug printk
[  529.263184] redirtied inode 847824 line 548
[  564.250872] redirtied inode 847824 line 548
[  594.272797] redirtied inode 847824 line 548
[  629.231330] redirtied inode 847824 line 548
[  659.224674] redirtied inode 847824 line 548
[  689.219890] redirtied inode 847824 line 548
[  724.226655] redirtied inode 847824 line 548
[  759.198568] redirtied inode 847824 line 548

# line 548 in fs/fs-writeback.c:
543                 if (wbc->pages_skipped != pages_skipped) {
544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */
548                         redirty_tail(inode);
549                 }

More debugging shows that __block_write_full_page() never gets the chance
to call submit_bh() for that big dirty file: the buffer heads are *clean*. So
basically no page I/O is issued by __block_write_full_page(), hence
pages_skipped goes up.

Also the comment in generic_sync_sb_inodes():

544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */

and the comment in __block_write_full_page():

1713                 /*
1714                  * The page was marked dirty, but the buffers were
1715                  * clean.  Someone wrote them back by hand with
1716                  * ll_rw_block/submit_bh.  A rare case.
1717                  */

do not quite agree with each other. The page writeback should be skipped for a
'locked buffer', but here it is a 'clean buffer'!

This patch fixes this bug, though I'm not sure why __block_write_full_page()
is called only to do nothing, or who actually issued the writeback for us.
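A simplified model of the accounting change. The page_result classification and the account_old()/account_new()/would_redirty() helpers are hypothetical; the sketch only captures the idea that a page whose buffers were already clean should not bump pages_skipped, so the pages_skipped comparison in generic_sync_sb_inodes() no longer sends the inode to redirty_tail().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one __block_write_full_page()-style call. */
enum page_result {
	PAGE_SUBMITTED,		/* buffers were dirty: submit_bh() issued I/O */
	PAGE_BUFFERS_LOCKED,	/* buffers busy: page redirtied, truly skipped */
	PAGE_BUFFERS_CLEAN,	/* buffers already written back by someone else */
};

struct wbc_model {
	long pages_skipped;
};

/* Before the patch: any page that issued no I/O counted as skipped. */
static void account_old(struct wbc_model *wbc, enum page_result r)
{
	if (r != PAGE_SUBMITTED)
		wbc->pages_skipped++;
}

/* After the patch: a page with clean buffers is not a skip. */
static void account_new(struct wbc_model *wbc, enum page_result r)
{
	if (r == PAGE_BUFFERS_LOCKED)
		wbc->pages_skipped++;
}

/* The generic_sync_sb_inodes() test: redirty_tail() iff the count moved. */
static bool would_redirty(long skipped_before, const struct wbc_model *wbc)
{
	return wbc->pages_skipped != skipped_before;
}
```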

These are the two possible new behaviors after the patch:

1) pretty nice: wait 30s and write ALL:)
2) not so good:
- during the dd: ~16M
- after 30s:      ~4M
- after 5s:       ~4M
- after 5s:     ~176M

The next patch will fix case (2).

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix ntfs with sb_has_dirty_inodes()
Fengguang Wu [Wed, 17 Oct 2007 06:30:39 +0000 (23:30 -0700)]
writeback: fix ntfs with sb_has_dirty_inodes()

NTFS's if-condition on dirty inodes is not complete.  Fix it with
sb_has_dirty_inodes().
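A sketch of what a complete check looks like, assuming the three per-superblock lists named elsewhere in this series (s_dirty, s_io and s_more_io). The tiny list_head helpers are a user-space stand-in for the kernel's list API, and the trimmed super_block holds only those three lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head->prev = head;
}

static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Only the three per-superblock inode lists are modelled. */
struct super_block {
	struct list_head s_dirty;	/* dirty inodes waiting to expire */
	struct list_head s_io;		/* inodes queued for writeback */
	struct list_head s_more_io;	/* inodes parked for more writeback */
};

/* Dirty inodes may sit on any of the three lists, so check them all. */
static bool sb_has_dirty_inodes(const struct super_block *sb)
{
	return !list_empty(&sb->s_dirty) ||
	       !list_empty(&sb->s_io) ||
	       !list_empty(&sb->s_more_io);
}
```

A check that looks only at s_dirty would report "clean" for a superblock whose only dirty inode is parked on s_more_io; the helper keeps callers from having to know about every list.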

Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix time ordering of the per superblock inode lists 8
Fengguang Wu [Wed, 17 Oct 2007 06:30:39 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock inode lists 8

Streamline the management of dirty inode lists and fix time ordering bugs.

The writeback logic used to move not-yet-expired dirty inodes from s_dirty to
s_io, *only to* move them back.  The move-inodes-back-and-forth thing is a
mess, which is eliminated by this patch.

The new scheme is:
- s_dirty acts as a time ordered io delaying queue;
- s_io/s_more_io together act as an io dispatching queue.

On kupdate writeback, we pull some inodes from s_dirty to s_io at the start of
every full scan of s_io.  Otherwise (i.e. for sync/throttle/background
writeback), we always pull from s_dirty on each run (a partial scan).

Note that the line
list_splice_init(&sb->s_more_io, &sb->s_io);
is moved to queue_io() to leave s_io empty. Otherwise a big dirtied file will
sit in s_io for a long time, preventing newly expired inodes from getting in.
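The new scheme can be modelled in a few lines. This is an array-based sketch, not the kernel's list layout: queue_io() borrows the name from the text but is simplified, start_pass() is hypothetical, s_dirty is kept oldest-first, and older_than == NULL is assumed to mean "pull every dirty inode" for the sync/throttle/background cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QMAX 16

struct inode_model {
	int id;
	long dirtied_when;
};

/* Array-based FIFO; index 0 is the oldest entry. */
struct queue {
	struct inode_model v[QMAX];
	int n;
};

struct sb_model {
	struct queue s_dirty;	/* time-ordered io delaying queue */
	struct queue s_io;	/* io dispatching queue */
	struct queue s_more_io;	/* inodes that still need more io */
};

static void enqueue(struct queue *q, struct inode_model x)
{
	assert(q->n < QMAX);
	q->v[q->n++] = x;
}

static struct inode_model dequeue(struct queue *q)
{
	assert(q->n > 0);
	struct inode_model x = q->v[0];
	for (int i = 1; i < q->n; i++)
		q->v[i - 1] = q->v[i];
	q->n--;
	return x;
}

/* Splice s_more_io onto s_io, then pull expired inodes from s_dirty. */
static void queue_io(struct sb_model *sb, const long *older_than)
{
	while (sb->s_more_io.n > 0)
		enqueue(&sb->s_io, dequeue(&sb->s_more_io));
	/* s_dirty is oldest-first: stop at the first unexpired inode. */
	while (sb->s_dirty.n > 0 &&
	       (older_than == NULL || sb->s_dirty.v[0].dirtied_when <= *older_than))
		enqueue(&sb->s_io, dequeue(&sb->s_dirty));
}

/* kupdate refills s_io only at the start of a full scan; others always. */
static void start_pass(struct sb_model *sb, bool for_kupdate,
		       const long *older_than)
{
	if (!for_kupdate || sb->s_io.n == 0)
		queue_io(sb, older_than);
}
```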

Cc: Ken Chen <kenchen@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix periodic superblock dirty inode flushing
Ken Chen [Wed, 17 Oct 2007 06:30:38 +0000 (23:30 -0700)]
writeback: fix periodic superblock dirty inode flushing

The current -mm tree has a bucketful of bug fixes in the periodic writeback
path.  However, we still hit a glitch where dirty pages on a given inode aren't
completely flushed to the disk, and the system will accumulate a large amount
of dirty pages beyond what dirty_expire_interval is designed for.

The problem is that __sync_single_inode() will move an inode to the sb->s_dirty
list even when there are more pending dirty pages on that inode.  If there is
another inode with a small number of dirty pages, we hit a case where the loop
iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
This leaves the inode that has a large amount of dirty pages behind, and it has
to wait for another dirty_writeback_interval before we flush it again.  We
effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval.
If the rate of dirtying is sufficiently high, the system will start to
accumulate a large number of dirty pages.
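Worked arithmetic for the last two sentences, assuming MAX_WRITEBACK_PAGES is 1024, pages are 4K and dirty_writeback_interval is 5 seconds (consistent with the ~4M per 5s pattern reported elsewhere in this log). Under those assumptions periodic writeback is capped at 1024 * 4096 / 5 = 838860 bytes/s, roughly 0.8M/s. Both helper names are hypothetical.

```c
#include <assert.h>

/* Bytes/s flushed when only one chunk goes out per writeback interval. */
static long flush_cap_bytes_per_sec(long max_pages, long page_size,
				    long interval_sec)
{
	return max_pages * page_size / interval_sec;
}

/* Dirty bytes that pile up over t seconds when dirtying outpaces the cap. */
static long backlog_growth(long dirty_rate, long cap, long t)
{
	return dirty_rate > cap ? (dirty_rate - cap) * t : 0;
}
```

For example, an application dirtying 1M/s against that cap would pile up about 2M of extra dirty data every 10 seconds.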

So fix it by adding another list, sb->s_more_io, on which to park the inode
while we iterate through sb->s_io, giving each dirty inode that resides on
that sb an equal chance of flushing some amount of dirty pages.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix time ordering of the per superblock dirty inode lists 7
Andrew Morton [Wed, 17 Oct 2007 06:30:37 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 7

This one fixes four bugs.

There are a few situations in there where writeback decides it is going to skip
over a blockdev inode on the kernel-internal blockdev superblock.  It
presently does this by moving the blockdev inode onto the tail of the blockdev
superblock's s_dirty.  But

a) this screws up s_dirty's reverse-time-orderedness and

b) refiling the blockdev for writeback in another 30 seconds is rude.  We
   should try again sooner than that.

Fix all this up by using redirty_head(): move the blockdev inode onto the head
of the blockdev superblock's s_dirty list for prompt writeback.
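A sketch of head-refiling on an array model of s_dirty (oldest entry first, timestamps as plain longs rather than jiffies). The changelog does not say how redirty_head() preserves the time ordering; this sketch assumes it borrows the timestamp of the current oldest entry when the refiled inode is newer. The dirty_list type and time_ordered() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QMAX 16

/* Hypothetical array model of s_dirty: index 0 is the oldest inode. */
struct dirty_list {
	int id[QMAX];
	long dirtied_when[QMAX];
	int n;
};

/* True if timestamps never decrease from oldest to newest. */
static bool time_ordered(const struct dirty_list *l)
{
	for (int i = 1; i < l->n; i++)
		if (l->dirtied_when[i] < l->dirtied_when[i - 1])
			return false;
	return true;
}

/*
 * Refile an inode as the oldest entry, so the next writeback pass picks
 * it up first.  To keep the ordering, an inode newer than the current
 * oldest entry borrows that entry's timestamp.
 */
static void redirty_head(struct dirty_list *l, int id, long dirtied_when)
{
	assert(l->n < QMAX);
	if (l->n > 0 && dirtied_when > l->dirtied_when[0])
		dirtied_when = l->dirtied_when[0];
	for (int i = l->n; i > 0; i--) {
		l->id[i] = l->id[i - 1];
		l->dirtied_when[i] = l->dirtied_when[i - 1];
	}
	l->id[0] = id;
	l->dirtied_when[0] = dirtied_when;
	l->n++;
}
```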

Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix time ordering of the per superblock dirty inode lists 6
Andrew Morton [Wed, 17 Oct 2007 06:30:37 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 6

Recycling the previous changelog:

  When the writeback function is operating in writeback-for-flushing mode
  (as opposed to writeback-for-integrity) and it encounters an I_LOCKed inode,
  it will skip writing that inode.  This is done for throughput and latency:
  move on to another inode rather than blocking for this one.

  Writeback skips this inode by moving it off s_io and onto s_dirty, so that
  writeback can proceed with the other inodes on s_io.

  However that inode movement can corrupt s_dirty's
  reverse-time-orderedness.  Fix that by using the new redirty_tail(), which
  will update the refiled inode's dirtied_when field.

  Note: the behaviour in here is a bit rude: if kupdate happens to come
  across a locked inode then it will defer writeback of that inode for another
  30 seconds.  We'll address that in the next patch.

Address that here.  What we do is move the skipped inode to the _head_ of
s_dirty, where it is immediately eligible for writeout again, instead of
deferring that writeout for another 30 seconds.

One would think that this might cause a livelock: we keep on trying to write
the same locked inode.  But it won't because:

a) if that were the case, it would _already_ be happening on the
   balance_dirty_pages codepath, because balance_dirty_pages() doesn't care
   about inode timestamps.

b) if we skipped this inode then we won't have done any writeback.  The
   higher-level writeback paths will see that wbc.nr_to_write didn't change
   and they'll then back off and take a nap.
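Point (b) can be sketched as a caller loop that stops as soon as a pass leaves nr_to_write unchanged. The function-pointer interface, run_writeback() and the two example passes are hypothetical; the point is that a pass which only meets the locked inode makes the caller back off instead of spinning.

```c
#include <assert.h>
#include <stddef.h>

/* A writeback pass: given a page budget, returns the unused budget. */
typedef long (*wb_pass_fn)(long nr_to_write, void *ctx);

/*
 * Hypothetical upper-level loop: keep issuing passes, but back off as
 * soon as one pass writes nothing (nr_to_write unchanged).
 * Returns the number of passes made.
 */
static int run_writeback(wb_pass_fn pass, void *ctx, long chunk, int max_passes)
{
	int passes = 0;

	while (passes < max_passes) {
		long left = pass(chunk, ctx);

		passes++;
		if (left == chunk)
			break;	/* no progress: take a nap instead of spinning */
	}
	return passes;
}

/* Example pass: the only dirty inode is locked, so it is skipped. */
static long locked_inode_pass(long nr_to_write, void *ctx)
{
	(void)ctx;
	return nr_to_write;
}

/* Example pass: writes from a pool of dirty pages held in *ctx. */
static long pool_pass(long nr_to_write, void *ctx)
{
	long *dirty = ctx;
	long n = *dirty < nr_to_write ? *dirty : nr_to_write;

	*dirty -= n;
	return nr_to_write - n;
}
```

With 3000 dirty pages and a 1024-page budget, pool_pass() makes progress three times and the fourth, empty pass triggers the back-off, so the loop ends after four passes.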

Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix time ordering of the per superblock dirty inode lists 5
Andrew Morton [Wed, 17 Oct 2007 06:30:36 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 5

When the writeback function is operating in writeback-for-flushing mode (as
opposed to writeback-for-integrity) and it encounters an I_LOCKed inode, it
will skip writing that inode.  This is done for throughput and latency: move
on to another inode rather than blocking for this one.

Writeback skips this inode by moving it off s_io and onto s_dirty, so that
writeback can proceed with the other inodes on s_io.

However that inode movement can corrupt s_dirty's reverse-time-orderedness.
Fix that by using the new redirty_tail(), which will update the refiled
inode's dirtied_when field.
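A sketch of redirty_tail() on an array model of s_dirty (oldest entry first, timestamps as plain longs rather than jiffies, and `now` assumed to be no older than any stored timestamp). It follows the rule described above: if the refiled inode is older than the newest entry, its dirtied_when is bumped to "now" before it is appended. Everything here besides the name redirty_tail() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QMAX 16

/* Hypothetical array model of s_dirty: index 0 is the oldest inode. */
struct dirty_list {
	int id[QMAX];
	long dirtied_when[QMAX];
	int n;
};

/* True if timestamps never decrease from oldest to newest. */
static bool time_ordered(const struct dirty_list *l)
{
	for (int i = 1; i < l->n; i++)
		if (l->dirtied_when[i] < l->dirtied_when[i - 1])
			return false;
	return true;
}

/*
 * Refile an inode as the newest entry.  If its old timestamp is older
 * than the current newest entry, stamp it with "now" so the list stays
 * time-ordered.
 */
static void redirty_tail(struct dirty_list *l, int id, long dirtied_when,
			 long now)
{
	assert(l->n < QMAX);
	if (l->n > 0 && dirtied_when < l->dirtied_when[l->n - 1])
		dirtied_when = now;
	l->id[l->n] = id;
	l->dirtied_when[l->n] = dirtied_when;
	l->n++;
}
```

Appending the inode dirtied at 50 behind one dirtied at 200 without re-stamping would break the ordering that the expiry scan relies on; re-stamping trades a later writeback of that inode for a list that stays sorted.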

Note: the behaviour in here is a bit rude: if kupdate happens to come across a
locked inode then it will defer writeback of that inode for another 30
seconds.  We'll address that in the next patch.

Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18 years agowriteback: fix comment, use helper function
Andrew Morton [Wed, 17 Oct 2007 06:30:35 +0000 (23:30 -0700)]
writeback: fix comment, use helper function

There's a comment in there which claims that the inode is left on s_io
if nfs chickened out of writing some data.

But that's not been true for three years.
9290280ced13c85689adeffa587e9a53bd3a5873 fixed a livelock by moving these
inodes back onto s_dirty.  Fix the comment.

In the second leg of the `if', use redirty_tail() rather than open-coding it.

Add weaselly comment indicating lack of confidence in the code and lack of the
fortitude which would be needed to fiddle with it.

Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>