File Locking in Linux 2.5

The file locking code in Linux 2.4 has a number of problems I'd like to address during 2.5 development. Here's a list: Here's a scheme which will hopefully address the above problems. Feedback welcome.

Providing the right facilities for networked/clustered filesystems

Note that this clears the way for filesystems to provide non-POSIX semantics (eg Netware, SMB OpLocks, etc). There is no requirement for any filesystem to use the local_lock() function.

lockd has an interesting problem. The semantics of fcntl(F_SETLKW) are that the process has to sleep until the lock is granted, or a signal interrupts the sleep. Clearly it's incredibly inefficient for lockd to spawn a new thread every time it requests a lock which would block. So at first glance, we need a different type of lock request -- put the lock on the list of blocked locks, and return -EAGAIN (-EWOULDBLOCK?). Then, when that lock is granted, notify lockd that it now has the lock, so it can pass that notification on to the client and the client process unblocks.

But what if we simply replace the blocking lock with the would-block lock? That implies that the caller of ->lock() decides what to do with the -EWOULDBLOCK return code -- if it's fcntl(), it puts the process to sleep; if it's lockd, it just carries on.
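To make the would-block idea concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: every name in it (toy_lock, toy_file, toy_lock_request, toy_unlock, the notify callback) is hypothetical, and it treats every lock as exclusive to keep things short. The point is only that the request path never sleeps; a conflicting request is queued and -EWOULDBLOCK is returned, and the unlock path grants queued requests and notifies their owners.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
struct toy_lock {
	int owner;                         /* who asked for the lock */
	long start, end;                   /* inclusive byte range */
	int granted;                       /* set once the lock is held */
	void (*notify)(struct toy_lock *); /* optional: called on a late grant */
	struct toy_lock *next;
};

struct toy_file {
	struct toy_lock *held;    /* granted locks */
	struct toy_lock *blocked; /* requests waiting for a conflict to clear */
};

static int overlaps(const struct toy_lock *a, const struct toy_lock *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Simplification: every lock is exclusive, so any overlap with
 * another owner's held lock is a conflict. */
static int conflicts(const struct toy_file *f, const struct toy_lock *req)
{
	const struct toy_lock *l;

	for (l = f->held; l; l = l->next)
		if (l->owner != req->owner && overlaps(l, req))
			return 1;
	return 0;
}

/* Never sleeps: grant the lock, or queue it and return -EWOULDBLOCK. */
static int toy_lock_request(struct toy_file *f, struct toy_lock *req)
{
	assert(f && req);
	if (conflicts(f, req)) {
		req->granted = 0;
		req->next = f->blocked;
		f->blocked = req;
		return -EWOULDBLOCK;
	}
	req->granted = 1;
	req->next = f->held;
	f->held = req;
	return 0;
}

/* Release a held lock, then grant queued requests that no longer conflict. */
static void toy_unlock(struct toy_file *f, struct toy_lock *lock)
{
	struct toy_lock **p;

	for (p = &f->held; *p; p = &(*p)->next)
		if (*p == lock) {
			*p = lock->next;
			lock->granted = 0;
			break;
		}

	p = &f->blocked;
	while (*p) {
		struct toy_lock *w = *p;

		if (conflicts(f, w)) {
			p = &w->next;
			continue;
		}
		*p = w->next;      /* unlink from the blocked list */
		w->next = f->held; /* move to the held list */
		f->held = w;
		w->granted = 1;
		if (w->notify)
			w->notify(w); /* lockd would tell its client here */
	}
}
```

In this model a caller like fcntl(F_SETLKW) would sleep on seeing -EWOULDBLOCK and be woken from the notify callback, while a caller like lockd would simply return to its main loop and send the grant to the client when notify fires.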

A clustered filesystem might call out to the network and say `I want to put this lock on this file'. Either some other node in the cluster says `Denied', `Blocked' or `Granted' (ie handles the request), OR no other node accepts responsibility for the lock, in which case we lock it locally by calling local_lock().
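A rough sketch of what such a filesystem's ->lock() method might look like follows. Only local_lock() is named in the proposal; the reply enum, the lock_req structure, the function-pointer plumbing and the stub functions are all invented here so the example can stand alone. The choice of -EACCES for `Denied' is also an assumption.

```c
#include <assert.h>
#include <errno.h>

/* All names here are hypothetical; the proposal only names local_lock(). */
enum cluster_reply { CL_DENIED, CL_BLOCKED, CL_GRANTED, CL_NO_OWNER };

struct lock_req {
	long start, end; /* inclusive byte range */
};

/* The network query and the local fallback are passed in as function
 * pointers so this sketch does not depend on any real cluster code. */
typedef enum cluster_reply (*cluster_ask_fn)(const struct lock_req *);
typedef int (*local_lock_fn)(const struct lock_req *);

static int clusterfs_lock(const struct lock_req *req,
			  cluster_ask_fn ask, local_lock_fn local_lock)
{
	assert(req && ask && local_lock);

	switch (ask(req)) {
	case CL_GRANTED:
		return 0;
	case CL_BLOCKED:
		return -EWOULDBLOCK; /* caller decides whether to sleep */
	case CL_DENIED:
		return -EACCES;      /* error code is an assumption */
	case CL_NO_OWNER:
	default:
		/* Nobody in the cluster took responsibility: lock locally. */
		return local_lock(req);
	}
}

/* Stubs standing in for the network and for local_lock(). */
static int local_lock_calls;

static enum cluster_reply node_says_blocked(const struct lock_req *req)
{
	(void)req;
	return CL_BLOCKED;
}

static enum cluster_reply nobody_answers(const struct lock_req *req)
{
	(void)req;
	return CL_NO_OWNER;
}

static int fake_local_lock(const struct lock_req *req)
{
	(void)req;
	local_lock_calls++;
	return 0;
}
```

With these stubs, a request answered `Blocked' by another node returns -EWOULDBLOCK without touching the local lock code, and a request nobody claims falls through to the local path.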

lockd

lockd does the following to recover from a downed server:
for_each_lock
	if (belongs_to_my_fs)
		foo();
This requires it to have access to the global list of locks, which is a bad thing to have anyway.

I've written some replacement code which Trond approved of:

for_each_inode(sb)
	for_each_lock(inode)
		foo();
With the changes above, even this code can go away. The nfs client can keep a per-fs list of locks, and reestablish them at server restart. No need to interact with the local locking at all.
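As an illustration of the per-fs list, here is a small userspace sketch. The structures and function names (client_fs, client_lock, fs_track_lock and so on) are made up for this example and do not reflect the real NFS client; the reclaim callback stands in for whatever request the client would actually send to the restarted server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structures; the real NFS client's layout is not assumed. */
struct client_lock {
	long start, end;          /* inclusive byte range */
	int established;          /* does the server currently know about it? */
	struct client_lock *next;
};

struct client_fs {
	struct client_lock *locks; /* every lock this mount holds on the server */
};

/* Record a lock the server has just granted to us. */
static void fs_track_lock(struct client_fs *fs, struct client_lock *l)
{
	assert(fs && l);
	l->established = 1;
	l->next = fs->locks;
	fs->locks = l;
}

/* The server rebooted and has forgotten all of our locks. */
static void fs_server_restarted(struct client_fs *fs)
{
	struct client_lock *l;

	for (l = fs->locks; l; l = l->next)
		l->established = 0;
}

/* Walk only this filesystem's list and re-establish each lock.
 * reclaim() returns 0 on success. Returns the number of failures. */
static int fs_reclaim_locks(struct client_fs *fs,
			    int (*reclaim)(struct client_lock *))
{
	struct client_lock *l;
	int failed = 0;

	for (l = fs->locks; l; l = l->next) {
		if (l->established)
			continue;
		if (reclaim(l) == 0)
			l->established = 1;
		else
			failed++;
	}
	return failed;
}

/* Stub for the network request; always succeeds. */
static int fake_send_reclaim(struct client_lock *l)
{
	(void)l;
	return 0;
}
```

Nothing in this walk touches a global lock list or the local locking code; everything it needs hangs off the per-fs structure.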

Non-POSIX locks

We already provide five different lock types: The proposal mentioned above would add a sixth -- whatever the filesystem supports. Ncpfs already does this through an ioctl, but that could be supported `natively' through this new scheme.

I want to add another byte-range lock, which looks and smells like a POSIX fcntl lock except that it is not removed by closing any fd which happens to be open on this file. Samba keeps a list of open fds which are not currently in use on any locked file to work around this stupidity in the spec. I'd like the external interface to this to be fcntl(F_SETLK_NP) and F_SETLKW_NP. Clearly F_GETLK does not need to be altered or replaced.
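The behaviour being complained about is easy to demonstrate from userspace with the existing fcntl() interface. The sketch below opens a file twice, takes a write lock through the first fd, closes the second fd, and then asks a child process (a different lock owner) whether the file is still locked. The helper names are my own; the calls used (open, fcntl with F_SETLK and F_GETLK, fork, waitpid) are the standard POSIX ones.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask from a child process whether anyone else holds a conflicting lock
 * on the whole file. Returns 1 if locked, 0 if not, -1 on error. */
static int locked_by_someone_else(const char *path)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* l_start = 0 and l_len = 0 cover the whole file. */
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		int fd = open(path, O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status);
}

/* Returns 1 if a lock taken through one fd is still held after closing
 * a different fd on the same file, 0 if it was dropped, -1 on error. */
static int posix_lock_survives_other_close(const char *path)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd1, fd2, held = -1;

	assert(path != NULL);
	fd1 = open(path, O_RDWR | O_CREAT, 0600);
	fd2 = open(path, O_RDONLY);

	if (fd1 >= 0 && fd2 >= 0 && fcntl(fd1, F_SETLK, &fl) == 0 &&
	    locked_by_someone_else(path) == 1) {
		close(fd2); /* not the fd the lock was taken through */
		fd2 = -1;
		held = locked_by_someone_else(path);
	}
	if (fd2 >= 0)
		close(fd2);
	if (fd1 >= 0)
		close(fd1);
	return held;
}
```

With POSIX semantics the function reports that the lock was dropped, even though fd1 is still open. A lock taken with the proposed F_SETLK_NP would be expected to survive the close of fd2.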

Restructuring

locks.c still runs almost entirely under the BKL. An earlier attempt to move it to a different locking scheme was thwarted when the code was integrated into 2.4.0-test9 while I was on holiday, and without me submitting it to Linus. Grumble. I plan to move it to _one_ spinlock to cover all lock-related structures, and I think that will be possible with the plan described above (since this code will no longer sleep).

As soon as lockd no longer needs to keep its fingers inside locks.c, I want to remove the global list of locks. It's also used by /proc/locks -- which probably needs to go away anyway. So what's useful about /proc/locks? I'd like to be able to see which locks my process has, and which processes have a lock on a given file. The former is easy -- /proc/$PID/locks can be constructed relatively easily from the fds held open by that process. The latter? I don't know. Ideas welcome.

Links

POSIX file locking
Olaf Kirch's page on NLM (warning: out of date)
Matthew Wilcox <matthew@wil.cx>
Last updated 2001-04-30