From: Linus Torvalds Date: Tue, 26 Nov 2019 19:34:06 +0000 (-0800) Subject: Revert "vfs: properly and reliably lock f_pos in fdget_pos()" X-Git-Tag: v5.5-rc1~155 X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=2be7d348fe924f0c5583c6a805bd42cecda93104;p=users%2Fdwmw2%2Flinux.git Revert "vfs: properly and reliably lock f_pos in fdget_pos()" This reverts commit 0be0ee71816b2b6725e2b4f32ad6726c9d729777. I was hoping it would be benign to switch over entirely to FMODE_STREAM, and we'd have just a couple of small fixups we'd need, but it looks like we're not quite there yet. While it worked fine on both my desktop and laptop, they are fairly similar in other respects, and run mostly the same loads. Kenneth Crudup reports that it seems to break both his vmware installation and the KDE upower service. In both cases apparently leading to timeouts due to waitinmg for the f_pos lock. There are a number of character devices in particular that definitely want stream-like behavior, but that currently don't get marked as streams, and as a result get the exclusion between concurrent read()/write() on the same file descriptor. Which doesn't work well for them. The most obvious example if this is /dev/console and /dev/tty, which use console_fops and tty_fops respectively (and ptmx_fops for the pty master side). It may be that it's just this that causes problems, but we clearly weren't ready yet. Because there's a number of other likely common cases that don't have llseek implementations and would seem to act as stream devices: /dev/fuse (fuse_dev_operations) /dev/mcelog (mce_chrdev_ops) /dev/mei0 (mei_fops) /dev/net/tun (tun_fops) /dev/nvme0 (nvme_dev_fops) /dev/tpm0 (tpm_fops) /proc/self/ns/mnt (ns_file_operations) /dev/snd/pcm* (snd_pcm_f_ops[]) and while some of these could be trivially automatically detected by the vfs layer when the character device is opened by just noticing that they have no read or write operations either, it often isn't that obvious. Some character devices most definitely do use the file position, even if they don't allow seeking: the firmware update code, for example, uses simple_read_from_buffer() that does use f_pos, but doesn't allow seeking back and forth. We'll revisit this when there's a better way to detect the problem and fix it (possibly with a coccinelle script to do more of the FMODE_STREAM annotations). Reported-by: Kenneth R. Crudup Cc: Kirill Smelkov Signed-off-by: Linus Torvalds --- diff --git a/fs/file.c b/fs/file.c index b241ea7f1aa46..3da91a112babe 100644 --- a/fs/file.c +++ b/fs/file.c @@ -795,7 +795,7 @@ unsigned long __fdget_pos(unsigned int fd) unsigned long v = __fdget(fd); struct file *file = (struct file *)(v & ~3); - if (file && !(file->f_mode & FMODE_STREAM)) { + if (file && (file->f_mode & FMODE_ATOMIC_POS)) { if (file_count(file) > 1) { v |= FDPUT_POS_UNLOCK; mutex_lock(&file->f_pos_lock); diff --git a/fs/open.c b/fs/open.c index 5c68282ea79e8..b62f5c0923a80 100644 --- a/fs/open.c +++ b/fs/open.c @@ -771,6 +771,10 @@ static int do_dentry_open(struct file *f, f->f_mode |= FMODE_WRITER; } + /* POSIX.1-2008/SUSv4 Section XSI 2.9.7 */ + if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) + f->f_mode |= FMODE_ATOMIC_POS; + f->f_op = fops_get(inode->i_fop); if (WARN_ON(!f->f_op)) { error = -ENODEV; @@ -1252,7 +1256,7 @@ EXPORT_SYMBOL(nonseekable_open); */ int stream_open(struct inode *inode, struct file *filp) { - filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE); + filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS); filp->f_mode |= FMODE_STREAM; return 0; } diff --git a/include/linux/fs.h b/include/linux/fs.h index dde6dc4492a00..ae6c5c37f3ae2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -148,6 +148,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File is opened with O_PATH; almost nothing can be done with it */ #define FMODE_PATH ((__force fmode_t)0x4000) +/* File needs atomic accesses to f_pos */ +#define FMODE_ATOMIC_POS ((__force fmode_t)0x8000) /* Write access to underlying fs */ #define FMODE_WRITER ((__force fmode_t)0x10000) /* Has read method(s) */