Patch series "procfs: Add file path and size to /proc/<pid>/fdinfo", v2.
Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashmem (in Android).
In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
- Iterate the /proc/<pid>/fd/* for each process
- Do a readlink on each entry to identify the type of memory from
the file path.
- stat() each entry to get the size of the memory.
The file permissions on /proc/<pid>/fd/* only allows for the owner
or root to perform the operations above; and so is not suitable for
capturing the system-wide state in a production environment.
This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:
path - This allows identifying the type of memory based on common
prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."
This was not an issued when dmabuf tracking was introduced
because the exp_name field of dmabuf fdinfo could be used
to distinguish dmabuf fds from other types.
size - To track the amount of memory that is being pinned.
dmabufs expose size as an additional field in fdinfo. Remove
this and make it a common field for all fds.
Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.
This allows for a system process with PTRACE_MODE_READ_FSCREDS to
account the pinned per-process memory via fdinfo.
This patch (of 2):
To be able to account the amount of memory a process is keeping pinned by
open file descriptors add a 'size' field to fdinfo output.
dmabufs fds already expose a 'size' field for this reason, remove this and
make it a common field for all fds. This allows tracking of other types
of memory (e.g. memfd and ashmem in Android).
Link: https://lkml.kernel.org/r/20220623220613.3014268-1-kaleshsingh@google.com
Link: https://lkml.kernel.org/r/20220623220613.3014268-2-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Ioannis Ilkos <ilkos@google.com>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>
Cc: Colin Cross <ccross@google.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
---------------------------------------------------------------
This file provides information associated with an opened file. The regular
-files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 'size'.
+
The 'pos' represents the current offset of the opened file in decimal
form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
file has been created with [see open(2) for details] and 'mnt_id' represents
mount ID of the file system containing the opened file [see 3.5
/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
-the file.
+the file, and 'size' represents the size of the file in bytes.
A typical output is::
flags: 0100002
mnt_id: 19
ino: 63107
+ size: 0
All locks associated with a file descriptor are shown in its fdinfo too::
flags: 04002
mnt_id: 9
ino: 63107
+ size: 0
eventfd-count: 5a
where 'eventfd-count' is hex value of a counter.
flags: 04002
mnt_id: 9
ino: 63107
+ size: 0
sigmask: 0000000000000200
where 'sigmask' is hex value of the signal mask associated
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
where 'tfd' is a target file descriptor number in decimal form,
flags: 02000000
mnt_id: 9
ino: 63107
+ size: 0
inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
where 'wd' is a watch descriptor in decimal form, i.e. a target file
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
fanotify flags:10 event-flags:0
fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
clockid: 0
ticks: 0
settime flags: 01
{
struct dma_buf *dmabuf = file->private_data;
- seq_printf(m, "size:\t%zu\n", dmabuf->size);
/* Don't count the temporary reference taken inside procfs seq_show */
seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
if (ret)
return ret;
- seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
- (long long)file->f_pos, f_flags,
- real_mount(file->f_path.mnt)->mnt_id,
- file_inode(file)->i_ino);
+ seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
+ seq_printf(m, "flags:\t0%o\n", f_flags);
+ seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
+ seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
+ seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
/* show_fd_locks() never deferences files so a stale value is safe */
show_fd_locks(m, file, files);