-
-configfs - Userspace-driven kernel object configuration.
+=======================================================
+Configfs - Userspace-driven Kernel Object Configuration
+=======================================================
Joel Becker <joel.becker@oracle.com>
Joel Becker <joel.becker@oracle.com>
-[What is configfs?]
+What is configfs?
+=================
configfs is a ram-based filesystem that provides the converse of
sysfs's functionality. Where sysfs is a filesystem-based view of
Both sysfs and configfs can and should exist together on the same
system. One is not a replacement for the other.
-[Using configfs]
+Using configfs
+==============
configfs can be compiled as a module or into the kernel. You can access
-it by doing
+it by doing::
mount -t configfs none /config
There are two types of configfs attributes:
-* Normal attributes, which similar to sysfs attributes, are small ASCII text
+* Normal attributes, which are similar to sysfs attributes, are small ASCII text
-files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
-only one value per file should be used, and the same caveats from sysfs apply.
-Configfs expects write(2) to store the entire buffer at once. When writing to
-normal configfs attributes, userspace processes should first read the entire
-file, modify the portions they wish to change, and then write the entire
-buffer back.
+ files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
+ only one value per file should be used, and the same caveats from sysfs apply.
+ Configfs expects write(2) to store the entire buffer at once. When writing to
+ normal configfs attributes, userspace processes should first read the entire
+ file, modify the portions they wish to change, and then write the entire
+ buffer back.
* Binary attributes, which are somewhat similar to sysfs binary attributes,
-but with a few slight changes to semantics. The PAGE_SIZE limitation does not
-apply, but the whole binary item must fit in single kernel vmalloc'ed buffer.
-The write(2) calls from user space are buffered, and the attributes'
-write_bin_attribute method will be invoked on the final close, therefore it is
-imperative for user-space to check the return code of close(2) in order to
-verify that the operation finished successfully.
-To avoid a malicious user OOMing the kernel, there's a per-binary attribute
-maximum buffer value.
+ but with a few slight changes to semantics. The PAGE_SIZE limitation does not
+ apply, but the whole binary item must fit in a single kernel vmalloc'ed
+ buffer. The write(2) calls from user space are buffered, and the attribute's
+ write_bin_attribute method is invoked on the final close. It is therefore
+ imperative for user space to check the return code of close(2) in order to
+ verify that the operation finished successfully.
+ To avoid a malicious user OOMing the kernel, there's a per-binary attribute
+ maximum buffer value.
When an item needs to be destroyed, remove it with rmdir(2). An
item cannot be destroyed if any other item has a link to it (via
symlink(2)). Links can be removed via unlink(2).
-[Configuring FakeNBD: an Example]
+Configuring FakeNBD: an Example
+===============================
Imagine there's a Network Block Device (NBD) driver that allows you to
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
the driver about it. Here's where configfs comes in.
When the FakeNBD driver is loaded, it registers itself with configfs.
-readdir(3) sees this just fine:
+readdir(3) sees this just fine::
# ls /config
fakenbd
A fakenbd connection can be created with mkdir(2). The name is
arbitrary, but likely the tool will make some use of the name. Perhaps
-it is a uuid or a disk name:
+it is a uuid or a disk name::
# mkdir /config/fakenbd/disk1
# ls /config/fakenbd/disk1
The target attribute contains the IP address of the server FakeNBD will
connect to. The device attribute is the device on the server.
Predictably, the rw attribute determines whether the connection is
-read-only or read-write.
+read-only or read-write::
# echo 10.0.0.1 > /config/fakenbd/disk1/target
# echo /dev/sda1 > /config/fakenbd/disk1/device
That's it. That's all there is. Now the device is configured, via the
shell no less.
-[Coding With configfs]
+Coding With configfs
+====================
Every object in configfs is a config_item. A config_item reflects an
object in the subsystem. It has attributes that match values on that
subsystem is also a config_group, and can do everything a config_group
can.
-[struct config_item]
+struct config_item
+==================
+
+::
struct config_item {
char *ci_name;
Usually a subsystem wants the item to display and/or store attributes,
among other things. For that, it needs a type.
-[struct config_item_type]
+struct config_item_type
+=======================
+
+::
struct configfs_item_operations {
void (*release)(struct config_item *);
method. This method is called when the config_item's reference count
reaches zero.
-[struct configfs_attribute]
+struct configfs_attribute
+=========================
+
+::
struct configfs_attribute {
char *ca_name;
attribute is writable and provides a ->store method, that method will be
-be called whenever userspace asks for a write(2) on the attribute.
+called whenever userspace asks for a write(2) on the attribute.
-[struct configfs_bin_attribute]
+struct configfs_bin_attribute
+=============================
+
+::
struct configfs_bin_attribute {
struct configfs_attribute cb_attr;
single read/write will occur; the attributes' need not concern itself
with it.
-[struct config_group]
+struct config_group
+===================
A config_item cannot live in a vacuum. The only way one can be created
is via mkdir(2) on a config_group. This will trigger creation of a
-child item.
+child item::
struct config_group {
struct config_item cg_item;
that item means that a group can behave as an item in its own right.
However, it can do more: it can create child items or groups. This is
accomplished via the group operations specified on the group's
-config_item_type.
+config_item_type::
struct configfs_group_operations {
struct config_item *(*make_item)(struct config_group *group,
};
A group creates child items by providing the
-ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
+ct_group_ops->make_item() method. If provided, this method is called from
+mkdir(2) in the group's directory. The subsystem allocates a new
config_item (or more likely, its container structure), initializes it,
and returns it to configfs. Configfs will then populate the filesystem
tree to reflect the new item.
the ct_group_ops->drop_item() method, and configfs will call
config_item_put() on the item on behalf of the subsystem.
-IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
-is called, configfs WILL remove the item from the filesystem tree
-(assuming that it has no children to keep it busy). The subsystem is
-responsible for responding to this. If the subsystem has references to
-the item in other threads, the memory is safe. It may take some time
-for the item to actually disappear from the subsystem's usage. But it
-is gone from configfs.
+Important:
+ drop_item() is void, and as such cannot fail. When rmdir(2)
+ is called, configfs WILL remove the item from the filesystem tree
+ (assuming that it has no children to keep it busy). The subsystem is
+ responsible for responding to this. If the subsystem has references to
+ the item in other threads, the memory is safe. It may take some time
+ for the item to actually disappear from the subsystem's usage. But it
+ is gone from configfs.
When drop_item() is called, the item's linkage has already been torn
down. It no longer has a reference on its parent and has no place in
called, as the item has not been dropped. rmdir(2) will fail, as the
directory is not empty.
-[struct configfs_subsystem]
+struct configfs_subsystem
+=========================
A subsystem must register itself, usually at module_init time. This
-tells configfs to make the subsystem appear in the file tree.
+tells configfs to make the subsystem appear in the file tree::
struct configfs_subsystem {
struct config_group su_group;
int configfs_register_subsystem(struct configfs_subsystem *subsys);
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
- A subsystem consists of a toplevel config_group and a mutex.
+A subsystem consists of a toplevel config_group and a mutex.
The group is where child config_items are created. For a subsystem,
this group is usually defined statically. Before calling
configfs_register_subsystem(), the subsystem must have initialized the
group via the usual group _init() functions, and it must also have
initialized the mutex.
- When the register call returns, the subsystem is live, and it
+
+When the register call returns, the subsystem is live, and it
will be visible via configfs. At that point, mkdir(2) can be called and
the subsystem must be ready for it.
-[An Example]
+An Example
+==========
The best example of these basic concepts is the simple_children
subsystem/group and the simple_child item in
and storing an attribute, and a simple group creating and destroying
these children.
-[Hierarchy Navigation and the Subsystem Mutex]
+Hierarchy Navigation and the Subsystem Mutex
+============================================
There is an extra bonus that configfs provides. The config_groups and
config_items are arranged in a hierarchy due to the fact that they
a subsystem to trust ci_parent and cg_children while they hold the
mutex.
-[Item Aggregation Via symlink(2)]
+Item Aggregation Via symlink(2)
+===============================
configfs provides a simple group via the group->item parent/child
relationship. Often, however, a larger environment requires aggregation
can it be removed while an item links to it. Dangling symlinks are not
allowed in configfs.
-[Automatically Created Subgroups]
+Automatically Created Subgroups
+===============================
A new config_group may want to have two types of child config_items.
While this could be codified by magic names in ->make_item(), it is much
rmdir(2). They also are not considered when rmdir(2) on the parent
group is checking for children.
-[Dependent Subsystems]
+Dependent Subsystems
+====================
Sometimes other drivers depend on particular configfs items. For
example, ocfs2 mounts depend on a heartbeat region item. If that
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.
-[Committable Items]
+Committable Items
+=================
-NOTE: Committable items are currently unimplemented.
+Note:
+ Committable items are currently unimplemented.
Some config_items cannot have a valid initial state. That is, no
default values can be specified for the item's attributes such that the
shutdown, or "uncommitted". Again, this is done via rename(2), this
time from the "live" directory back to the "pending" one. The subsystem
is notified by the ct_group_ops->uncommit_object() method.
-
-