
Implementation and design of the file system layer

Unlike in monolithic designs, the file system functionality is spread across several components in HelenOS:

Standard library support

The standard library translates the more or less POSIX file system requests made by the user application into the VFS server frontend protocol and passes them to VFS. The library emulates some calls, such as opendir(), readdir(), rewinddir() and closedir(), using other calls; in this case open(), read(), lseek() and close().
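
For illustration, such an emulation can be sketched roughly as follows, assuming the usual open()/read()/lseek()/close() interfaces and a dirent-style entry structure; the DIR layout shown here is simplified and does not match the HelenOS sources:

        /* Simplified sketch; the real DIR type and entry layout differ. */
        typedef struct {
                int fd;                 /* handle obtained from open() */
                struct dirent current;  /* entry returned by readdir() */
        } DIR;

        DIR *opendir(const char *dirname)
        {
                DIR *dirp = malloc(sizeof(DIR));
                if (!dirp)
                        return NULL;
                /* A directory is opened just like any other file. */
                dirp->fd = open(dirname, O_RDONLY);
                if (dirp->fd < 0) {
                        free(dirp);
                        return NULL;
                }
                return dirp;
        }

        struct dirent *readdir(DIR *dirp)
        {
                /* Each read() yields the next directory entry. */
                ssize_t len = read(dirp->fd, &dirp->current,
                    sizeof(struct dirent));
                return (len > 0) ? &dirp->current : NULL;
        }

        void rewinddir(DIR *dirp)
        {
                /* Rewinding is merely a seek back to the beginning. */
                (void) lseek(dirp->fd, 0, SEEK_SET);
        }

        int closedir(DIR *dirp)
        {
                int rc = close(dirp->fd);
                free(dirp);
                return rc;
        }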

The VFS server accepts only absolute file paths, so the standard library takes care of providing the getcwd() and chdir() interfaces and translates all relative paths to absolute ones. Passing absolute paths may not always be optimal, but it simplifies the design of the VFS server and the libfs algorithms. In addition, thanks to this feature, the dot-dot file path components can be processed lexically, which leads to further simplifications.

The standard library forwards all other requests, which it is unable to handle itself, to the VFS server and does not contribute to the file system functionality in any other way. Each file system request forwarded to VFS is composed of one or more IPC phone calls.

VFS server

The VFS server is the focal point and also the most complex element of the file system support in the HelenOS operating system. It exists as a standalone user task and can be divided into the VFS frontend and the VFS backend.

VFS frontend

The frontend is responsible for accepting requests from the client tasks. For each client, VFS spawns a dedicated connection fibril which handles the connection. Arguments of the incoming requests are absolute file paths, file handles of already opened files or, in some special cases, VFS triplets (see below). Regardless of their type, the arguments typically reference some file and, as we will see later, the frontend always converts this reference to an internal representation called the VFS node.

Each request understood by the frontend has a symbolic name, which starts with the VFS_IN prefix.

Paths as Arguments

If the argument is a file path, VFS uses the vfs_lookup_internal() function to translate the path into the so-called lookup result, represented by the vfs_lookup_res_t type. The lookup result predominantly contains a VFS triplet, which is an ordered triplet containing a global handle of the file system instance, a global device handle and a file index. A VFS triplet thus uniquely identifies a file on some file system instance. An example VFS triplet could look like this:

        (2, 1, 10)

In the above example, the VFS triplet describes a file on a file system which was assigned number 2 by the VFS service, located on a device which was assigned number 1 by the DEVMAP service, and having the file-system-specific index number 10. The last number is also known as the i-node number in other operating systems.

VFS keeps information about each referenced file in an abstraction called the VFS node, for which there is the vfs_node_t type. A VFS node thus represents some file which is referenced by VFS. VFS nodes are first-class entities in the VFS server, because most of its operations require one.
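
The following sketch captures the gist of these two abstractions; the exact member names and types in the sources differ, so treat it as illustrative only:

        /* Illustrative sketch; the real structures carry more members. */
        typedef struct {
                fs_handle_t fs_handle;          /* file system instance handle */
                devmap_handle_t devmap_handle;  /* underlying device handle */
                fs_index_t index;               /* file index within the FS */
        } vfs_triplet_t;

        typedef struct {
                /* Identity of the file: the same members as in vfs_triplet_t. */
                fs_handle_t fs_handle;
                devmap_handle_t devmap_handle;
                fs_index_t index;

                unsigned refcnt;                 /* number of references */
                size_t size;                     /* cached current file size */
                fibril_rwlock_t contents_rwlock; /* serializes access to contents */
                link_t nh_link;                  /* hash table linkage */
        } vfs_node_t;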

The VFS server calls the vfs_node_get() function in order to get a VFS node for the corresponding lookup result. This function creates a new VFS node or adds a reference to an existing one. VFS nodes are organized in a hash table with the VFS triplet as a search key.

The following example illustrates how the VFS server obtains the VFS node in the implementation of the unlink operation:

        int rc;
        int lflag = ...;
        char *path = ...;    /* file path */
        ...
        vfs_lookup_res_t lr;
        rc = vfs_lookup_internal(path, lflag | L_UNLINK, &lr, NULL);
        if (rc != EOK) {
                /* handle error */
                ...
        }
   
        vfs_node_t *node = vfs_node_get(&lr);
        /* now we have a reference to the node and work with it */
        ...
        vfs_node_put(node);

The example is simplified and does not show all the details (e.g. it omits all synchronization), but it shows the main idea. Note the trailing vfs_node_put() function which drops a reference to a VFS node. If the last reference is dropped from a node, vfs_node_put() removes it from the hash table and cleans it up.

Handles as Arguments

The VFS server understands file handles and can accept them as arguments for VFS requests made by the client. Each client uses its own private set of file handles to refer to its open files. VFS maintains each client's open files in a table of open files, which is local to the servicing connection fibril. The table is composed of vfs_file_t pointers and is indexed by the file handles. The associated connection fibril does not need to synchronize accesses to the table of open files because it is its exclusive owner.
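
A possible shape of the table of open files and of vfs_file_get() is sketched below; the table size and member names are illustrative:

        /* Illustrative sketch of the per-client table of open files. */
        #define MAX_OPEN_FILES  128

        typedef struct {
                fibril_mutex_t lock;    /* protects this open file structure */
                vfs_node_t *node;       /* the file's VFS node */
                unsigned refcnt;        /* file handles referring to this entry */
                off_t pos;              /* current position in the open file */
        } vfs_file_t;

        /* In VFS this table lives in fibril-local storage; a plain array is
         * shown here for brevity. */
        static vfs_file_t *files[MAX_OPEN_FILES];

        vfs_file_t *vfs_file_get(int fd)
        {
                if ((fd < 0) || (fd >= MAX_OPEN_FILES))
                        return NULL;
                /* A file handle is simply an index into this table. */
                return files[fd];
        }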

Each vfs_file_t structure tracks things like how many file handles reference it, the current position in the open file and the corresponding VFS node. The transition from a file handle to a VFS node is therefore straightforward and is best shown in the following example:

        int fd;     /* file handle */
        ...
        /* Lookup the file structure corresponding to the file handle. */
        vfs_file_t *file = vfs_file_get(fd);
        ...
        /*
         * Lock the open file structure so that no other thread can manipulate
         * the same open file at a time.
         */
        fibril_mutex_lock(&file->lock);
        ...
        /*
         * Lock the file's node so that no other client can read/write to it at
         * the same time.
         */
        if (read)
                fibril_rwlock_read_lock(&file->node->contents_rwlock);
        else
                fibril_rwlock_write_lock(&file->node->contents_rwlock);

In the above code snippet, the vfs_rdwr() function first translates the file handle to a vfs_file_t structure using the vfs_file_get() interface and then locks the result. The VFS node is directly accessed in the two RW-lock operations at the end of the example.
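
Once the operation finishes, the locks are dropped in the reverse order, roughly as follows:

        /* Release the node's contents lock and the open file structure. */
        if (read)
                fibril_rwlock_read_unlock(&file->node->contents_rwlock);
        else
                fibril_rwlock_write_unlock(&file->node->contents_rwlock);

        fibril_mutex_unlock(&file->lock);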

VFS backend

As soon as the VFS server knows the VFS node associated with the request, it either asks one of the endpoint file system servers to carry out the operation for it or, when it has enough information, it completes the operation itself. For example, VFS handles the VFS_IN_SEEK request, which corresponds to the POSIX call lseek(), entirely on its own, because it just manipulates the current position pointer within the respective vfs_file_t structure. In the worst case, when seeking to the end of the file, VFS needs to know the size of the file, but this is not a problem, because the server maintains the current file size in each VFS node.
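
For illustration, the core of the seek logic reduces to updating the position stored in the vfs_file_t structure; the following is a simplified sketch with error checking and IPC argument handling omitted:

        /* Simplified sketch of handling VFS_IN_SEEK entirely within VFS. */
        switch (whence) {
        case SEEK_SET:
                file->pos = offset;
                break;
        case SEEK_CUR:
                file->pos += offset;
                break;
        case SEEK_END:
                /* The current file size is cached in the VFS node. */
                file->pos = file->node->size + offset;
                break;
        }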

We refer to the part which communicates with the endpoint file system servers as the VFS backend. The VFS backend knows the handle of the endpoint file system (and also of the underlying device) from the VFS node, so it can use the handle to obtain an IPC phone to the file system server and communicate with it. The set of calls that VFS can make to an endpoint file system server defines the VFS output protocol, because all potential endpoint file system servers need to understand it and implement it in some way.

The symbolic names of requests in the VFS output protocol are prefixed with VFS_OUT.
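
As a rough and purely illustrative sketch, a backend request for destroying a node might be issued along these lines; vfs_grab_phone() and vfs_release_phone() are hypothetical helper names, the VFS_OUT_DESTROY request is assumed here, and the argument layout is simplified:

        /* Hypothetical sketch: ask the endpoint FS server to destroy a node. */
        int phone = vfs_grab_phone(node->fs_handle);
        rc = async_req_2_0(phone, VFS_OUT_DESTROY, node->devmap_handle,
            node->index);
        vfs_release_phone(phone);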

PLB and canonical file paths

VFS and the endpoint file system servers cooperate in resolving file system paths to VFS triplets. Roughly speaking, VFS consults the file systems mounted along the given path. Each of them resolves as much of the yet unresolved portion of the path as it can, until it either reaches a mount point or the end of the path. Eventually, the last file system server manages to resolve the path and replies to the VFS server with the resulting VFS triplet.

One of the design goals of the HelenOS file system layer is to avoid the situation in which a path or a portion thereof is repeatedly copied back and forth between VFS and each endpoint file system server. In order to meet this design criterion, VFS allocates and maintains a ring buffer in which it stores all looked-up paths. Owing to its use, the buffer is called the Pathname Lookup Buffer, or PLB, and each endpoint file system server shares it read-only with VFS. The paths are placed into the buffer by the above mentioned vfs_lookup_internal() function.
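
The following sketch indicates how a looked-up path might be placed into the PLB and how the index of its first character is obtained; the buffer size and names are illustrative:

        /* Illustrative sketch of storing a path into the ring buffer. */
        #define PLB_SIZE  (2 * 4096)    /* e.g. two 4 KiB pages */

        static uint8_t *plb;            /* shared read-only with FS servers */
        static size_t plb_head;         /* index of the first free character */

        /* Store 'len' characters of 'path'; return the index of the first one. */
        size_t plb_store(const char *path, size_t len)
        {
                size_t first = plb_head;
                for (size_t i = 0; i < len; i++)
                        plb[(first + i) % PLB_SIZE] = path[i];
                plb_head = (first + len) % PLB_SIZE;
                return first;
        }

The index of the last character is then simply first + len - 1 (modulo the buffer size), and both indices are passed to the endpoint file system servers during the lookup described below.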

To maximally ease the process of path resolution, the PLB is expected to contain only paths that are in the canonical form, which can be defined as follows:

  1. the path is absolute (i.e. a/b/c is not canonical)
  2. there is no trailing slash in the path if it has components (i.e. /a/b/c/ is not canonical)
  3. there is no extra slash in the path (i.e. /a//b/c is not canonical)
  4. there is no dot component in the path (i.e. /a/./b/c is not canonical)
  5. there is no 'dot-dot' component in the path (i.e. /a/b/../c is not canonical)

The standard library contains the canonify() function, which checks whether a path is canonical and possibly converts a non-canonical path to a canonical one.
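
The idea of lexical canonicalization can be sketched as follows (assuming <string.h> and <stdbool.h>); this is a simplified illustration, not the actual canonify() code:

        /* Simplified, in-place lexical canonicalization; illustrative only. */
        static bool canonify_sketch(char *path)
        {
                if (path[0] != '/')
                        return false;           /* the path must be absolute */

                char *out = path;               /* write position */
                char *comp = path;              /* current component */

                while (*comp) {
                        while (*comp == '/')    /* skip extra slashes */
                                comp++;
                        char *end = comp;
                        while (*end && (*end != '/'))
                                end++;
                        size_t len = end - comp;

                        if (len == 0) {
                                /* trailing slashes only, nothing to emit */
                        } else if ((len == 1) && (comp[0] == '.')) {
                                /* drop '.' components */
                        } else if ((len == 2) && (comp[0] == '.') &&
                            (comp[1] == '.')) {
                                /* drop '..' along with the previous component */
                                while ((out > path) && (*--out != '/'))
                                        ;
                        } else {
                                *out++ = '/';
                                memmove(out, comp, len);
                                out += len;
                        }
                        comp = end;
                }

                if (out == path)                /* the path denotes the root */
                        *out++ = '/';
                *out = '\0';
                return true;
        }

For example, the sketch turns /a/./b/../c into /a/c.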

In a more detailed view, the path translation starts with vfs_lookup_internal() storing the canonical path into the PLB. VFS then contacts the file system server mounted under the file system root and sends it the VFS_OUT_LOOKUP request along with the indices of the first and last characters of the path in the PLB. After the root file system resolves its part of the path, it does not necessarily reply back to VFS. If there is still a portion of the path left to resolve, it forwards the VFS_OUT_LOOKUP request to the file system mounted under the mount point where the resolution stopped. At the same time, it modifies the argument of the forwarded call which contains the PLB index of the path's first character, so that it indexes the first character of the yet unresolved portion of the path. The resolution continues in the same spirit until one of the file system servers reaches the end of the path. This file system completes the path resolution by specifying the VFS triplet of the resulting node in its answer to the VFS_OUT_LOOKUP request. The answer goes directly to the originator of the request, which is the VFS server.
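
On the side of an endpoint file system server, the decision between answering and forwarding can be pictured roughly as follows; the variable and condition names are placeholders and the real logic resides in the libfs lookup code:

        /* Simplified sketch of concluding a VFS_OUT_LOOKUP request. */
        if (resolved_whole_path) {
                /* Answer directly to VFS with the resulting triplet. */
                ipc_answer_3(rid, EOK, fs_handle, devmap_handle, index);
        } else if (stopped_at_mount_point) {
                /* Let the mounted FS continue from index 'next' in the PLB. */
                ipc_forward_fast(rid, mounted_fs_phone, VFS_OUT_LOOKUP,
                    next, last, IPC_FF_ROUTE_FROM_ME);
        }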

Endpoint file system servers

As mentioned above, each endpoint file system server needs to implement the VFS output protocol. Through the polymorphism this offers, HelenOS currently supports the following file system types (and we believe that more can be added):

  TMPFS - a custom memory-based file system without an on-disk format or permanent storage.
  FAT16 - a well-known, non-Unix-like file system with a simple on-disk format.
  DEVFS - a custom pseudo file system for representing devices in the file system.

Especially the servers for file systems with permanent storage, such as FAT16, need to communicate with the underlying block device. Therefore, there needs to be a mechanism to connect the endpoint file system server with the block device. This mechanism is provided by the DEVMAP server, which registers all the device driver servers and lets their clients connect to them using the device handle. This is how the endpoint file system server establishes a connection to the underlying block device via the libblock library, which itself uses the standard library's interface to DEVMAP:

        #include <ipc/devmap.h>
        ...
        devmap_handle_t devmap_handle;
        ...
        int dev_phone = devmap_device_connect(devmap_handle, IPC_FLAG_BLOCKING);
        if (dev_phone < 0) {
                /* handle error */
        }

In most cases, the endpoint file system server will make use of the libfs library, so it will have to implement the libfs operations as described below. In the case of special and very simple file systems, such as DEVFS, the file system server may decide to do without libfs.
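
For illustration, the libfs operations take the form of a structure of callbacks which the endpoint file system server implements and hands over to libfs; the listing below is abridged and the exact prototypes in the sources may differ:

        /* Abridged, illustrative subset of the libfs operations. */
        typedef struct {
                int (* root_get)(fs_node_t **, devmap_handle_t);
                int (* match)(fs_node_t **, fs_node_t *, const char *);
                int (* node_get)(fs_node_t **, devmap_handle_t, fs_index_t);
                int (* node_put)(fs_node_t *);
                bool (* is_directory)(fs_node_t *);
                bool (* is_file)(fs_node_t *);
                char (* plb_get_char)(unsigned pos);
        } libfs_ops_t;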
