IPC for Dummies
Understanding HelenOS IPC is essential for the development of HelenOS user space servers and services and, to a much lesser extent, for the development of any HelenOS user space code. This document attempts to concisely explain how to use the HelenOS IPC. It doesn't aspire to be exhaustive nor to cover the implementation details of the IPC subsystem itself. The original design motivations are explained in Chapter 8 of the HelenOS design documentation.
Introduction to the runtime environment
The HelenOS kernel maintains a hospitable environment for running instances of user programs called tasks. Each task runs in a separate address space, so one task cannot access another task's memory directly, but the kernel provides means of inter-task communication (IPC). In order to exploit the parallelism of today's processors, each task consists of one or more independently scheduled threads.
In user space, each thread executes by means of lightweight execution entities called fibrils. The distinction between threads and fibrils is that the kernel schedules threads but is completely unaware of fibrils.
The standard library cooperatively schedules fibrils and lets them run on behalf of the underlying thread. Due to this cooperative way of scheduling, fibrils will run uninterrupted until completion unless:
- They explicitly yield the processor to another fibril
- They wait for an IPC reply that has not arrived yet
- They request an IPC operation which results in the underlying thread being blocked
- The underlying thread is preempted by the kernel
Fibrils were introduced especially to facilitate more straightforward IPC communication.
Basics of IPC communication
Because tasks are isolated from each other, they need to use the kernel's syscall interface for communication with the rest of the world. In the previous generation of microkernels, the emphasis was put on synchronous IPC communication. In HelenOS, both synchronous and asynchronous communication is possible, but the HelenOS IPC is primarily asynchronous.
The concepts and terminology of HelenOS IPC are based on the natural abstraction of a telephone dialogue between a human on one side of the connection and an answerbox on the other. The presence of a passive answerbox is what makes the communication asynchronous: the call cannot be answered immediately, but first needs to be picked up from the answerbox by the second party.
In HelenOS, IPC communication proceeds as in the following example. A user space fibril uses one of its phones, which is connected to the callee task's answerbox, and makes a short call. The caller fibril can either make another call or wait for the answer. The callee task now has a missed call stored in its answerbox. Sooner or later, one of the callee task's fibrils will pick the call up, process it and either answer it or forward it to a third party's answerbox. Under all circumstances, the call will eventually get answered and the answer will be stored in the answerbox of the caller task.
Asynchronous framework
If a task is multithreaded, or even if it has only one thread but several fibrils, the idea of a connection is jeopardized. How can the task tell which of its fibrils should pick up the next call from the answerbox so that the right guy receives the right data? One approach would be to allow the first available fibril to pick it up, but then we could not talk about a connection; and if we tried to preserve the concept of a connection, the code handling incoming calls would most likely become full of state automata and callbacks. In HelenOS, there is a specialized piece of software called the asynchronous framework, which forms a layer above the low-level IPC mechanism. The asynchronous framework does all the state automata and callback dirty work itself and hides the implementation details from the programmer.
The asynchronous framework makes extensive use of fibrils; in fact, it was the asynchronous framework that justified the existence of HelenOS fibrils. With the asynchronous framework in place, there are two kinds of fibrils:
- manager fibrils, and
- worker fibrils.
Manager fibrils pick up calls from answerboxes and, according to their internal routing tables, pass them to the respective worker fibrils, which handle particular connections. If a worker fibril decides to wait for an answer which has not arrived yet, the asynchronous framework will register all the necessary callbacks and switch to another runnable fibril. The framework will switch back to the original fibril only after the answer has arrived. If there are no runnable fibrils to switch to, the asynchronous framework will block the entire thread.
The benefit of using the asynchronous framework and fibrils is that the programmer can do without callbacks and state automata and still use asynchronous communication.
Features of HelenOS IPC
The features of HelenOS IPC can be summarized in the following list:
- short calls, consisting of one argument for method number and five arguments of payload,
- answers, consisting of one argument for return code and five arguments of payload,
- sending large data to another task,
- receiving large data from another task,
- sharing memory from another task,
- sharing memory to another task,
- interrupt notifications for user space device drivers.
The first two items can be considered basic building blocks.
Using short calls and answers as building blocks, copying larger blocks of data and sharing memory between address spaces become possible (and even elegant), thanks to the kernel monitoring the situation. The kernel basically snoops on the communication between the negotiating tasks and takes care of the data transfer or the memory sharing once the two parties agree on it.
Connecting to another task
A HelenOS task can only communicate with another task to which it has an open phone. When created, each task has one open phone to start with. This initial phone is always connected to the naming service. The naming service is a system task at which other services register and which can connect clients to other registered services. The following snippet demonstrates how a task asks the naming service to connect it to the VFS server:
    #include <async.h>

    ...

    /*
     * Use the naming service session that abstracts
     * the phone to the naming service.
     */
    async_exch_t *exch = async_exchange_begin(ns_session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    async_sess_t *session = async_connect_me_to_iface(exch, INTERFACE_VFS,
        SERVICE_VFS, 0);

    async_exchange_end(exch);

    if (session == NULL) {
        /* Handle error connecting to the VFS */
    }
The async_connect_me_to_iface() function is a wrapper for sending the IPC_M_CONNECT_ME_TO low-level IPC message to the naming service. The naming service simply forwards the IPC_M_CONNECT_ME_TO call to the destination service, provided that such a service exists. Note that the service to which you intend to connect will create a new fibril for handling the connection from your task. The newly created fibril in the destination task will receive the IPC_M_CONNECT_ME_TO call and will be given the chance to either accept or reject the connection. In the snippet above, the client doesn't make use of the server-defined connection argument. If the connection is accepted, a new non-negative phone number will be returned to the client task and the asynchronous framework will create a new session for it. From then on, the task can use that session for making calls to the service. The connection exists until either side closes it.
The client uses the async_hangup(async_sess_t *session) interface to close the connection.
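For example, closing the session to the VFS server obtained above is as simple as:

    /* Close the session once the VFS server is no longer needed */
    async_hangup(session);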
Passing short IPC messages
On the lowest level, tasks communicate by making calls to other tasks to which they have an open phone. Each call is a data structure accommodating six native arguments (i.e. six 32-bit arguments on 32-bit systems or six 64-bit arguments on 64-bit systems). The first argument of the six will be interpreted as a method number for requests and return code for answers.
A method is either a system method or a protocol-defined method. System method numbers range from 0 to 1023; protocol-defined method numbers start at 1024. In the case of system methods, the payload arguments have a predefined meaning and are interpreted by the kernel. In the case of protocol-defined methods, the payload arguments are defined by the protocol in question.
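As a purely illustrative sketch, a protocol might define its methods as an enumeration starting at 1024; the names below are made up and not part of any actual HelenOS protocol:

    /* Hypothetical protocol-defined methods; 0-1023 are reserved for system methods */
    typedef enum {
        EXAMPLE_IN_PING = 1024,
        EXAMPLE_IN_READ,
        EXAMPLE_IN_WRITE
    } example_request_t;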
Even though a user space task can use the low-level IPC mechanisms directly, it is strongly discouraged (unless you know what you are doing) in favor of using the asynchronous framework. Making an asynchronous request via the asynchronous framework is fairly easy, as can be seen in the following example:
    #include <async.h>

    ...

    async_exch_t *exch = async_exchange_begin(session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    ipc_call_t answer;
    aid_t req = async_send_3(exch, VFS_IN_OPEN, lflags, oflags, 0, &answer);

    async_exchange_end(exch);

    ...

    int rc;
    async_wait_for(req, &rc);

    if (rc != EOK) {
        /* Handle error from the server */
    }
In the example above, the standard library is making an asynchronous call to the VFS server. The method number is VFS_IN_OPEN, and lflags, oflags and 0 are the three payload arguments defined by the VFS protocol. Note that the number of payload arguments figures in the numeric suffix of the async_send_3() function name. There are analogous interfaces which take from zero to five payload arguments.
In this example, there are no payload return arguments except for the return value. If there were some return arguments of interest, the client could access them using IPC_GET_ARG1() through IPC_GET_ARG5() macros on the answer variable.
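For instance, since the VFS server (as shown later in this document) answers VFS_IN_OPEN with the new file handle as its only return argument, the client could read it from the answer variable of the previous snippet roughly like this:

    /* After async_wait_for() has succeeded, read the first return argument */
    if (rc == EOK) {
        int fd = (int) IPC_GET_ARG1(answer);
        /* ... use fd ... */
    }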
The advantage of the asynchronous call is that the client doesn't block during the send operation and can do some fruitful work before it starts to wait for the answer. If there is nothing to be done before sending the message and waiting for the answer, it is better to perform a synchronous call. Using the asynchronous framework, this is achieved in the following way:
    #include <async.h>

    ...

    async_exch_t *exch = async_exchange_begin(session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    int rc = async_req_1_0(exch, VFS_IN_CLOSE, fildes);

    async_exchange_end(exch);

    if (rc != EOK) {
        /* Handle error from the server */
    }
The example above illustrates how the standard library synchronously calls the VFS server and asks it to close a file descriptor passed in the fildes argument, which is the only payload argument defined for the VFS_IN_CLOSE method. The interface name encodes the number of input and return arguments, so there are variants that take or return a different number of arguments. Note that contrary to the asynchronous example above, the return arguments are stored directly to pointers passed to the function.
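For illustration, a hypothetical synchronous request with one payload argument and one return argument might look like the sketch below; EXAMPLE_IN_STAT, handle and size are made up for this example and are not part of any real protocol:

    /* One payload argument in, one payload return argument out */
    async_exch_t *exch = async_exchange_begin(session);
    sysarg_t size;
    int rc = async_req_1_1(exch, EXAMPLE_IN_STAT, handle, &size);
    async_exchange_end(exch);
    if (rc != EOK) {
        /* Handle error from the server */
    }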
The interface for answering calls is async_answer_n(), where n is the number of return arguments. This is how the VFS server answers the VFS_IN_OPEN call:
    async_answer_1(&call, EOK, fd);
In this example, call is the received call, EOK is the return value and fd is the only return argument.
Passing large data via IPC
Passing five words of payload in a request and five words of payload in an answer is not very suitable for larger data transfers. Instead, the application can use these building blocks to negotiate the transfer of a much larger block (currently there is a hard limit of 64 KiB). The negotiation has three phases:
- the initial phase in which the client announces its intention to copy memory to or from the recipient,
- the receive phase in which the server learns about the bid, and
- the final phase in which the server either accepts or rejects the bid.
We use the terms client and server instead of the terms sender and recipient, because a client can be both the sender and the recipient and a server can be both the recipient and the sender, depending on the direction of the data transfer. In the following text, we'll cover both.
In theory, the programmer can use the low-level short IPC messages to implement all three phases himself or herself. However, this can be tedious and error-prone, and therefore the standard library offers convenience wrappers for each phase instead.
Sending data
When sending data, the client is the sender and the server is the recipient. The following snippet illustrates the initial phase on the example of the libc open() call which transfers the path name to the VFS server. The initial phase is also the only step needed on the sender's side.
    #include <async.h>

    ...

    char *pa;
    size_t pa_len;

    ...

    async_exch_t *exch = async_exchange_begin(session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    int rc = async_data_write_start(exch, pa, pa_len);

    async_exchange_end(exch);

    if (rc != EOK) {
        /* Error or the recipient denied the bid */
    }
The pa and pa_len arguments specify the source address and the suggested number of bytes to transfer, respectively. The recipient will be able to determine the size parameter of the transfer in the receive phase:
    #include <async.h>

    ...

    ipc_call_t call;
    size_t len;

    if (!async_data_write_receive(&call, &len)) {
        /* Protocol error - the sender is not sending data */
    }

    /* Success, the receive phase is complete */
After the receive phase, the recipient will know - from the len variable - how many bytes the sender is willing to send. So far, no data has been transferred. The separation of the receive and the final phase is important, because it gives the recipient a chance to get ready for the transfer (e.g. allocate the required amount of memory).
Now the recipient is at a crossroads. It can do one of three things: answer the call with a non-zero return code, accept the transfer but restrict its size, or accept the transfer including the suggested size. The latter two options are achieved like this:
    char *path;
    /* Allocate the receive buffer */

    (void) async_data_write_finalize(&call, path, len);
After this call, the transfer of len bytes to the address path will take place. The operation can theoretically fail, so you should check the return value of async_data_write_finalize(); if it is non-zero, there was an error.
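Rejecting the bid, on the other hand, is simply a matter of answering the call with a non-zero error code, for example:

    /* Reject the bid, e.g. because the receive buffer could not be allocated */
    async_answer_0(&call, ENOMEM);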
Accepting data
When accepting data, the client is the recipient and the server is the sender. The situation is similar to the previous one, the only difference is that the client specifies the destination address and the largest possible size for the transfer. The server can send less data than requested. In the following example, the read() function in the standard library is requesting nbyte worth of data to be read from a file system into the buf buffer:
    #include <async.h>

    ...

    async_exch_t *exch = async_exchange_begin(session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    int rc = async_data_read_start(exch, buf, nbyte);

    async_exchange_end(exch);

    if (rc != EOK) {
        /* Error or the recipient denied the bid */
    }
Now the file system, say it is TMPFS, receives the request like this:
    #include <async.h>

    ...

    ipc_call_t call;
    size_t len;

    if (!async_data_read_receive(&call, &len)) {
        /* Protocol error - the sender is not accepting data */
    }

    /* Success, the receive phase is complete */
After the receive phase is over, len holds the maximum size of data the client is willing to accept. The sender can only restrict this value. Until the final phase is over, no data is transferred. The final phase follows:
    (void) async_data_read_finalize(&call, dentry->data + pos, bytes);
Here the sender specifies the source address and the actual number of bytes to transfer. After the function call completes, the data has been transferred to the recipient. Note that the return value of async_data_read_finalize() is, maybe unjustly, ignored.
Sharing memory via IPC
In HelenOS, tasks can share memory only via IPC as the kernel does not provide dedicated system calls for memory sharing. Instead, the tasks negotiate much like in the case of passing large data. The negotiation has three phases and is very similar to the previous case:
- the initial phase in which the client announces its intention to share memory to or from the recipient,
- the receive phase in which the server learns about the bid, and
- the final phase in which the server either accepts or rejects the bid.
The semantics of client and server also remain the same. Note that the direction of sharing is significant here, just as it is during data copying.
Sharing address space area out
When sharing an address space area to other tasks, the client is the sender and the server is the recipient. The client offers one of its address space areas to the server for sharing. The following code snippet illustrates libblock's block_init() function offering a part of its address space starting at com_area to a block device associated with the dev_session session:
    #include <async.h>

    ...

    async_exch_t *exch = async_exchange_begin(dev_session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    int rc = async_share_out_start(exch, com_area, AS_AREA_READ | AS_AREA_WRITE);

    async_exchange_end(exch);

    if (rc != EOK) {
        /* Error or the recipient denied the bid */
    }
This is how the RAM disk server receives the address space area offer made above:
    #include <async.h>

    ...

    ipc_call_t call;
    size_t len;
    int flags;

    if (!async_share_out_receive(&call, &len, &flags)) {
        /* Protocol error - the sender is not sharing out an area */
    }

    /* Success, the receive phase is complete */
After the offer is received, the server has a chance to reject it by answering the call with an error code distinct from EOK. The reason for denial can be an inappropriate len or unsuitable address space area flags in the flags variable. If the offer looks good to the server, it will accept it like this:
    void *fs_va;

    (void) async_share_out_finalize(&call, fs_va);
Note that the return value of async_share_out_finalize() is maybe unjustly ignored here. The kernel will attempt to create the mapping only after the server calls async_share_out_finalize().
Sharing address space area in
When sharing memory from other tasks, the client is the recipient and the server is the sender. The client asks the server to provide an address space area. In the following example, the libfs library asks the VFS server to share the Path Lookup Buffer:
    #include <async.h>

    ...

    fs_reg_t *reg;

    ...

    async_exch_t *exch = async_exchange_begin(session);
    if (exch == NULL) {
        /* Handle error creating an exchange */
    }

    int rc = async_share_in_start(exch, reg->plb_ro, PLB_SIZE);

    async_exchange_end(exch);

    if (rc != EOK) {
        /* Error or the recipient denied the bid */
    }
The VFS server learns about the request by performing the following code:
    #include <async.h>

    ...

    ipc_call_t call;
    size_t size;

    if (!async_share_in_receive(&call, &size)) {
        /* Protocol error - the sender is not requesting a share */
    }

    /* Success, the receive phase is complete */
The server now has a chance to react to the request. If size does not meet the server's requirements, the server will reject the offer. Otherwise the server will accept it. Note that so far, the address space area flags were not specified. That will happen in the final phase:
    uint8_t *plb;

    (void) async_share_in_finalize(&call, plb, AS_AREA_READ | AS_AREA_CACHEABLE);
Again, the kernel will not create the mapping before the server completes the final phase of the negotiation via async_share_in_finalize().