Processes

A process holds the environment in which threads can run: it holds the memory mappings, the file descriptors, the user and group IDs, and more. The first process is the init process, which is created by the kernel during boot and has a PID of one. Thereafter, processes are created by duplication in an operation known as forking.

Creating a new process

The POSIX function to create a process is fork(2). It is an odd function because, for each successful call, there are two returns: one in the process that made the call, known as the parent, and one in the newly created process, known as the child, as shown in the following diagram:

[Figure: Creating a new process]

Immediately after the call, the child is an exact copy of the parent: it has the same stack, the same heap, the same file descriptors, and executes the same line of code, the one following fork(2). The only way the programmer can tell them apart is by looking at the return value of fork: it is zero for the child and greater than zero for the parent. In fact, the value returned in the parent is the PID of the newly created child process. There is a third possibility, which is that the return is negative, meaning that the fork call failed and there is still only one process.

Although the two processes are initially identical, they are in separate address spaces. Changes made to a variable by one will not be seen by the other. Under the hood, the kernel does not make a physical copy of the parent's memory, which would be quite a slow operation and consume memory unnecessarily. Instead, the memory is shared but marked with a copy-on-write (CoW) flag. If either parent or child modifies this memory, the kernel first makes a copy and then writes to the copy. This has the benefit of an efficient fork function while retaining the logical separation of process address spaces. I will discuss CoW in Chapter 11, Managing Memory.

Terminating a process

A process may be stopped voluntarily by calling the exit(3) function or, involuntarily, by receiving a signal that is not handled. One signal in particular, SIGKILL, cannot be handled and so will always kill a process. In all cases, terminating the process will stop all threads, close all file descriptors, and release all memory. The system sends a signal, SIGCHLD, to the parent so that it knows this has happened.

Processes have a return value which is composed of either the argument to exit(3), if it terminated normally, or the signal number if it was killed. The chief use for this is in shell scripts: it allows you to test the return from a program. By convention, 0 indicates success and other values indicate a failure of some sort.

The parent can collect the return value with the wait(2) or waitpid(2) functions. This causes a problem: there will be a delay between a child terminating and its parent collecting the return value. In that period, the return value must be stored somewhere, and the PID number of the now dead process cannot be reused. A process in this state is a zombie, state Z in ps or top. So long as the parent calls wait(2) or waitpid(2) whenever it is notified of a child's termination, zombies exist for too short a time to show up in process listings. The notification comes by means of the SIGCHLD signal; see Linux System Programming, by Robert Love, O'Reilly Media or The Linux Programming Interface, by Michael Kerrisk, No Starch Press for details of handling signals. Zombies become a problem if the parent fails to collect the return value: they accumulate and, because each one still occupies a PID, you will eventually be unable to create any more processes.

Here is a simple example, showing process creation and termination:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
  pid_t pid;
  int status;

  pid = fork();
  if (pid == 0) {
    /* fork returned 0: this is the child */
    printf("I am the child, PID %d\n", getpid());
    sleep(10);
    exit(42);
  } else if (pid > 0) {
    /* fork returned the PID of the child: this is the parent */
    printf("I am the parent, PID %d\n", getpid());
    wait(&status);
    printf("Child terminated with status %d\n", WEXITSTATUS(status));
  } else {
    /* fork failed: there is still only one process */
    perror("fork");
  }
  return 0;
}

The wait(2) function blocks until a child process exits and stores the exit status. When you run it, you see something like this:

I am the parent, PID 13851
I am the child, PID 13852
Child terminated with status 42

The child process inherits most of the attributes of the parent, including the user and group IDs (UID and GID), all open file descriptors, signal handling, and scheduling characteristics.

Running a different program

The fork function creates a copy of a running program, but it does not run a different program. For that, you need one of the exec functions:

int execl(const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg,
           ..., char * const envp[]);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[],
           char *const envp[]);

Each takes a path to the program file to load and run. If the function succeeds, the kernel discards the memory of the current process and allocates memory to the new program being loaded. Open file descriptors are carried over to the new program unless they have been marked close-on-exec. When the thread that called exec* returns, it returns not to the line of code after the call, but to the main() function of the new program. Here is an example of a command launcher: it prompts for a command, for example, /bin/ls, and forks and executes the string you enter:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
  char command_str[128];
  pid_t pid;
  int child_status;
  int wait_for = 1;

  while (1) {
    printf("sh> ");
    fflush(stdout);
    /* Read one word, bounded by the buffer size; stop at end of input */
    if (scanf("%127s", command_str) != 1)
      break;
    pid = fork();
    if (pid < 0) {
      perror("fork");
      exit(1);
    }
    if (pid == 0) {
      /* child */
      printf("cmd '%s'\n", command_str);
      fflush(stdout);
      execl(command_str, command_str, (char *)NULL);
      /* We should not return from execl, so only get to this line if it failed */
      perror("exec");
      exit(1);
    }
    if (wait_for) {
      waitpid(pid, &child_status, 0);
      printf("Done, status %d\n", child_status);
    }
  }
  return 0;
}

It might seem odd to have one function that duplicates an existing process and another that discards its resources and loads a different program into memory, especially since it is common for a fork to be followed almost immediately by exec. Most operating systems combine the two actions into a single call.

There are distinct advantages, however. For example, it makes it very easy to implement redirection and pipes in the shell. Imagine that you want to get a directory listing. This is the sequence of events:

  1. You type ls at the shell prompt.
  2. The shell forks a copy of itself.
  3. The child execs /bin/ls.
  4. The ls program prints the directory listing to stdout (file descriptor 1) which is attached to the terminal. You see the directory listing.
  5. The ls program terminates and the shell regains control.

Now, imagine that you want the directory listing to be written to a file by redirecting the output using the > character. The sequence is now as follows:

  1. You type ls > listing.txt.
  2. The shell forks a copy of itself.
  3. The child opens and truncates the file listing.txt, and uses dup2(2) to copy the file descriptor of the file over file descriptor 1 (stdout).
  4. The child execs /bin/ls.
  5. The program prints the listing as before, but this time it is writing to listing.txt.
  6. The ls program terminates and the shell regains control.

Note that there is an opportunity at step three to modify the environment of the child process before executing the program. The ls program does not need to know that it is writing to a file rather than a terminal. Instead of a file, stdout could be connected to a pipe and so the ls program, still unchanged, can send output to another program. This is part of the Unix philosophy of combining many small components that each do a job well, as described in The Art of Unix Programming, by Eric Steven Raymond, Addison Wesley; (23 Sept. 2003) ISBN 978-0131429017, especially in the section Pipes, Redirection, and Filters.

Daemons

We have encountered daemons in several places already. A daemon is a process that runs in the background, owned by the init process, PID 1, and not connected to a controlling terminal. The steps to create a daemon are as follows:

  1. Call fork() to create a new process, after which the parent should exit, thus creating an orphan which will be re-parented to init.
  2. The child process calls setsid(2), creating a new session and process group of which it is the sole member. The exact details do not matter here: you can simply consider this as a way of isolating the process from any controlling terminal.
  3. Change the working directory to the root.
  4. Close all file descriptors and redirect stdin, stdout, and stderr (descriptors 0, 1, and 2) to /dev/null so that there is no input and all output is hidden.

Thankfully, all of the preceding steps can be achieved with a single function call, daemon(3).

Inter-process communication

Each process is an island of memory. You can pass information from one to another in two ways. Firstly, you can copy it from one address space to the other. Secondly, you can create an area of memory that both can access and so share the data.

The first is usually combined with a queue or buffer so that there is a sequence of messages passing between processes. This implies copying the message twice: first to a holding area and then to the destination. Some examples of this are sockets, pipes, and POSIX message queues.

The second way requires not only a method of creating memory that is mapped into two (or more) address spaces at once, but also a means of synchronizing access to that memory, for example, by using semaphores or mutexes. POSIX has functions for all of these.

There is an older set of APIs known as System V IPC, which provides message queues, shared memory, and semaphores, but it is not as flexible as the POSIX equivalents so I will not describe it here. The man page on svipc(7) gives an overview of the facilities and there is more detail in The Linux Programming Interface, by Michael Kerrisk, No Starch Press and Unix Network Programming, Volume 2, by W. Richard Stevens.

Message-based protocols are usually easier to program and debug than shared memory, but are slow if the messages are large.

Message-based IPC

There are several options which I will summarize as follows. The attributes that differentiate between them are:

  • Whether the message flow is uni- or bi-directional.
  • Whether the data flow is a byte stream, with no message boundary, or discrete messages with boundaries preserved. In the latter case, the maximum size of a message is important.
  • Whether messages are tagged with a priority.

The following table summarizes these properties for FIFOs, sockets, and message queues:

Property           | FIFO        | Unix socket: stream | Unix socket: datagram           | POSIX message queue
Message boundary   | Byte stream | Byte stream         | Discrete                        | Discrete
Uni/bi-directional | Uni         | Bi                  | Uni                             | Uni
Max message size   | Unlimited   | Unlimited           | In the range 100 KiB to 250 KiB | Default: 8 KiB, absolute maximum: 1 MiB
Priority levels    | None        | None                | None                            | 0 to 32767

Unix (or local) sockets

Unix sockets fulfill most requirements and, coupled with the familiarity of the sockets API, they are by far the most common mechanism.

Unix sockets are created with the address family AF_UNIX and bound to a path name. Access to the socket is determined by the access permission of the socket file. As with Internet sockets, the socket type can be SOCK_STREAM or SOCK_DGRAM, the former giving a bi-directional byte stream, and the latter providing discrete messages with preserved boundaries. Unix socket datagrams are reliable, meaning that they will not be dropped or reordered. The maximum size for a datagram is system-dependent and is available via /proc/sys/net/core/wmem_max. It is typically 100 KiB or more.

Unix sockets do not have a mechanism for indicating the priority of a message.

FIFOs and named pipes

FIFO and named pipe are just different terms for the same thing. They are an extension of the anonymous pipe, which is used to communicate between parent and child processes and to implement piping in the shell.

A FIFO is a special sort of file, created by the command mkfifo(1). As with Unix sockets, the file access permissions determine who can read and write. They are uni-directional, meaning that there is one reader and usually one writer, though there may be several. The data is a pure byte stream but with a guarantee of atomicity for writes of no more than PIPE_BUF bytes, which is 4,096 bytes on Linux. In other words, writes of this size or less will not be split up or interleaved with data from other writers, and so the reader can read the whole message in one go, so long as the buffer at the reader end is large enough. The default size of the FIFO buffer is 64 KiB on modern kernels and can be increased using fcntl(2) with F_SETPIPE_SZ up to the value in /proc/sys/fs/pipe-max-size, typically 1 MiB.

There is no concept of priority.

POSIX message queues

Message queues are identified by a name, which must begin with a forward slash / and contain only one / character: message queues are actually kept in a pseudo filesystem of the type mqueue. You create a queue, or get a reference to an existing queue, through mq_open(3), which returns a message queue descriptor; on Linux, this is a file descriptor. Each message has a priority, and messages are read from the queue in priority and then age order. Messages can be up to /proc/sys/fs/mqueue/msgsize_max bytes long. The default value is 8 KiB, but you can set it to be any size in the range 128 bytes to 1 MiB by writing the value to /proc/sys/fs/mqueue/msgsize_max. Since the reference is a file descriptor, you can use select(2), poll(2), and other similar functions to wait for activity on the queue.

See the Linux man page mq_overview(7).

Summary of message-based IPC

Unix sockets are the most often used because they offer all that is needed, except perhaps message priority. They are implemented on most operating systems, and so they confer maximum portability.

FIFOs are less used, mostly because they lack an equivalent to a datagram. On the other hand, the API is very simple, being the normal open(2), close(2), read(2), and write(2) file calls.

Message queues are the least commonly used of this group. The code paths in the kernel are not optimized in the way that socket (network) and FIFO (filesystem) calls are.

There are also higher-level abstractions, in particular D-Bus, which are moving from mainstream Linux into embedded devices. D-Bus uses Unix sockets and shared memory under the surface.

Shared memory-based IPC

Sharing memory removes the need for copying data between address spaces but introduces the problem of synchronizing accesses to it. Synchronization between processes is commonly achieved using semaphores.

POSIX shared memory

To share memory between processes, you first have to create a new area of memory and then map it into the address space of each process that wants access to it, as in the following diagram:

[Figure: POSIX shared memory]

POSIX shared memory follows the pattern we encountered with message queues. The segments are identified by names that begin with a / character and have exactly one such character. The function shm_open(3) takes the name and returns a file descriptor for it. If it does not exist already and the O_CREAT flag is set, then a new segment is created. Initially it has a size of zero. Use the (misleadingly named) ftruncate(2) to expand it to the desired size.

Once you have a descriptor for the shared memory, you map it into the address space of the process using mmap(2), and so threads in different processes can access the memory.

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>  /* For mode constants */
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#include <semaphore.h>

#define SHM_SEGMENT_SIZE 65536
#define SHM_SEGMENT_NAME "/demo-shm"
#define SEMA_NAME "/demo-sem"

static sem_t *demo_sem;

/*
 * If the shared memory segment does not exist already, create it
 * Returns a pointer to the segment; exits if there is an error
 */
static void *get_shared_memory(void)
{
  int shm_fd;
  void *shm_p;

  /* Attempt to create the shared memory segment */
  shm_fd = shm_open(SHM_SEGMENT_NAME, O_CREAT | O_EXCL | O_RDWR, 0666);

  if (shm_fd >= 0) {
    /*
     * Succeeded: expand it to the desired size (Note: don't do
     * this every time because ftruncate fills it with zeros)
     */
    printf("Creating shared memory and setting size=%d\n",
           SHM_SEGMENT_SIZE);

    if (ftruncate(shm_fd, SHM_SEGMENT_SIZE) < 0) {
      perror("ftruncate");
      exit(1);
    }
    /* Create a semaphore as well */
    demo_sem = sem_open(SEMA_NAME, O_RDWR | O_CREAT, 0666, 1);

    if (demo_sem == SEM_FAILED) {
      perror("sem_open failed");
      exit(1);
    }
  }
  else if (shm_fd == -1 && errno == EEXIST) {
    /* Already exists: open again without O_CREAT */
    shm_fd = shm_open(SHM_SEGMENT_NAME, O_RDWR, 0);
    demo_sem = sem_open(SEMA_NAME, O_RDWR);

    if (demo_sem == SEM_FAILED) {
      perror("sem_open failed");
      exit(1);
    }
  }

  if (shm_fd == -1) {
    perror("shm_open " SHM_SEGMENT_NAME);
    exit(1);
  }
  /* Map the shared memory */
  shm_p = mmap(NULL, SHM_SEGMENT_SIZE, PROT_READ | PROT_WRITE,
               MAP_SHARED, shm_fd, 0);

  /* mmap reports failure with MAP_FAILED, not NULL */
  if (shm_p == MAP_FAILED) {
    perror("mmap");
    exit(1);
  }
  return shm_p;
}

int main(int argc, char *argv[])
{
  char *shm_p;

  printf("%s PID=%d\n", argv[0], getpid());
  shm_p = get_shared_memory();

  while (1) {
    printf("Press enter to see the current contents of shm\n");
    getchar();
    /* The semaphore serializes access to the shared segment */
    sem_wait(demo_sem);
    printf("%s\n", shm_p);
    /* Write our signature to the shared memory */
    sprintf(shm_p, "Hello from process %d\n", getpid());
    sem_post(demo_sem);
  }
  return 0;
}

The memory in Linux is taken from a tmpfs filesystem mounted in /dev/shm or /run/shm.
