Chapter 4. Primitive Communications


Introduction

Now that we have covered the basics of process structure and generation, we can begin to address the topic of interprocess communications. It is common for processes to need to coordinate their activities (e.g., when accessing a non-shareable system resource). Conceptually, this coordination is implemented via some form of passive or active communication between processes. As we will see, there are a number of ways in which interprocess communications can be carried out. The remaining chapters address a variety of interprocess communication techniques. As the techniques become more sophisticated, they become more complex and, hopefully, more flexible and reliable. We begin by discussing primitive communication techniques that, while they get the job done, have certain limitations.

Lock Files

A lock file (which should not be confused with file/record locking, an I/O technique covered in Section 4.3) can be used by processes as a way to communicate with one another. The processes involved may be different programs or multiple instances of the same program. The use of lock files has a long history in UNIX. Early versions of UNIX (as well as some current versions) use lock files as a means of communication. Lock files are sometimes found in line printer and uucp implementations. In some systems, coordination of access to the password and mail files also relies on lock files and/or the locking of a specific file.

The theory behind the use of a lock file as an interprocess communication technique is rudimentary. In brief, by using an agreed-upon file-naming convention, a process examines a prearranged location for the presence or absence of a lock file. Often the location is a temporary directory (e.g., /tmp) whose contents are cleared when the system reboots (or during periodic housecleaning by the system administrator) and in which all users normally have read/write/execute permission. In its most basic form, if the file is present, the process takes one set of actions, and if the file is missing, it takes another. For example, suppose we have two processes, Process_One and Process_Two, that seek access to a single non-shareable resource (e.g., a printer or disk). A lock file-based communication convention for the two processes could be as shown in Figure 4.1.


Figure 4.1. Using a lock file for communication with two processes.

It is clear that communication implemented in this manner only conveys a minimal amount of information from one process to another. In essence, the processes are using the presence or absence of the lock file as a binary semaphore. The file's presence or absence communicates, from one process to another, the availability of a resource.

Such a communication technique is fraught with problems. The most apparent problem is that the processes must agree upon the naming convention for the lock file. However, additional, perhaps unforeseen, problems may arise as well. For example,

  1. What if one of the processes fails to remove the lock file when it is finished with the resource?

  2. Polling (the constant checking to determine if a certain event has occurred) is expensive (CPU-wise) and is to be avoided. How does the process that does not obtain access to the resource wait for the resource to become free?

  3. Race conditions, in which both processes find the lock file absent at the same time and both attempt to create it, must be prevented. Can we make the generation of the lock file atomic (non-divisible, i.e., non-interruptible)?

As we will see, some of these concerns can be addressed, while others can only be limited in scope. A program that implements communications using a lock file is presented below. The code for the main portion of the program is shown in Program 4.1.

Program 4.1. Using a lock file: the main program.

File : p4.1.cxx
  |     /*
  |          Using a lock file as a process communication technique.
  |     */
  |     #include <iostream>
  +     #include <unistd.h>
  |
  |     #include "lock_file.h"        <-- 1
  |     using namespace std;
  |     int
 10     main(int argc, char *argv[ ]){
  |       int  numb_tries, i = 5;
  |       int  sleep_time;
  |       char *fname;
  |       /*
  +             Assign values from the command line
  |       */
  |       set_defaults(argc, argv, &numb_tries, &sleep_time, &fname);
  |       /*
  |             Attempt to obtain lock file
 20       */
  |       if (acquire(numb_tries, sleep_time, fname)) {
  |         while (i--) {                     // simulate resource use
  |           cout << getpid( )<< " " << i << endl;
  |           sleep(sleep_time);
  +         }
  |         release(fname);                   // remove lock file
  |         return 0;
  |       } else
  |         cerr << getpid( ) << " unable to obtain lock file after "
 30              << numb_tries << " tries." << endl;
  |       return 1;
  |     }
  • (1)This header resides locally.

At line 7 of the program, the local header file lock_file.h is included. This file (Figure 4.2) contains the prototypes for the three functions used to manipulate the lock file: set_defaults, acquire, and release. Preprocessor statements (#ifndef, #define, and #endif) are used in the header file to prevent its contents from being processed more than once if the file is inadvertently included multiple times.

In line 17 of the main program the set_defaults function is called to establish the default values. Once these values have been assigned, the program attempts to obtain the lock file by calling the function acquire (line 21). If the program is successful in creating the lock file, it then accesses the non-shareable resource. In the case of Program 4.1 the resource involved is the screen. When access to the screen is acquired, the program displays a series of integer values. Once the program is finished with the resource (all values have been displayed), the lock file is removed using the release function.

Figure 4.2. The lock_file.h header file.

File : lock_file.h
  |     #ifndef LOCK_FILE_H
  |     #define LOCK_FILE_H
  |     /*
  |        Lock file function prototypes
  +     */
  |     void  set_defaults(int, char *[], int *, int *, char **);
  |     bool  acquire(int, int, char *);
  |     bool  release(char *);
  |     #endif

The set_defaults function accepts five arguments. The first two arguments (an integer and an array of character pointers) are the argc and argv values passed to the main program (Program 4.1). As written, the program allows the user to change some or all of the default values by passing alternate values on the command line when the program is invoked. The remaining three arguments for set_defaults are pointers through which it returns the number of tries to be made when attempting to generate the lock file, the amount of time to wait in seconds between attempts, and the name of the lock file.

The acquire function takes three arguments. The first is the number of times to attempt to create the lock file, the second the sleep interval between tries, and the third a reference to the lock file name. The acquire function returns a boolean value indicating its success.

The function release removes the lock file. This function is passed a reference to the lock file and returns a boolean value indicating whether or not it was successful. The code for these functions, which are stored in a separate file, is shown in Figure 4.3.

Figure 4.3. Source code for the set_defaults, acquire, and release functions.

File : lock_file.cxx
  |     /*
  |       Source code for using lock file. Compile using -c and
  |       -D_GNU_SOURCE options. Link object code as needed.
  |     */
  +     #include <iostream>
  |     #include <cstring>
  |     #include <cstdlib>
  |     #include <cerrno>
  |     #include <limits.h>
 10     #include <fcntl.h>
  |     #include <unistd.h>
  |     const int  NTRIES = 5;                  // default values
  |     const int  SLEEP  = 5;
  |     const char *LFILE = "/tmp/TEST.LCK";
  +     using namespace std;
  |     void
  |     set_defaults(int ac, char *av[ ],
  |                  int *n_tries, int *s_time, char **f_name){
  |       static char full_name[PATH_MAX];
 20       *n_tries = NTRIES;                    // Start with defaults
  |       *s_time  = SLEEP;
  |       strcpy(full_name, LFILE);
  |       switch (ac) {
  |       case 4:                               // File  name was specified
  +         full_name[0] = '\0';              // "clear" the string
  |         strcpy(full_name, av[3]);           // Add the passed in file
  |       case 3:
  |         if ((*s_time = atoi(av[2])) <= 0)   //  Seconds of sleep time
  |           *s_time = SLEEP;
 30       case 2:
  |         if ((*n_tries = atoi(av[1])) <= 0)  // Number of times to try
  |           *n_tries = NTRIES;
  |       case 1:                               // Use the defaults
  |         break;
  +       default:
  |         cerr << "Usage: " << av[0] <<
  |                 " [[tries][sleep][lockfile]]" << endl;
  |         exit(1);
  |       }
 40       *f_name = full_name;
  |     }
  |
  |     bool
  |     acquire(int numb_tries, int sleep_time, char *file_name){
  +       int   fd, count = 0;
  |       while ((fd = creat(file_name, 0)) == -1 && errno == EACCES)
  |         if (++count < numb_tries)           // If still more tries
  |           sleep(sleep_time);                // sleep for a while
  |         else
 50           return (false);                   // Unable to generate
  |       close(fd);                            // Close (0 byte in size)
  |       return (bool(fd != -1));              // OK if actually done
  |     }
  |
  +     bool
  |     release(char *file_name){
  |       return bool(unlink(file_name) == 0);
  |     }

At the top of the lock_file.cxx file, the default values are assigned. The set_defaults function examines the number of arguments passed on the command line (which has been passed to it as the variable ac). A cascading switch statement is used to determine if changes in the default assignments should be made. The set_defaults function assumes the command-line arguments, if present, are arranged as

linux$ program_name  numb_of_tries  sec_to_sleep  lck_file_name

The values for numb_of_tries and sec_to_sleep should be greater than zero. The lck_file_name is the name to be used for the lock file. As written, the set_defaults function does not validate the passed-in lock file location/name, but it does replace values of zero or less for the number of tries and the sleep interval with the defaults.

The function acquire relies on the system call creat (note there is no trailing e) to generate the lock file (Table 4.1).

Table 4.1. Summary of the creat System Call.

Include File(s):   <sys/types.h>, <sys/stat.h>, <fcntl.h>
Manual Section:    2
Summary:           int creat(const char *pathname, mode_t mode);
Return:            Success: lowest available integer file descriptor
                   Failure: −1
                   Sets errno: Yes

By definition, creat is used to create a new file or rewrite a file that already exists (first truncating it to 0 bytes). The creat system call will open a file for writing only.

creat requires two arguments. The first argument, pathname, is a character pointer to the name of the file to be created, and the second argument, mode, is a value of type mode_t (an integer type defined in <sys/types.h>), which specifies the mode (access permissions) for the created file. The header file <fcntl.h> contains a number of predefined constants that may be bitwise ORed to specify the mode for the file. The creat system call in the program function acquire creates a file whose access mode is 0. If creat is successful, the file generated will not have read, write, or execute permission for any class of user (owner, group, or other); the superuser, root, is not restricted by these permissions.[1]

An alternate approach to creating the file would be to use the open[2] system call. The equivalent statement using open would be:

open( path, O_WRONLY | O_CREAT | O_TRUNC, 0 );

If the creat call is successful, it returns an integer value that is the lowest available file descriptor. If creat fails, it returns −1 and sets errno. Table 4.2 contains the errors that may be encountered when using the creat system call.

As shown, a number of things can cause creat to fail, including too many files open, an incorrectly specified file and/or path name, and so on. The failure we test for in the while loop of the acquire function is EACCES.[3] The failure of creat and the setting of errno to EACCES indicates the file to be created already exists and write permission to the file is denied (remember, the file was generated with a mode of 0).

Table 4.2. creat Error Messages.

#    Constant       perror Message                     Explanation
2    ENOENT         No such file or directory          One or more parts of the path to the new file do not exist (or pathname is NULL).
6    ENXIO          No such device or address          O_NONBLOCK | O_WRONLY is set, the named file is a FIFO (named pipe), and no process has the file open for reading.
12   ENOMEM         Cannot allocate memory             Insufficient kernel memory was available.
13   EACCES         Permission denied                  The requested access to the file is not allowed; search permission is denied on part of the file path; or the file does not exist and write access to the parent directory is not allowed.
14   EFAULT         Bad address                        pathname references an illegal address space.
17   EEXIST         File exists                        pathname (file) already exists and O_CREAT and O_EXCL were specified.
19   ENODEV         No such device                     pathname refers to a device special file, and no corresponding device exists.
20   ENOTDIR        Not a directory                    Part of the specified path is not a directory.
21   EISDIR         Is a directory                     pathname refers to a directory, and the access requested involved writing.
23   ENFILE         Too many open files in system      System limit on open files has been reached.
24   EMFILE         Too many open files                The process has exceeded the maximum number of files open.
26   ETXTBSY        Text file busy                     pathname refers to an executable file that is currently being executed, and write access was requested.
28   ENOSPC         No space left on device            The device containing pathname has no room for the new file (for example, it is out of inodes).
30   EROFS          Read-only file system              pathname refers to a file on a read-only filesystem, and write access was requested.
36   ENAMETOOLONG   File name too long                 The pathname value exceeds the system path/file name length.
40   ELOOP          Too many levels of symbolic links  Too many symbolic links were encountered in resolving pathname.
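Table 4.2 lists EEXIST, the error reported when O_CREAT and O_EXCL are both specified and the file already exists. That combination offers another way to make lock file creation atomic, one that does not depend on a mode-0 file and EACCES (a check the superuser bypasses). The following is a minimal sketch under that assumption; the names try_lock_file, acquire_excl, and release_lock_file are hypothetical and are not part of lock_file.cxx. Like creat, this approach may not be atomic on some NFS filesystems.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: one atomic attempt to create the lock file.
// Returns 1 if this process created it, 0 if it already exists
// (another process holds the lock), or -1 for any other error.
int try_lock_file(const char *path) {
    // With O_CREAT | O_EXCL the existence check and the creation are a
    // single step: if path already exists, open fails with EEXIST.
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;
    close(fd);                       // the file's presence is the lock
    return 1;
}

// Hypothetical counterpart to acquire: try up to numb_tries times,
// sleeping sleep_time seconds between attempts.
bool acquire_excl(const char *path, int numb_tries, int sleep_time) {
    for (int count = 1; ; ++count) {
        int result = try_lock_file(path);
        if (result == 1)
            return true;             // we created the lock file
        if (result == -1 || count >= numb_tries)
            return false;            // hard error, or out of tries
        sleep(sleep_time);           // someone else has it; wait and retry
    }
}

// Remove the lock file; true on success.
bool release_lock_file(const char *path) {
    return unlink(path) == 0;
}
```

Only the error that means "someone else holds the lock" (EEXIST) leads to a retry; any other failure ends the attempt at once, mirroring the way acquire treats errors other than EACCES.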

As noted, the while loop in the acquire function tests to determine if a file can be created. If the file can be created, the loop is exited and the file descriptor is closed (leaving the file present and 0 bytes in length). When the file cannot be created and the error code in errno is EACCES, the if statement in the body of the loop is executed. In the if statement the value of count is incremented and tested against the designated number of tries. If the limit has not yet been reached, sleep is called to suspend processing before the next attempt.

sleep is a library function that suspends the invoking process for the number of seconds indicated by its argument, seconds.[4] See Table 4.3. If sleep is interrupted (for example, by a signal), it returns the number of unslept seconds. If the full interval elapses, sleep returns 0. Using sleep in the polling loop to make the process wait is a compromise. It is not an elegant way to avoid CPU-intensive busy waiting but, at this point, is better than no built-in wait or a throwaway calculation loop. In later chapters, we discuss alternate solutions to this problem.

Table 4.3. Summary of the sleep Library Function.

Include File(s):   <unistd.h>
Manual Section:    3
Summary:           unsigned int sleep(unsigned int seconds);
Return:            Success: amount of time left to sleep
                   Failure: not applicable
                   Sets errno: No

If, in the program function acquire, the number of tries has been exceeded, false is returned to indicate failure. If the while loop is exited because the creat call was successful, true is returned. If creat fails for any reason other than EACCES, the loop also exits, and because fd is −1, false is returned.

The release function attempts to remove the file using the system call unlink (Table 4.4). This call removes the specified name from the filesystem. If that name was the last link to the file and no process has the file open, the file itself is deleted; if a process still has the file open, the file continues to exist until the last file descriptor referring to it is closed. If the name refers to a symbolic link, the link itself is removed. In the program, the release function is coded to return whether unlink succeeded. As written, the main program discards the value returned by the release function.
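The behavior of unlink on an open file can be checked directly. In the following sketch (the function name read_after_unlink and the file path used in testing are hypothetical), a file is created and written, its only name is removed with unlink, and the data is then read back through the still-open descriptor. For lock files, the consequence is that the name disappears for other processes as soon as unlink succeeds, whether or not anyone still has the file open.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical demonstration: returns true if data written to a file can
// still be read through an open descriptor after the file's only name
// has been removed with unlink.
bool read_after_unlink(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return false;
    const char msg[] = "hello";
    bool ok = write(fd, msg, sizeof msg) == (ssize_t)sizeof msg;
    ok = ok && unlink(path) == 0;                             // remove the name...
    ok = ok && access(path, F_OK) == -1 && errno == ENOENT;   // ...it is gone
    char buf[sizeof msg] = {};
    ok = ok && lseek(fd, 0, SEEK_SET) == 0
            && read(fd, buf, sizeof buf) == (ssize_t)sizeof buf
            && std::memcmp(buf, msg, sizeof msg) == 0;        // data still readable
    close(fd);   // last reference closed: the file's storage can now be freed
    return ok;
}
```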

Table 4.4. Summary of the unlink System Call.

Include File(s):   <unistd.h>
Manual Section:    2
Summary:           int unlink(const char *pathname);
Return:            Success: 0
                   Failure: −1
                   Sets errno: Yes

If the unlink system call fails it returns a value of −1 and sets errno to one of the values found in Table 4.5. If unlink is successful, it returns a value of 0.

Table 4.5. unlink Error Messages.

#    Constant       perror Message                     Explanation
1    EPERM          Operation not permitted            Not owner of file or not superuser; or the filesystem (in Linux) does not allow the unlinking of files.
2    ENOENT         No such file or directory          One or more parts of pathname do not exist (or pathname is NULL).
4    EINTR          Interrupted system call            A signal was caught during the system call.
5    EIO            I/O error                          An I/O error has occurred.
12   ENOMEM         Cannot allocate memory             Insufficient kernel memory was available.
13   EACCES         Permission denied                  Search permission is denied on part of the file path, or write access to the directory containing pathname is not allowed for the process's effective UID.
14   EFAULT         Bad address                        pathname references an illegal address space.
16   EBUSY          Device or resource busy            The referenced file is busy (in use by the system or another process).
20   ENOTDIR        Not a directory                    Part of the specified path is not a directory.
21   EISDIR         Is a directory                     pathname refers to a directory (not a file).
26   ETXTBSY        Text file busy                     pathname is the last link to an executable file that is currently being executed (not reported by all systems).
30   EROFS          Read-only file system              pathname refers to a file that resides on a read-only filesystem.
36   ENAMETOOLONG   File name too long                 pathname is too long.
40   ELOOP          Too many levels of symbolic links  Too many symbolic links were encountered in resolving pathname.
67   ENOLINK        The link has been severed          The path value references a remote system that is no longer available.
72   EMULTIHOP      Multihop attempted                 The path value requires multiple hops to remote systems, but the filesystem does not allow it.

A sample compilation and run of the program is shown in Figure 4.4.

Figure 4.4. Output of Program 4.1.

linux$ g++ p4.1.cxx lock_file.o -o p4.1        <-- 1

linux$ p4.1 1 5 & p4.1 2 2 &
24347 4        <-- 2
[1] 24347
[2] 24348
linux$ 24348 unable to obtain lock file after 2 tries.
24347 3
24347 2
24347 1
24347 0
[2]  + Exit 1                        p4.1 2 2        <-- 3
[1]  + Done                          p4.1 1 5
  • (1)Compile the program linking in the lock_file object code.

  • (2)Run the program twice, placing each in the background.

  • (3)Second instance of the program failed, returning a value of 1. The first instance completed normally.

The program p4.1 is invoked twice. To allow the two processes to execute concurrently, each invocation is placed in the background (via the trailing &). The first process creates the lock file and gains access to the screen. This process is responsible for generating the five values (4, 3, 2, 1, 0) that are displayed on the screen. The second process, after two tries with a two-second interval between them, exits and displays the message unable to obtain lock file after 2 tries. As each background job finishes, the shell reports its status. The process that was unable to gain access to the resource exits with a value of 1. It is informative to run the program several times using varying settings. When doing so, you should be able to ascertain whether the lock file really does allow rudimentary communication between the processes involved.

Our example uses the creat system call as the basis for its atomic file locking. Unfortunately, creat may be subject to race conditions on NFS (network) filesystems. The Linux manual page for creat (which is shared with open) recommends using the link system call as the atomic file locking operation, as it should not cause race conditions in an NFS setting. In this approach, each process creates a uniquely named file (for example, one whose name includes the host name and PID) and then uses link to create a hard link to that file under the agreed-upon lock file name. With a hard link, the link and the file being linked must reside on the same filesystem. If link returns 0, or if the stat system call reports a link count of two for the unique file, the lock has been successfully acquired. See Exercise 4-1 for more on using link versus creat.
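A minimal sketch of the link-based approach follows. It assumes the unique file is created next to the lock file (and therefore on the same filesystem) with a name built from the lock file name, the host name, and the PID; that naming scheme is an assumption, and acquire_via_link and release_via_link are hypothetical names, not functions from lock_file.cxx.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical sketch of link-based locking. lock_path is the agreed-upon
// lock file name; the unique temporary file is created beside it so both
// names are on the same filesystem (required for a hard link).
bool acquire_via_link(const char *lock_path) {
    char host[256] = "localhost";
    gethostname(host, sizeof host - 1);       // keep the last byte as '\0'
    char tmp[4096];
    std::snprintf(tmp, sizeof tmp, "%s.%s.%ld", lock_path, host, (long)getpid());

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return false;
    close(fd);

    bool locked = (link(tmp, lock_path) == 0);
    if (!locked) {
        // A link over NFS can succeed even when the call reports an error,
        // so also check whether the temporary file now has two names.
        struct stat sb;
        locked = (stat(tmp, &sb) == 0 && sb.st_nlink == 2);
    }
    unlink(tmp);   // if we hold the lock, the lock_path name keeps the file
    return locked;
}

bool release_via_link(const char *lock_path) {
    return unlink(lock_path) == 0;
}
```

A second caller's link fails because lock_path already exists, and its temporary file still has only one name, so it correctly reports that the lock is held elsewhere.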

Locking Files

A second basic communication technique, similar in spirit to using lock files, can be implemented by using some of the standard file protection routines found in UNIX. UNIX allows the locking of records. As there is no real record structure imposed on a file, a record (which is sometimes called a segment or section) is considered to be a specified number of contiguous bytes of storage starting at an indicated location. If the starting location for the record is the beginning of a file, and the number of bytes equals the number found in the file, then the entire file is considered to be the record in question.

Locking routines can be used to impose advisory or mandatory locking. In advisory locking the operating system keeps track of which processes have locked files. The processes that are using these files cooperate and access the record/file only when they determine the lock is in the appropriate state. When advisory locking is used, outlaw processes can still ignore the lock and, if permissions permit, modify the record/file. In mandatory locking the operating system checks lock information on every read and write call and ensures that the proper lock protocol is being followed. While mandatory locking offers added security, it comes at the expense of additional system overhead. Locks become mandatory if the file being locked is a plain file (not executable), its set-group-ID bit is on, and its group execute bit is off. On Linux, the filesystem must also be mounted with the mand option, and support for mandatory locking was removed entirely in kernel 5.15.

At a system level the chmod command can be used to specify that a file supports mandatory locking. For example, in Figure 4.5, the permissions on the data file x.dat are set to support mandatory file locking. The ls command displays the letter S in the group execution bit field of a file that supports a mandatory lock. Notice that in the example absolute (octal) mode was used with the chmod command to establish locking. The first digit of the four-digit mode value should be a 2 (set-group-ID), and the third digit (the group permissions) should be even, that is, 6, 4, 2, or 0, so that the group execute bit is off.

Figure 4.5. Specifying mandatory locking with chmod.

linux$ echo hello > x.dat        <-- 1

linux$ ls  -l  x.dat
-rw-r--r--   1 gray     faculty   6 Jan 30 12:06 x.dat        <-- 2

linux$ chmod  2644  x.dat        <-- 3
linux$ ls  -l  x.dat
-rw-r-Sr--   1 gray    faculty   6 Jan 30 12:06 x.dat
  • (1)Create a small text file.

  • (2)Default protections.

  • (3)Turn on the set-group-ID bit while leaving the group execute bit off.

The topic of record locking is expansive. We focus on one small aspect of it. We use file locking routines to place and remove an advisory lock on an entire file as a communication technique with cooperating processes.

There are several ways to set a lock. The two most common approaches are presented: the fcntl system call and the lockf library function. We begin with fcntl (Table 4.6).

Table 4.6. Summary of the fcntl System Call.

Include File(s):   <unistd.h>, <fcntl.h>
Manual Section:    2
Summary:           int fcntl(int fd, int cmd /* , struct flock *lock */);
Return:            Success: value returned depends upon the cmd argument passed
                   Failure: −1
                   Sets errno: Yes

As its first argument the fcntl system call is passed a valid integer file descriptor of an open file. The second argument, cmd, is an integer command value that specifies the action that fcntl should take. The command values for locking are specified as defined constants in the header file <bits/fcntl.h>, which is included by the <fcntl.h> header file. The lock-specific constants are shown in Table 4.7.

Table 4.7. Lock-Specific Defined Constants Used with the fcntl System Call.

Defined Constant   Action Taken by fcntl
F_SETLK            Set or remove a lock. The specific action is based on the contents of the flock structure that is passed as the third argument to fcntl.
F_SETLKW           Same as F_SETLK, but block (wait) if the indicated record/segment is not available; F_SETLK does not block.
F_GETLK            Return lock status information via the flock structure that is passed as the third argument to fcntl.

The third argument for fcntl is optional for some invocations (as indicated by its being commented out in the function prototype). However, when working with locks, the third argument is specified and references a flock structure, which is defined (in the GNU C library) as

struct flock  {
     short int l_type;   /* Type of lock: F_RDLCK, F_WRLCK, or F_UNLCK. */
     short int l_whence; /* Where 'l_start' is relative to.             */
     #ifndef __USE_FILE_OFFSET64
     __off_t l_start;    /* Offset where the lock begins.               */
     __off_t l_len;      /* Size of the locked area; (0 == EOF).       */
     #else
     __off64_t l_start;  /* For systems with 64 bit offset.             */
     __off64_t l_len;
     #endif
     __pid_t l_pid;      /* PID of process holding the lock.            */
};

The flock structure is used to pass information to and return information from the fcntl call. The type of lock, l_type, is indicated by using one of the defined constants shown in Table 4.8.

The l_whence, l_start, and l_len flock members indicate the starting location (0 or SEEK_SET, the beginning of the file; 1 or SEEK_CUR, the current location; and 2 or SEEK_END, the end of the file), the relative offset, and the size of the record (segment). If all three values are set to 0, the entire file is operated upon. The l_pid member is used to return (with F_GETLK) the PID of the process that placed the lock.

Table 4.8. Defined Constants Used in the flock l_type Member.

Defined Constant   Lock Specification
F_RDLCK            Read lock
F_WRLCK            Write lock
F_UNLCK            Remove lock

When dealing with locks, if fcntl fails to carry out an indicated command, it will return a value of −1 and set errno. Error messages associated with locking are shown in Table 4.9.

Table 4.9. fcntl Error Messages Relating to Locking.

#    Constant       perror Message                     Explanation
4    EINTR          Interrupted system call            A signal was caught during the system call.
9    EBADF          Bad file number                    fd does not reference a valid open file descriptor.
11   EAGAIN         Resource temporarily unavailable   Lock operation prohibited by a lock held by another process, or the file has been memory mapped by another process.
13   EACCES         Permission denied                  Lock operation prohibited by a lock held by another process.
14   EFAULT         Bad address                        *lock references an illegal address space.
22   EINVAL         Invalid argument                   cmd is invalid; cmd is F_GETLK or F_SETLK and *lock or data referenced by *lock is invalid; or fd does not support locking.
35   EDEADLK        Resource deadlock avoided          cmd is F_SETLKW and the requested lock is blocked by a lock from another process; if fcntl blocked the calling process waiting for the lock to be free, deadlock would occur.
37   ENOLCK         No locks available                 System has reached the maximum number of record locks.

Program 4.2 demonstrates the use of file locking.

Program 4.2. Using fcntl to lock a file.

File : p4.2.cxx
  |     /* Locking a file with fcntl
  |      */
  |     #include <iostream>
  +     #include <cstdio>
  |     #include <cerrno>
  |     #include <fcntl.h>
  |     #include <unistd.h>
  |     using namespace std;
  |     const int MAX = 5;
 10     int
  |     main(int argc, char *argv[ ]) {
  |       int             f_des, pass = 0;
  |       pid_t           pid = getpid();
  |       struct flock    lock;                // for fcntl info
  +       if (argc < 2) {                      // name of file to lock missing
  |         cerr << "Usage " << *argv << " lock_file_name" << endl;
  |         return 1;
  |       }
  |       sleep(1);                            // don't start immediately
 20       if ((f_des = open(argv[1], O_RDWR)) < 0){
  |         perror(argv[1]);                   // could not access file
  |         return 2;
  |       }
  |       lock.l_type   = F_WRLCK;             // set a write lock
  +       lock.l_whence = 0;                   // start at beginning
  |       lock.l_start  = 0;                   // with a 0 offset
  |       lock.l_len    = 0;                   // whole file
  |       while (fcntl(f_des, F_SETLK, &lock) < 0) {
  |         switch (errno) {
 30         case EAGAIN:
  |         case EACCES:
  |           if (++pass < MAX)
  |             sleep(1);
  |           else {                           // run out of tries
  +             fcntl(f_des, F_GETLK, &lock);
  |             cerr << "Process " << pid << " found file "
  |                  << argv[1] << " locked by " << lock.l_pid << endl;
  |             return 3;
  |           }
 40           continue;
  |         }
  |         perror("fcntl");
  |         return 4;
  |       }
  +       cerr << endl << "Process " << pid << " has the file" << endl;
  |       sleep(3);                            // fake processing
  |       cerr << "Process " << pid << " is done with the file" << endl;
  |       return 0;
  |     }

In this program the name of the file to be locked is passed on the command line. A call to sleep is placed at the start of the program to slow down the processing (for demonstration purposes only). The designated file is opened for reading and writing. In lines 24 through 27 the lock structure is assigned values that indicate a write lock is to be applied to the entire file. In the while loop that follows, a call to fcntl requests the lock be placed. If fcntl fails and errno is set to either EAGAIN or EACCES (values that indicate the lock could not be applied), the process will sleep for one second and try to apply the lock again. To be safe, the EACCES constant is grouped with EAGAIN, as in some versions of UNIX this is the value that is returned when a lock cannot be applied. If the MAX number of tries (passes) has been exceeded, another call to fcntl (line 35) is made to obtain information about the process that has locked the file. In this call the address of the lock structure is passed to fcntl. The PID of the locking process is displayed, and the program exits. If an error other than EAGAIN or EACCES is encountered when attempting to set the lock, perror is called, a message is displayed, and the program exits. If the process successfully obtains the lock, the process prints an informational message, sleeps three seconds (to simulate some sort of processing), and prints a second message as it terminates. When the process terminates, the system automatically removes the lock on the file. If the process were not to terminate, the process would need to set the l_type member to F_UNLCK and reissue the fcntl call to clear the lock.

If we run three copies of Program 4.2 in rapid succession, each attempting to lock the file x.dat, their output will be similar to that shown in Figure 4.6.

Figure 4.6. Running multiple copies of Program 4.2: locking a file.

linux$ p4.2 x.dat & p4.2 x.dat &  p4.2 x.dat &        <-- 1
[1] 28392
[2] 28393
[3] 28394
$
Process 28392 has the file
Process 28392 is done with the file
Process 28393 has the file
Process 28394 found file x.dat locked by 28393
Process 28393 is done with the file

[3]  - Exit 3                        p4.2 x.dat
[2]  - Done                          p4.2 x.dat
[1]  + Done                          p4.2 x.dat
  • (1)All three processes will use the same file: x.dat.

Notice that the last process, PID 28394 in this example, is unable to place a lock on the file and reports the PID of the process that currently holds the lock. The second process, PID 28393, through repeated retries (with intervening calls to sleep), is able to lock the file once the first process is finished with it.

The lockf library function may also be used to apply, test, or remove a lock on an open file. Beneath the covers this library function is an alternate interface for the fcntl system call. The lockf library function is summarized in Table 4.10.

Table 4.10. Summary of the lockf Library Call

Include File(s)   <sys/file.h>
                  <unistd.h>
Manual Section    3
Summary           int lockf(int fd, int cmd, off_t len);
Return            Success     0
                  Failure     −1
                  Sets errno  Yes

The fd argument is a file descriptor of a file that has been opened for either writing (O_WRONLY) or for reading and writing (O_RDWR). The cmd argument for lockf is similar to the cmd argument used with fcntl. The cmd value indicates the action to be taken. The action that lockf will take for each cmd value (as specified in the include file <unistd.h>) is summarized in Table 4.11.

Table 4.11. Defined cmd Constants.

Defined Constant   Lock Specification
F_ULOCK            Unlock a previously locked file.
F_LOCK             Lock a file (or a section of a file) for exclusive use if it is available. If it is unavailable, the lockf function will block.
F_TLOCK            Test and, if successful, lock a file (or section of a file) for exclusive use. An error is returned if no lock can be applied; with this option the lockf function will not block if the lock cannot be applied.
F_TEST             Test a file for the presence of a lock. A 0 is returned if the file is unlocked or locked by the current process. If it is locked by another process, −1 is returned and errno is set to EACCES (some implementations use EAGAIN).

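The len argument of lockf indicates the number of contiguous bytes to lock or unlock, measured from the current file offset: a positive value extends the section forward, and a negative value extends it backward. A value of zero indicates the section should extend from the current offset to the end of the file (and beyond, so that later growth of the file is also covered).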

If the lockf call is successful, it returns a value of 0. If the call fails, it sets errno and returns the value −1 (Table 4.12).

Table 4.12. lockf error messages.

#    Constant   perror Message                     Explanation
9    EBADF      Bad file number                    fd is not a valid open file descriptor.
11   EAGAIN     Resource temporarily unavailable   The cmd is F_TLOCK or F_TEST, and the specified section is already locked; or the file is memory mapped by another process.
13   EACCES     Permission denied                  Lock operation prohibited by a lock held by another process.
22   EINVAL     Invalid argument                   Invalid operation specified for fd.
35   EDEADLK    File locking deadlock              Requested lock operation would cause a deadlock.
37   ENOLCK     No locks available                 Maximum number of system locks has been reached.

Of the two techniques, lockf is simpler but less flexible than fcntl. Note that when using the lockf call, the user must issue a separate lseek system call to position the file pointer at the proper location in the file prior to the call. Also, after a fork, the parent and child processes share the same file pointer (offset) for any descriptor they both hold. If locks are to be used in both processes, it is sometimes best to close and reopen the file in question so that each process has its own separate file pointer.

A final note—Linux supports a shlock command that can be used in shell scripts. The shlock command creates a lock file that contains an identifying PID.

More About Signals

A second primitive interprocess communication technique involves the use of signals. As previously indicated, signals occur asynchronously (with no specified timing or sequencing). Signals, which can be sent by processes or by the kernel, serve as notification that an event has occurred. Signals are generated when the event first occurs and are considered to be delivered when the process takes action on the signal. The delivery of most signals can be blocked so the signal can be acted upon later. Blocked signals, and those sent to processes in a non-running state, are commonly called pending signals.

The symbolic name for each signal can be found in several places. Usually, the manual pages for signal (try man 7 signal) or the header file <asm/signal.h> will contain a list of each signal name. Signals, as described in Section 7 of the manual, are shown in Table 4.13. The definition of a signal (its symbolic name, the associated integer value, and the event signaled) has evolved over time. Signals defined by the POSIX.1 standard have the letter P in the Def column; those defined by SUS v2 (Single UNIX Specification, version 2) have the letter S. The letter O indicates signals not defined by either of these standards. Keep in mind also that some signal numbers are architecture-dependent. When three numbers are listed in the Value column for a signal, the first is the signal number on alpha and sparc platforms, the middle on i386 and ppc platforms, and the last on mips platforms. A dash (-) indicates the signal is missing on that platform. A single value indicates all platforms use the same signal number.

The default action associated with the signal is defined by one or more letters in the Action column of the table. The letter A indicates the recipient process will terminate; B, the process will ignore the signal; C, the process will terminate and produce a core file; and D, the process will stop (suspend) execution. Additionally, the letter E indicates the signal cannot be caught (trapped), and the letter F, that the signal cannot be ignored.

Table 4.13. Signal Definitions.

Symbolic Name  Def  Value     Action  Description
SIGABRT        P    6         C       Abort signal from abort.
SIGALRM        P    14        A       Timer signal from alarm.
SIGBUS         S    10,7,10   C       Bus error (bad memory access).
SIGCHLD        P    20,17,18  B       Sent to parent when child is stopped or terminated.
SIGCLD         O    -,-,18    B       A synonym for SIGCHLD.
SIGCONT        P    19,18,25  B       Resume if process is stopped.
SIGEMT         O    7,-,7     C       Emulation trap.
SIGFPE         P    8         C       Floating-point exception.
SIGHUP         P    1         A       A hangup was detected on the controlling terminal or the controlling process has died.
SIGILL         P    4         C       Illegal instruction.
SIGINFO        O    29,-,-            A synonym for SIGPWR.
SIGINT         P    2         A       Interrupt from keyboard.
SIGIO          O    23,29,22  A       I/O now possible.
SIGIOT         O    6         C       IOT trap; equivalent to SIGABRT.
SIGKILL        P    9         A,E,F   Kill signal; force process termination.
SIGLOST        O    -,-,-     A       File lock lost.
SIGPIPE        P    13        A       Broken pipe; write to pipe with no readers.
SIGPOLL        S    23,29,22  A       A pollable event has occurred; synonymous with SIGIO.
SIGPROF        S    27,27,29  A       Profiling timer expired.
SIGPWR         O    29,30,19  A       Power supply failure.
SIGQUIT        P    3         C       Quit from keyboard.
SIGSEGV        P    11        C       Invalid memory reference (segmentation violation).
SIGSTKFLT      O    -,16,-    A       Coprocessor stack error.
SIGSTOP        P    17,19,23  D,E,F   Stop process; not from tty.
SIGSYS         S    12,-,12   C       Bad argument to system call.
SIGTERM        P    15        A       Termination signal from kill.
SIGTRAP        S    5         C       Trace/breakpoint trap for debugging.
SIGTSTP        P    18,20,24  D       Stop typed at a tty.
SIGTTIN        P    21,21,26  D       Background process needs input.
SIGTTOU        P    22,22,27  D       Background process needs to output.
SIGUNUSED      O    -,31,-    A       Unused signal (will be SIGSYS).
SIGURG         S    16,23,21  B       Urgent condition on I/O channel (socket).
SIGUSR1        P    30,10,16  A       User-defined signal 1.
SIGUSR2        P    31,12,17  A       User-defined signal 2.
SIGVTALRM      S    26,26,28  A       Virtual alarm clock.
SIGWINCH       O    28,28,20  B       Window resize signal.
SIGXCPU        S    24,24,30  C       CPU time limit exceeded.
SIGXFSZ        S    25,25,31  C       File size limit exceeded.

Some additional caveats to consider include the following:

  • For some S signals (SUS v2), the default action is listed as A (terminate), but their actual action should be C (terminate the process and generate a core file).

  • Signal 29 is SIGINFO/SIGPWR on an alpha platform but SIGLOST on a sparc platform.

Note that all signals begin with the prefix SIG and end with a semimnemonic suffix. For the sake of portability when referencing signals, it is usually best to use their symbolic names rather than their assigned integer values. The defined constants SIGRTMIN and SIGRTMAX are also found in <asm/signal.h> and allow the generation of additional real-time signals. Real-time signals, usually the values 32 to 63, can be queued. The queuing of signals ensures that when multiple signals are sent to a process, they will not be lost. At present, the Linux kernel does not make use of real-time signals.

For each signal, a process may take one of the following three actions:

  1. Perform the default action. This is the action that will be taken unless otherwise specified. The default action for each signal is listed in the previous table. Specifically these actions are

    • Terminate (Abort): Perform all the activities associated with the exit system call.

    • Core (Dump): Produce a core image (file) and then perform termination activities.

    • Stop: Suspend processing.

    • Ignore: Disregard the signal.

  2. Ignore the signal. If the signal to be ignored is currently blocked, it is discarded. The SIGKILL and SIGSTOP signals cannot be ignored.

  3. Catch the signal. In this case, the process supplies the address of a function (often called a signal catcher) that is to be executed when the signal is received. In most circumstances, the signal catching function will have a single integer parameter. The parameter value, which is assigned by the system, will be the numeric value of the signal caught. When the signal catcher function finishes, the interrupted process will, unless otherwise specified, resume its execution where it left off.

A discussion of the implementation details for ignoring and catching signals is provided in Section 4.5.

Signals are generated in a number of ways:

  1. By the kernel, indicating

    • Hardware conditions, the most common of which are SIGSEGV, indicating an addressing violation by the process, and SIGFPE, indicating an arithmetic error such as division by zero.

    • Software conditions, such as SIGIO, indicating I/O is possible on a file descriptor, or SIGALRM, indicating the expiration of a timer.

  2. By the user at a terminal:

    • Keyboard: The user produces keyboard sequences that will interrupt or terminate the currently executing process. For example, the interrupt signal, SIGINT, is usually mapped to the key sequence CTRL+C and the quit signal, SIGQUIT, to the key sequence CTRL+\. The command stty -a will display the current mappings of keystrokes for the interrupt and quit signals.

    • kill command: By using the kill command, the user, at the command line, can generate any of the previously listed signals for any process that has the same effective user ID. The syntax for the kill command is

      $ kill [ -signal ] pid . . .
      

      When issued, the kill command will send the specified signal to the indicated PID. The signal can be an integer value or one of the symbolic signal names with the SIG prefix removed. If no signal number is given, the default is SIGTERM (terminate). The PID(s) (multiple PIDs are separated with whitespace) are the IDs of the processes that will be sent the signal. If needed, the ps command can be used to obtain current PIDs for the user.

      It is possible for the pid value to be less than 1 and/or for the signal value to be 0. In these cases, the kill command will carry out the same actions as specified for the kill system call described in the following section. As would be expected, the kill command is just a command-line interface to the kill system call.

  3. By other processes:

    • By the kill system call (Table 4.14). The kill system call is used to send a signal to a process or a group of processes.

Notice that the argument sequence for the kill system call is the reverse of that of the kill command. The value specified for the pid argument indicates which process or process group will be sent the signal. Table 4.15 summarizes how to specify a process or process group.

Table 4.14. Summary of the kill System Call.

Include File(s)   <sys/types.h>
                  <signal.h>
Manual Section    2
Summary           int kill( pid_t pid, int sig );
Return            Success     0
                  Failure     −1
                  Sets errno  Yes

Table 4.15. Interpretation of pid values by the kill System Call.

pid     Process(es) Receiving the Signal
> 0     The process whose process ID is the same as pid
0       All the processes in the same process group as the sender
-1      Not superuser: all processes whose real ID is the same as the effective ID of the sender
        Superuser: all processes excluding special processes
< -1    All the processes whose process group ID is the absolute value of pid (that is, -pid)

The value for sig can be any of the symbolic signal names (or the equivalent integer value) found in the signal header file. If the value of sig is set to 0, the kill system call will perform an error check of the specified PID, but will not send the process a signal. Sending a signal of 0 to a PID and checking the return value of the kill system call is sometimes used as a way of determining if a given PID is present. This technique is not foolproof, as the process may terminate on its own immediately after the call to check on it has been made. Remember that UNIX will reuse PID values once the maximum PID has been assigned. The statement

kill(getpid(),sig);

can be used by a process to send itself the signal specified by sig.[5]

If the kill system call is successful, it returns a 0; otherwise, it returns a value of −1 and sets errno as indicated in Table 4.16. In Linux, for security reasons, process 1 (init) receives only those signals for which it has explicitly installed a handler, so it cannot be terminated with kill. Requests are normally passed to init via telinit.

Table 4.16. kill Error Messages.

#    Constant   perror Message            Explanation
1    EPERM      Operation not permitted   Calling process does not have permission to send the signal to the specified process(es); or the process is not superuser and its effective ID does not match the real or saved user ID of the target.
3    ESRCH      No such process           No such process or process group as pid.
22   EINVAL     Invalid argument          Invalid signal number specified.

The alarm system call sets a timer for the issuing process and generates a SIGALRM signal when the specified number of real-time seconds have passed.

Table 4.17. Summary of the alarm System Call.

Include File(s)   <unistd.h>
Manual Section    2
Summary           unsigned int alarm(unsigned int seconds);
Return            Success     Amount of time remaining
                  Failure     -
                  Sets errno  -

If the value passed to alarm is 0, any pending alarm is canceled. Processes generated by a fork have their alarm values set to 0, while processes created by an exec inherit the alarm with its remaining time. alarm calls cannot be stacked: each call replaces any previously scheduled alarm. A call to alarm returns the number of seconds remaining on the previously scheduled alarm, or 0 if there was none. A “sleep” type arrangement can be implemented for a process using alarm. However, mixing calls to alarm and sleep is not a good idea, as sleep may itself be implemented using SIGALRM.

Program 4.3 demonstrates the use of an alarm system call.

Program 4.3. Setting an alarm.

File : p4.3.cxx
  |     #include <iostream>
  |     #include <iomanip>
  |     #include <cstdlib>
  |     #include <sys/types.h>
  +     #include <sys/wait.h>
  |     #include <unistd.h>
  |     using namespace std;
  |     int
  |     main(int argc, char *argv[] ) {
 10       int w, status;
  |       if ( argc < 4 ) {
  |         cerr << "Usage: " << *argv << " value_1 value_2 value_3 "
  |              << endl;
  |         return 1;
  +       }
  |       for(int i=1; i <= 3; ++i)
  |         if ( fork( ) == 0 ) {
  |            int t = atoi(argv[i]);
  |            cout << "Child " << getpid( ) << " waiting to die in "
 20                 << t << " seconds." << endl;
  |            alarm( t );
  |            pause( );
  |            cout << getpid( ) << " is done." << endl;
  |         }
  +       while (( w=wait(&status)) && w != -1)
  |         cout << "Wait on PID: " << dec <<  w << " returns status of  "
  |              << setw(4) << setfill('0') << hex
  |              << setiosflags(ios::uppercase) << status << endl;
  |       return 0;
 30     }

When the program is invoked, three integer values are passed to it. The parent process generates three child processes, using the command-line values to set the alarm in each child. In line 22 the pause library function is called. This function causes the child process to wait for the receipt of a signal; in this example, the signal will be SIGALRM. When the signal is received, the child process takes the default action for the signal, which for SIGALRM is to terminate. The number of the terminating signal is then reported to the waiting parent in the status returned by wait. The parent process waits for all of the child processes to finish. As each finishes, the parent displays the child PID and its return status information. It is important to note that the cout statement in line 23 is never executed, as the child process exits before reaching this statement. This can be verified by the output shown in Figure 4.7.

Figure 4.7. Setting an alarm in multiple child processes.

linux$ p4.3  3  1  5
Child 17243 waiting to die in 3 seconds.
Child 17244 waiting to die in 1 seconds.
Child 17245 waiting to die in 5 seconds.
Wait on PID: 17244 returns status of  000E
Wait on PID: 17243 returns status of  000E
Wait on PID: 17245 returns status of  000E        <-- 1
  • (1)The child processes end in the order specified by their alarm times. Each passes back the SIGALRM value (14, which is E in hexadecimal).

A call to pause suspends a process (causing it to sleep) until it receives a signal that has not been ignored (Table 4.18).

Table 4.18. Summary of the pause Library Function.[6]

Include File(s)   <unistd.h>
Manual Section    2
Summary           int pause ( void );
Return            Success     If the signal does not cause termination, −1 is returned
                  Failure     Does not return
                  Sets errno  Yes

[6] Although pause is documented in Section 2 of the manual, its manual page indicates that it is a library function.

pause returns a −1 if the signal received while pausing does not cause process termination. The value in errno will be EINTR (4). If the received signal causes termination, pause will not return (which is to be expected!).

Signal and Signal Management Calls

In the previous section we noted that a process can handle a signal by doing nothing (thus allowing the default action to occur), ignoring the signal, or catching the signal. Both the ignoring and catching of a signal entail the association of a signal-catching routine with a signal. In brief, when this is done the process automatically invokes the signal-catching routine when the stipulated signal is received. There are two basic system calls that can be used to modify what a process will do when a signal has been received: signal and sigaction. The signal system call has been present in all versions of UNIX and is now categorized as the ANSI C version signal-handling routine (Table 4.19). The sigaction system call (Table 4.20) is somewhat more recent and is one of a group of POSIX signal management calls.

Table 4.19. Summary of the signal System Call.

Include File(s)   <signal.h>
Manual Section    2
Summary           void (*signal(int signum,
                       void (*sighandler)(int)))(int);
Return            Success     Signal's previous disposition
                  Failure     SIG_ERR (defined as −1)
                  Sets errno  Yes

Table 4.20. Summary of the sigaction System Call.

Include File(s)   <signal.h>
Manual Section    2
Summary           int sigaction(int signum,
                                const struct sigaction *act,
                                struct sigaction *oldact);
Return            Success     0
                  Failure     −1
                  Sets errno  Yes

The most difficult part of using signal is deciphering its prototype. In essence, the prototype declares signal to be a function that accepts two arguments: an integer signum value and a pointer to the function to be called when the signal is received. If the invocation of signal is successful, it returns a pointer to a function that returns nothing (void); this is the signal's previous disposition. The mysterious (int), found at the far right of the prototype, indicates that the referenced function takes an integer argument. This argument is automatically filled in by the system and contains the signal number. Either system call fails, setting errno to EINTR (4), if it is interrupted, or to EINVAL (22) if the value given for signum is not valid or is SIGKILL or SIGSTOP; on failure signal returns SIG_ERR and sigaction returns −1. Further, sigaction sets errno to EFAULT (14) if the act or oldact arguments reference an invalid address space.

While both signal and sigaction deal with signal handling, the functionality of each is slightly different. Let's begin with the signal system call.

The first argument to the signal system call is the signal that we intend to associate with a new action. The signal value can be an integer or a symbolic signal name. This value cannot be SIGKILL or SIGSTOP. The second argument to signal is the address of the signal-catching function. The signal-catching function can be a user-defined function or one of the defined constants SIG_DFL or SIG_IGN. Specifying SIG_DFL for a signal resets the action to be taken to its default action when the signal is received. Indicating SIG_IGN for a signal means the process will ignore the receipt of the indicated signal.

An examination of the signal header files shows that SIG_DFL and SIG_IGN are small integer values cast to the signal-handler pointer type; these values (0 and 1) can never be the address of a real function. The declarations most commonly found for SIG_DFL and SIG_IGN are shown below. Defined alongside them is another constant that can be used, SIG_ERR (−1). This constant is the value that is returned by signal if it fails. See Figure 4.8.

Figure 4.8. Defined constants used by signal and sigset.

/* Fake signal functions.  */

#define SIG_ERR ((__sighandler_t) -1)      /* Error return.  */
#define SIG_DFL ((__sighandler_t)  0)      /* Default action.  */
#define SIG_IGN ((__sighandler_t)  1)      /* Ignore signal.  */

Program 4.4 uses the signal system call to demonstrate how a signal can be ignored.

Program 4.4. Pseudo nohup: ignoring a signal.

File : p4.4.cxx
/* Using the signal system call to ignore a hangup signal
 */
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
using namespace std;
const char  *file_out = "nohup.out";
int
main(int argc, char *argv[]){
  int       new_stdout;
  if (argc < 2) {
    cerr << "Usage: " << *argv << " command [arguments]" << endl;
    return 1;
  }
  if (isatty( 1 )) {
    cerr <<  "Sending output to " << file_out << endl;
    close( 1 );
    if ((new_stdout = open(file_out, O_WRONLY | O_CREAT |
                           O_APPEND, 0644)) == -1)        {
      perror(file_out);
      return 2;
    }
  }
  if (signal(SIGHUP, SIG_IGN) == SIG_ERR) {
    perror("SIGHUP");
    return 3;
  }
  ++argv;
  execvp(*argv, argv);
  perror(*argv);                       // Should not get here unless
  return 4;                            // the exec call fails.
}

Program 4.4 is a limited version of the /usr/bin/nohup command found on most UNIX-based systems. The nohup command can be used to run commands so they will be immune to the receipt of SIGHUP signals. If the standard output for the current process is associated with a terminal, the output from nohup will be sent to the file nohup.out. The nohup command is often used with the command-line background specifier & to allow a command to continue its execution in the background even after the user has logged out.

Like the real nohup, our pseudo nohup program (Program 4.4) will execute the command (with optional arguments) that is passed to it on the command line. After checking the number of command-line arguments, the file descriptor associated with stdout is evaluated. The assumption here is that the file descriptor associated with stdout is 1. However, if needed, there is a standard I/O function named fileno that can be used to find the integer file descriptor for a given stream. The library function isatty (Table 4.21) is used to determine if the descriptor is associated with a terminal device.

Table 4.21. Summary of the isatty Library Function.

Include File(s)   <unistd.h>
Manual Section    3
Summary           int isatty( int desc );
Return            Success     1
                  Failure     0
                  Sets errno  -

The isatty library function takes a single integer desc argument. If desc is associated with a terminal device, isatty returns a 1; otherwise, it returns a 0. In the program, if the isatty function returns a 1, an informational message is displayed to standard error to tell the user where the output from the command passed to the pseudo nohup program can be found. Next, the file descriptor for stdout is closed. The open statement that follows the close returns the lowest-numbered free file descriptor. As we have just closed stdout, the descriptor returned by the open will be that of stdout. Once this reassignment has been done, any information written to stdout (cout) by the program will in turn be appended to the file nohup.out. Notice that the call to signal to ignore the SIGHUP signal is done within an if statement. Should the signal system call fail (return SIG_ERR), a message is displayed to standard error and the program exits. If the signal call is successful, the argv pointer is incremented to step past the name of the current program. The remainder of the command line is then passed to the execvp system call. Should the execvp call fail, perror is invoked and a message displayed. If execvp is successful, the current process is overlaid by the program/command passed from the command line.

The output in Figure 4.9 shows what happens when the pseudo nohup program is run on a local system and passed a command that takes a long time to execute. In the example the long-running command is a small Korn shell script called count that counts from 1 to 100, sleeping one second after the display of each value. As written, the output from the script would normally be displayed on the screen.

Figure 4.9. Output of Program 4.4 when passed a command that takes a long time to execute.

linux$ cat count
#!  /bin/ksh
c=1 
while (( $c <= 100 ))        <-- 1
do
  echo "$c"
  sleep 1
  (( c = c + 1 ))
done

linux$ ./p4.4 ./count &        <-- 2
Sending output to nohup.out
[1] 19481
linux$ jobs        <-- 3
[1]  + Running                     p4.4 count
linux$ kill -HUP %1        <-- 4
linux$ jobs
[1]  + Running                     p4.4 count
linux$ kill -KILL %1
linux$
[1]    Killed                      p4.4 count
linux$ jobs
linux$
  • (1)The script counts from 1 to 100, sleeping one second between the display of each number. If run on the command line, it will take approximately 100 seconds to complete.

  • (2)Pass the count script to our pseudo nohup program—place it in the background.

  • (3)The operating system returns the PID of the background process.

  • (4)Sending a hangup signal to the process does not cause it to terminate.

When the program was placed in the background, the system reported the job number (in this case [1]) and the PID (19481). The jobs command confirms that the process is still running. As can be seen, the kill -HUP %1 command (which sends a hangup signal to the first job in the background) did not cause the program to terminate. This is not unexpected, as the SIGHUP signal was being ignored. The command kill -KILL %1 was used to terminate the process by sending it a SIGKILL signal.

As noted, if a signal-catching function name is supplied to the signal system call, the process will automatically call this function when the process receives the signal. However, under the traditional (System V) semantics, before calling the function the system resets the signal's disposition to its default, unless the signal is SIGILL, SIGPWR, or SIGTRAP. This means that if the same signal is received twice in quick succession, the second signal may arrive before the signal-catching routine has reinstalled itself and cause the process to terminate (if that is the default action for the signal). This behavior reduces the reliability of using signals as a communication device. It is possible to reduce, but not entirely eliminate, this window of opportunity for failure by resetting the disposition for the signal in the catching routine. (On current Linux systems, glibc's signal wrapper provides BSD semantics by default, in which the handler is not reset; the System V behavior is available through sysv_signal.) Program 4.5 catches signals and attempts to reduce this window of opportunity.

Program 4.5. Catching SIGINT and SIGQUIT signals.

File : p4.5.cxx
  |     /* Catching a signal
  |      */
  |     #include <iostream>
  +     #include <cstdlib>
  |     #include <cstdio>
  |     #include <signal.h>
  |     #include <unistd.h>
  |     using namespace std;
  |     int
 10     main( ) {
  |       void            signal_catcher(int);
  |       if (signal(SIGINT , signal_catcher) == SIG_ERR) {
  |         perror("SIGINT");
  |         return 1;
  +       }
  |       if (signal(SIGQUIT , signal_catcher) == SIG_ERR) {
  |         perror("SIGQUIT");
  |         return 2;
  |       }
 20       for (int i=0;  ; ++i) {              // Forever ...
  |         cout << i << endl;                 // display a number
  |         sleep(1);
  |       }
  |       return 0;
  +     }
  |     void
  |     signal_catcher(int the_sig){
  |       signal(the_sig, signal_catcher);     // reset immediately
  |       cout << endl << "Signal " << the_sig << " received." << endl;
 30       if (the_sig == SIGQUIT)
  |         exit(3);
  |     }

In an attempt to avoid taking the default action (which in this case is to terminate) for either of the two caught signals, the first statement (line 28) in the program function signal_catcher is a call to signal. This call reestablishes the association between the signal being caught and the signal-catching routine.

Figure 4.10 shows the output of the program when run on a local system.

Figure 4.10. Output of Program 4.5.

linux$ p4.5
0
1
2
ò        <-- 1
Signal 2 received.
3
4
ò        <-- 2
Signal 2 received.
5
ò        <-- 2
Signal 2 received.
6
ò
Signal 3 received.
linux$
  • (1)The user types CTRL+C. The terminal program displays a funny graphics character, ò.

  • (2)Here the signals are generated in rapid succession.

From this output we can see that each time CTRL+C was pressed, it was echoed back to the terminal as ò. If CTRL+C was struck twice in quick succession, the program responded with the Signal 2 received message for each keyboard sequence. On this system it appears as if some of the signals were queued if they were received in rapid succession. However, this is somewhat misleading, as the mechanics of terminal I/O come into play. Say we were (via a background process) to deliver to the process, in very rapid succession, multiple copies of the same signal. In this setting we would find most often that only one copy of the signal would be delivered to the process, while the others are discarded. Most systems do not queue the signals 1 through 31. When a SIGQUIT signal was generated, a message was displayed and the program exited.

The sigaction system call, like the signal system call, can be used to associate an alternate action with the receipt of a signal. This system call has three arguments. The first is an integer value that specifies the signal. As with the signal system call, this argument can be any valid signal except SIGKILL or SIGSTOP. The second and third arguments are references to a sigaction structure. Respectively these structures store the new and previous action for the signal. The full definition of the sigaction structure is found in the file sigaction.h. This file is automatically included by signal.h. Basically, the sigaction structure is

struct sigaction {
     void (*sa_handler)(int);                        // 1
     void (*sa_sigaction)(int, siginfo_t *, void *); // 2
     sigset_t sa_mask;                               // 3
     int sa_flags;                                   // 4
     void (*sa_restorer)(void);                      // 5
};

Both sa_handler and sa_sigaction can be used to reference a signal-handling function. Only one of them should be specified at any given time, because on most systems the two members share storage in a union within the sigaction structure, and a union can hold only one of its members at a time. Our discussion centers on the sa_handler member. The sa_mask member specifies the signals that should be blocked while the signal handler is executing. Each signal is represented by a bit; if the bit in the mask is on, the signal is blocked. By default, the signal that triggered the handler is also blocked. The sa_flags member is used to set flags that modify the behavior of the signal-handling process. The flag constants, shown in Table 4.22, can be combined using a bitwise OR.

Table 4.22. sa_flags Constants.

Flag                         Action
SA_NOCLDSTOP                 If the signal is SIGCHLD, the calling process will not receive a SIGCHLD signal when its child processes stop or resume; it is still notified when they terminate.
SA_ONESHOT or SA_RESETHAND   Restore the default action after the signal handler has been called once (similar to the default behavior of the signal call).
SA_RESTART                   Use BSD signal semantics (certain system calls interrupted by the signal are restarted after the signal has been caught).
SA_NOMASK or SA_NODEFER      Do not automatically block the signal that triggered the handler while the handler is executing.
SA_SIGINFO                   The signal handler takes three arguments; use sa_sigaction, not sa_handler.

The remaining structure member, sa_restorer, is obsolete and should not be used.

Unlike signal, a sigaction-installed signal-catching routine remains installed even after it has been invoked (unless the SA_RESETHAND flag is specified). Program 4.6, which is similar to Program 4.5, shows the use of the sigaction system call.

Again, notice that in the program function signal_catcher, it is no longer necessary to reset the association for the signal caught to the signal-catching routine.

Program 4.6. Using the sigaction system call.

File : p4.6.cxx
    /* Catching a signal using sigaction
     */
    #define _GNU_SOURCE
    #include <iostream>
    #include <cstdlib>
    #include <cstdio>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;
    int
    main( ) {
      void   signal_catcher(int);
      struct sigaction new_action;                 // (1)
      new_action.sa_handler = signal_catcher;
      sigemptyset(&new_action.sa_mask);            // block no extra signals in the handler
      new_action.sa_flags   = 0;                   // (2)

      if (sigaction(SIGINT,  &new_action, NULL) == -1) {   // (3)
        perror("SIGINT");
        return 1;
      }
      if (sigaction(SIGQUIT, &new_action, NULL) == -1) {   // (3)
        perror("SIGQUIT");
        return 2;
      }
      for (int i=0;  ; ++i) {              // Forever ...
        cout << i << endl;                 // display a number
        sleep(1);
      }
      return 0;
    }
    void
    signal_catcher(int the_sig){
      cout << endl << "Signal " << the_sig << " received." << endl;
      if (the_sig == SIGQUIT)
        exit(3);
    }
  • (1)A sigaction structure is allocated.

  • (2)The signal-catching function is assigned and the sa_flags member set to 0.

  • (3)A new action is associated with each signal.

Three other POSIX signal-related system calls that can be used for signal management are shown in Table 4.23.

Table 4.23. Summary of the sigprocmask, sigpending, and sigsuspend System Calls.

Include File(s)   <signal.h>
Manual Section    2
Summary           int sigprocmask(int how, const sigset_t *set,
                                  sigset_t *oldset);
                  int sigpending(sigset_t *set);
                  int sigsuspend(const sigset_t *mask);
Return            Success: 0    Failure: −1    Sets errno: Yes

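The sigprocmask and sigpending calls return 0 if they are successful; otherwise, they return −1 and set the value in errno (Table 4.24). The sigsuspend call never returns a success value: when it returns after a caught signal's handler has run, it returns −1 with errno set to EINTR.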

Table 4.24. sigprocmask, sigpending, and sigsuspend Error Messages.

#    Constant   perror Message            Explanation
4    EINTR      Interrupted system call   A signal was caught during the system call.
14   EFAULT     Bad address               set or oldset references an invalid address space.

The process's signal mask can be manipulated with the sigprocmask system call. The first argument, how, indicates how the list of signals (referenced by the second argument, set) should be treated. The action that sigprocmask will take, based on the value of how, is summarized in Table 4.25.

Table 4.25. Defined how Constants.

Constant      Action
SIG_BLOCK     Add the signals in set to the signals already blocked; the new mask is the union of the current mask and set.
SIG_UNBLOCK   Remove the signals in set from the current mask (unblock them).
SIG_SETMASK   Replace the current mask with set, so that exactly the signals in set are blocked.

If the third argument, oldset, is non-null, the previous value of the signal mask is stored in the location referenced by oldset.

The use of the sigprocmask system call is shown in Program 4.7.

Program 4.7. Using sigprocmask.

File : p4.7.cxx
  |     /* Demonstration of the sigprocmask call */
  |     #define _GNU_SOURCE
  |     #include <iostream>
  |     #include <cstdio>
  +     #include <signal.h>
  |     #include <unistd.h>
  |     using namespace std;
  |     sigset_t new_signals;
  |     int
 10     main( ) {
  |       void    signal_catcher(int);
  |       struct  sigaction new_action;
  |       sigemptyset(&new_action.sa_mask);   // no extra signals blocked in handler
  |       sigemptyset(&new_signals);          <-- 1
  +       sigaddset(&new_signals,SIGUSR1);    <-- 2
  |
  |       sigprocmask(SIG_BLOCK, &new_signals, NULL);
  |       new_action.sa_handler = signal_catcher;
  |       new_action.sa_flags   = 0;
 20       if (sigaction(SIGUSR2, &new_action, NULL) == -1) {
  |         perror("SIGUSR2");
  |         return 1;
  |       }
  |       cout << "Waiting for signal" << endl;
  +       pause( );
  |       cout << "Done" << endl;
  |       return 0;
  |     }
  |     void
 30     signal_catcher( int n ) {
  |       cout << "Received signal " << n << " will release SIGUSR1" << endl;
  |       sigprocmask(SIG_UNBLOCK, &new_signals, NULL);
  |       cout << "SIGUSR1 released!" << endl;
  |     }
  • (1)Empty (clear) the set of signals.

  • (2)Add the SIGUSR1 signal to this set.

The example makes use of the SIGUSR1 and SIGUSR2 signals. These are two user-defined signals whose default action is termination of the process. In lines 14 and 15 of the example are two signal-mask manipulation library functions (sigemptyset and sigaddset) that are used to clear and then add a signal to the new signal mask. A signal mask is essentially a string of bits—each set bit represents a signal. The signal-mask manipulation library functions are covered in detail in Chapter 11, “Threads.” In Program 4.7, the sigprocmask system call in line 17 holds (blocks) incoming SIGUSR1 signals. The sigaction system call (line 20) is used to associate the receipt of SIGUSR2 with the signal-catching routine. Following this, an informational message is displayed, and a call to pause is made. In the program function signal_catcher, the sigprocmask system call is used to release the pending SIGUSR1 signal. Notice that a cout statement was placed before and after the sigprocmask call. A sample of this program run locally is shown in Figure 4.11.

When run, the program is placed in the background so the user can continue to issue commands from the keyboard. The system displays the job number for the process and its PID. The program begins by displaying the Waiting for signal message. The user, via the kill command, sends the process a SIGUSR1 signal. This signal is generated for the process but, because the process has blocked it, it remains pending and is not acted upon. When the SIGUSR2 signal is sent to the process, the process catches the signal, and the program function signal_catcher is called. The initial cout statement in the signal-catching routine is executed, and its message about receiving signals is displayed. The following sigprocmask call then unblocks the pending SIGUSR1 signal that was issued earlier. As the default action for SIGUSR1 is termination, the process terminates, and the system produces the trailing information indicating that the process was terminated by user signal 1. Because the process terminates abnormally, neither the second cout statement in the signal-catching routine nor the cout in main is executed.

Figure 4.11. Output of Program 4.7.

linux$ ./p4.7 &
Waiting for signal
[1] 21895
linux$ kill -USR1 21895        <-- 1
linux$ kill -USR2 21895        <-- 2
Received signal 12 will release SIGUSR1
linux$
[1]    User signal 1                 ./p4.7
  • (1)SIGUSR1 would normally cause the process to exit—but it has been blocked.

  • (2)SIGUSR2 has been mapped to the signal-catching routine. In this routine, SIGUSR1 is unblocked; consequently, the process exits without executing the second cout statement in the signal catcher.

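The sigsuspend system call is used to pause (suspend) a process. It temporarily replaces the current signal mask with the one passed as its argument and then suspends the process until a signal is delivered whose action is to execute a signal-catching function or to terminate the process. The mask change and the suspension happen as a single atomic operation, so a signal unblocked by the new mask cannot slip in between the two steps and be missed. If a signal is caught, sigsuspend returns after the signal-catching function returns, restoring the signal mask that was in effect before the call. Program 4.8 demonstrates the use of the sigsuspend system call.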

Program 4.8. Using sigsuspend.

File : p4.8.cxx
  |     /* Pausing with sigsuspend */
  |     #define _GNU_SOURCE
  |     #include <iostream>
  |     #include <cstdio>
  +     #include <signal.h>
  |     #include <unistd.h>
  |     using namespace std;
  |     int
  |     main( ){
 10       void      signal_catcher(int);
  |       struct    sigaction new_action;
  |       sigset_t  no_sigs, blocked_sigs, all_sigs;
  |
  |       sigfillset ( &all_sigs     );        // turn all bits on
  +       sigemptyset( &no_sigs      );        // turn all bits off
  |       sigemptyset( &blocked_sigs );
  |                                            // Associate with catcher
  |       new_action.sa_handler = signal_catcher;
  |       new_action.sa_mask    = all_sigs;
 20       new_action.sa_flags   = 0;
  |       if (sigaction(SIGUSR1, &new_action, NULL) == -1) {
  |         perror("SIGUSR1");
  |         return 1;
  |       }
  +       sigaddset( &blocked_sigs, SIGUSR1 );
  |       sigprocmask( SIG_SETMASK, &blocked_sigs, NULL);
  |       while ( 1 ) {
  |         cout << "Waiting for SIGUSR1 signal" << endl;
  |         sigsuspend( &no_sigs );           // Wait
 30       }
  |       cout << "Done." << endl;
  |       return 0;
  |     }
  |     void
  +     signal_catcher(int n){
  |       cout << "Beginning important stuff" << endl;
  |       sleep(10);                           // Simulate work ....
  |       cout << "Ending important stuff" << endl;
  |     }

In main, the signal-catching function is established. Lines 14 to 16 initialize three signal masks: the sigfillset call turns all bits on, while sigemptyset turns all bits off. The filled set (all bits on, denoting all signals) becomes the signal mask for the signal-catching routine, so every signal that can be blocked is held while the routine executes. In line 21 the receipt of signal SIGUSR1 is associated with the signal-catching function signal_catcher. In lines 25 and 26 the process is directed to block SIGUSR1. At first glance this might seem superfluous, as receipt of this signal has been mapped to signal_catcher; however, it means that a SIGUSR1 arriving while the program is outside sigsuspend is held pending until the next sigsuspend call rather than being handled at that moment. (Because standard signals are not queued, several such arrivals still leave only one pending SIGUSR1.) Then, in an endless loop, the program pauses when the sigsuspend statement is reached; sigsuspend installs the empty mask no_sigs, so SIGUSR1 can be delivered while the process waits. Once the SIGUSR1 signal is received (caught), the signal-catching function is executed. While in the signal-catching function, all signals that can be blocked are held. A pair of messages marking the beginning and end of an important section of code is displayed. When the signal-catching routine is exited, any blocked signals are released. In summary, the program defers the execution of an interrupt-protected section of code until it receives a SIGUSR1 signal. A run of the program produces the output shown in Figure 4.12.

Figure 4.12. Output of Program 4.8.

linux$ p4.8 &
Waiting for SIGUSR1 signal
[1] 6277
linux$ kill -USR1 %1
Beginning important stuff
linux$ kill -INT %1
linux$ jobs
[1]  + Running                       p4.8
linux$ Ending important stuff
[1]    Interrupt                     p4.8

The process was first sent a SIGUSR1 signal, which caused it to enter the program function signal_catcher. While it was executing signal_catcher, an interrupt signal (SIGINT) was sent to the process. This signal did not cause the process to terminate immediately, because all signals were blocked (held) while the handler ran. The jobs command confirms that the process is still active after the interrupt signal was sent. However, once the blocked signals are released (when the signal-catching routine is exited), the pending SIGINT signal is acted upon and the process terminates.

Summary

As we have seen, lock files, file locking, and signals can all serve as basic means of communication between processes. Lock files require the participating processes to agree upon file names and locations. Creating a lock file carries the system overhead characteristic of all file manipulations. In addition, the problems of removing “leftover” invalid lock files and of implementing polling techniques that are not system-intensive must be addressed. On the positive side, lock file techniques can be used in any UNIX environment that supports the creat system call, and the cooperating processes do not need to be related.

UNIX has predefined routines that can be used to lock a file. We can use the presence of a lock on a file to indicate that a resource is unavailable. Advisory locking is less system-intensive than mandatory locking and is thus more common. As with lock files, the participating processes using advisory locking must cooperate to effectively communicate.

Signals provide us with another basic communication technique. While signals carry no data of their own, they can, as we have seen, be used to communicate from one process to another. From a system implementation standpoint, signals are more efficient than lock files. However, participating processes must have access to each other's PIDs (in most cases the processes will be parent/child pairs). In most environments the number of user-designated signals is limited, and cooperating processes must agree upon the “meaning” of each signal. When a signal is sent from one process to another, the sending process has no way of knowing whether the signal was received unless the receiving process acknowledges it. Signal manipulation can be tricky, and its implementation may vary from one version of UNIX to another (this is one of the last areas of UNIX to be standardized). All of these techniques are easy to understand and to implement, but they are often difficult to implement well, and each has limitations that remove it from serious consideration when reliable communication between processes is needed.

Key Terms and Concepts

aborting a process

advisory locking

alarm system call

asynchronous

atomic

consumer process

core image

creat system call

fcntl system call

file locking

flock structure

ignoring a signal

interrupt

isatty library function

kill command

kill system call

link system call

lock file

lockf library call

mandatory locking

nohup command

pause library function

polling

producer process

race condition

raise library function

real-time signals

shlock command

sigaction structure

sigaction system call

signal blocking

signal catcher

signal delivery

signal generation

signal system call

signals

sigpending system call

sigprocmask system call

sigsuspend system call

sleep library function

stopping a process

unlink system call



[1] As the superuser has special privileges, the lock file implementation shown here would not work for the superuser.

[2] At one time the open system call did not support the O_CREAT (create) option.

[3] EACCES is a defined constant found in the <sys/errno.h> header file.

[4] If smaller intervals are needed, there is a usleep library function that suspends execution of the calling process for a specified number of microseconds.

[5] ANSI C also defines a raise library function that can be used by a process to send itself a signal.
