Now that we have covered the basics of process structure and generation, we can begin to address the topic of interprocess communications. It is common for processes to need to coordinate their activities (e.g., when accessing a non-shareable system resource). Conceptually, this coordination is implemented via some form of passive or active communication between processes. As we will see, there are a number of ways in which interprocess communications can be carried out. The remaining chapters address a variety of interprocess communication techniques. As the techniques become more sophisticated, they become more complex and, ideally, more flexible and reliable. We begin by discussing primitive communication techniques that, while they get the job done, have certain limitations.
A lock file (which should not be confused with file/record locking, an I/O technique covered in Section 4.3) can be used by processes as a way to communicate with one another. The processes involved may be different programs or multiple instances of the same program. The use of lock files has a long history in UNIX. Early versions of UNIX (as well as some current versions) use lock files as a means of communication. Lock files are sometimes found in line printer and uucp implementations. In some systems the coordination of access to password and mail files also relies on lock files and/or the locking of a specific file.
The theory behind the use of a lock file as an interprocess communication technique is rudimentary. In brief, by using an agreed-upon file-naming convention, a process examines a prearranged location for the presence or absence of a lock file. Often the location is a temporary directory (e.g., /tmp) where the files are automatically cleared when the system reboots (or by periodic housecleaning by the system administrator) and where all users normally have read/write/execute permission. In its most basic form, if the file is present, the process takes one set of actions, and if the file is missing, it takes another. For example, suppose we have two processes, Process_One and Process_Two, that seek access to a single non-shareable resource (e.g., a printer or disk). A lock file-based communication convention for the two processes could be as shown in Figure 4.1.
It is clear that communication implemented in this manner only conveys a minimal amount of information from one process to another. In essence, the processes are using the presence or absence of the lock file as a binary semaphore. The file's presence or absence communicates, from one process to another, the availability of a resource.
Such a communication technique is fraught with problems. The most apparent problem is that the processes must agree upon the naming convention for the lock file. However, additional, perhaps unforeseen, problems may arise as well. For example,
What if one of the processes fails to remove the lock file when it is finished with the resource?
Polling (the constant checking to determine if a certain event has occurred) is expensive (CPU-wise) and is to be avoided. How does the process that does not obtain access to the resource wait for the resource to become free?
Race conditions whereby both processes find the lock file absent at the same time and, thus, both attempt to simultaneously create it should not happen. Can we make the generation of the lock file atomic (non-divisible, i.e., non-interruptible)?
As we will see, some of these concerns can be addressed, while others can only be limited in scope. A program that implements communications using a lock file is presented below. The code for the main portion of the program is shown in Program 4.1.
Example 4.1. Using a lock file—the main program.
File : p4.1.cxx
  |  /*
  |     Using a lock file as a process communication technique.
  |  */
  |  #include <iostream>
  +  #include <unistd.h>
  |
  |  #include "lock_file.h"
  |  using namespace std;
  |  int
 10  main(int argc, char *argv[ ]){
  |    int  numb_tries, i = 5;
  |    int  sleep_time;
  |    char *fname;
  |    /*
  +       Assign values from the command line
  |    */
  |    set_defaults(argc, argv, &numb_tries, &sleep_time, &fname);
  |    /*
  |       Attempt to obtain lock file
 20    */
  |    if (acquire(numb_tries, sleep_time, fname)) {
  |      while (i--) {                         // simulate resource use
  |        cout << getpid( )<< " " << i << endl;
  |        sleep(sleep_time);
  +      }
  |      release(fname);                       // remove lock file
  |      return 0;
  |    } else
  |      cerr << getpid( ) << " unable to obtain lock file after "
 30           << numb_tries << " tries." << endl;
  |    return 1;
  |  }
At line 7 of the program, the local header file lock_file.h is included. This file (Figure 4.2) contains the prototypes for the three functions, set_defaults, acquire, and release, that are used to manipulate the lock file. Preprocessor statements are used in the header file to prevent the file from being inadvertently included more than once.
In line 17 of the main program the set_defaults function is called to establish the default values. Once these values have been assigned, the program attempts to obtain the lock file by calling the function acquire (line 21). If the program is successful in creating the lock file, it then accesses the non-shareable resource. In the case of Program 4.1 the resource involved is the screen. When access to the screen is acquired, the program displays a series of integer values. Once the program is finished with the resource (all values have been displayed), the lock file is removed using the release function.
Example 4.2. The lock_file.h header file.
File : lock_file.h
  |  #ifndef LOCK_FILE_H
  |  #define LOCK_FILE_H
  |  /*
  |     Lock file function prototypes
  +  */
  |  void set_defaults(int, char *[], int *, int *, char **);
  |  bool acquire(int, int, char *);
  |  bool release(char *);
  |  #endif
The set_defaults function accepts five arguments. The first two arguments (an integer and an array of character pointers) are the argc and argv values passed to the main program (Program 4.1). As written, the program will allow the user to change some or all of the default values by passing alternate values on the command line when the program is invoked. The remaining three arguments for set_defaults are the number of tries to be made when attempting to generate the lock file, the amount of time to wait in seconds between attempts, and a reference to the name of the lock file.
The acquire function takes three arguments. The first is the number of times to attempt to create the lock file, the second the sleep interval between tries, and the third a reference to the lock file name. The acquire function returns a boolean value indicating its success.
The function release removes the lock file. This function is passed a reference to the lock file and returns a boolean value indicating whether or not it was successful. The code for these functions, which are stored in a separate file, is shown in Figure 4.3.
Example 4.3. Source code for the set_defaults, acquire, and release functions.
File : lock_file.cxx
  |  /*
  |     Source code for using lock file. Compile using -c and
  |     -D_GNU_SOURCE options. Link object code as needed.
  |  */
  +  #include <iostream>
  |  #include <cstring>
  |  #include <cstdlib>
  |  #include <cerrno>
  |  #include <limits.h>
 10  #include <fcntl.h>
  |  #include <unistd.h>
  |  const int NTRIES = 5;                     // default values
  |  const int SLEEP  = 5;
  |  const char *LFILE = "/tmp/TEST.LCK";
  +  using namespace std;
  |  void
  |  set_defaults(int ac, char *av[ ],
  |               int *n_tries, int *s_time, char **f_name){
  |    static char full_name[PATH_MAX];
 20    *n_tries = NTRIES;                      // Start with defaults
  |    *s_time  = SLEEP;
  |    strcpy(full_name, LFILE);
  |    switch (ac) {
  |    case 4:                                 // File name was specified
  +      full_name[0] = '\0';                  // "clear" the string
  |      strcpy(full_name, av[3]);             // Add the passed in file
  |    case 3:
  |      if ((*s_time = atoi(av[2])) <= 0)     // Seconds of sleep time
  |        *s_time = SLEEP;
 30    case 2:
  |      if ((*n_tries = atoi(av[1])) <= 0)    // Number of times to try
  |        *n_tries = NTRIES;
  |    case 1:                                 // Use the defaults
  |      break;
  +    default:
  |      cerr << "Usage: " << av[0] <<
  |              " [[tries][sleep][lockfile]]" << endl;
  |      exit(1);
  |    }
 40    *f_name = full_name;
  |  }
  |
  |  bool
  |  acquire(int numb_tries, int sleep_time, char *file_name){
  +    int fd, count = 0;
  |    while ((fd = creat(file_name, 0)) == -1 && errno == EACCES)
  |      if (++count < numb_tries)             // If still more tries
  |        sleep(sleep_time);                  // sleep for a while
  |      else
 50        return (false);                     // Unable to generate
  |    close(fd);                              // Close (0 byte in size)
  |    return (bool(fd != -1));                // OK if actually done
  |  }
  |
  +  bool
  |  release(char *file_name){
  |    return bool(unlink(file_name) == 0);
  |  }
At the top of the lock_file.cxx file, the default values are assigned. The set_defaults function examines the number of arguments passed on the command line (which has been passed to it as the variable ac). A cascading switch statement is used to determine if changes in the default assignments should be made. The set_defaults function assumes the command-line arguments, if present, are arranged as
linux$ program_name numb_of_tries sec_to_sleep lck_file_name
The values for numb_of_tries and sec_to_sleep should be greater than zero. The lck_file_name is the name to be used for the lock file. As written, the set_defaults function does not validate the passed-in lock file location/name but does attempt to disallow values of zero or less for the number of tries and the sleep interval.
The function acquire relies on the system call creat (note there is no trailing e) to generate the lock file (Table 4.1).
Table 4.1. Summary of the creat System Call.
Include File(s) | <sys/types.h> <sys/stat.h> <fcntl.h> | Manual Section | 2 |
Summary | int creat(const char *pathname, mode_t mode); |
Return | Success | Failure | Sets errno |
| Lowest available integer file descriptor | −1 | Yes |
By definition, creat is used to create a new file or rewrite a file that already exists (first truncating it to 0 bytes). The creat system call will open a file for writing only.
creat requires two arguments. The first argument, pathname, is a character pointer to the file to be created, and the second argument, mode, is a value of type mode_t (an integer type defined via the <sys/types.h> file), which specifies the mode (access permissions) for the created file. The header file <fcntl.h> contains a number of predefined constants that may be bitwise ORed to specify the mode for the file. The creat system call in the program function acquire creates a file whose access mode is 0. If creat is successful, the file generated will not have read, write, or execute permission for the owner, group, or others (the superuser, root, is not bound by these permissions).[1]
An alternate approach to creating the file would be to use the open[2] system call. The equivalent statement using open would be:
open( path, O_WRONLY | O_CREAT | O_TRUNC, 0 );
If the creat call is successful, it will return an integer value that is the lowest available file descriptor. If creat fails, it returns −1 and sets errno. Table 4.2 contains the errors that may be encountered when using the creat system call.
As shown, a number of things can cause creat to fail, including too many files open, an incorrectly specified file and/or path name, and so on. The failure we test for in the while loop of the acquire function is EACCES.[3] The failure of creat and the setting of errno to EACCES indicates the file to be created already exists and write permission to the file is denied (remember, the file was generated with a mode of 0).
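The mode-0 test above depends on permission checking, which does not constrain the superuser. A commonly used alternative, sketched here as an assumption rather than as part of Program 4.1, asks the kernel to refuse creation outright when the file already exists by combining O_CREAT with O_EXCL in a call to open. The helper name try_lock_file and the 0600 mode are hypothetical choices made for illustration.

```cpp
#include <cerrno>     // errno and EEXIST, for callers that inspect the failure
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from Program 4.1): make one attempt to create
// the lock file. Returns true if this process created it; returns false if
// the file already exists or open failed for another reason (errno is left
// as set by open).
bool try_lock_file(const char *path) {
    // With O_CREAT | O_EXCL, open fails with EEXIST when the file exists,
    // so the existence test and the creation are a single step.
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return false;
    close(fd);        // the file's presence is the lock; contents are unused
    return true;
}
```

A caller would typically retry, sleeping between attempts, only while errno is EEXIST, and treat any other failure as fatal.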
Table 4.2. creat Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
2 | ENOENT | No such file or directory | One or more parts of the path to new file do not exist (or is NULL). |
6 | ENXIO | No such device or address | |
12 | ENOMEM | Cannot allocate memory | Insufficient kernel memory was available. |
13 | EACCES | Permission denied | |
14 | EFAULT | Bad address | |
17 | EEXIST | File exists | |
19 | ENODEV | No such device | |
20 | ENOTDIR | Not a directory | Part of the specified path is not a directory. |
21 | EISDIR | Is a directory | |
23 | ENFILE | Too many open files in system | System limit on open files has been reached. |
24 | EMFILE | Too many open files | The process has exceeded the maximum number of files open. |
26 | ETXTBSY | Text file busy | More than one process has the executable open for writing. |
28 | ENOSPC | No space left on device | Device for pathname has no space for new file (it is out of inodes). |
30 | EROFS | Read-only file system | The |
36 | ENAMETOOLONG | File name too long | The |
40 | ELOOP | Too many levels of symbolic links | The |
As noted, the while loop in the acquire function tests to determine if a file can be created. If the file can be created, the loop is exited and the file descriptor is closed (leaving the file present and 0 bytes in length). When the file cannot be created and the error code in errno is EACCES, the if statement in the body of the loop is executed. In the if statement the value for count is tested against the designated number of tries for creating the file. If fewer than the designated number of tries have been made, sleep is called to suspend processing.
sleep is a library function that suspends the invoking process for the number of seconds indicated by its argument seconds.[4] See Table 4.3. If sleep is interrupted (such as by a signal), the number of unslept seconds is returned. If the amount of time slept is equal to the argument value passed, sleep will return a 0. Using sleep in the polling loop to have the process wait is a compromise. It is not an elegant way to reduce CPU-intensive code but, at this point, is better than no built-in wait or running some sort of throwaway calculation loop. In later chapters, we discuss alternate solutions to this problem.
Table 4.3. Summary of the sleep Library Function.
Include File(s) | <unistd.h> | Manual Section | 3 |
Summary | unsigned int sleep(unsigned int seconds); |
Return | Success | Failure | Sets errno |
| Amount of time left to sleep. | | |
If, in the program function acquire, the number of tries has been exhausted, a false value, indicating a failure, is returned. A true value is returned if the while loop is exited because the creat call was successful. Additionally, if creat fails for any other reason, a false value is returned.
The release function attempts to remove the file using the system call unlink (Table 4.4). This call deletes a file from the filesystem if the reference is the last link to the file and the file is not currently in use. If the reference is a symbolic link, the link is removed. In the program the release function is coded to return the success or failure of unlink's ability to accomplish its task. As written, the main program discards the value returned by the release function.
Table 4.4. Summary of the unlink System Call.
Include File(s) | <unistd.h> | Manual Section | 2 |
Summary | int unlink(const char *pathname); |
Return | Success | Failure | Sets errno |
| 0 | −1 | Yes |
If the unlink system call fails, it returns a value of −1 and sets errno to one of the values found in Table 4.5. If unlink is successful, it returns a value of 0.
Table 4.5. unlink Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
1 | EPERM | Operation not permitted | |
2 | ENOENT | No such file or directory | One or more parts of |
4 | EINTR | Interrupted system call | A signal was caught during the system call. |
5 | EIO | I/O error | An I/O error has occurred. |
12 | ENOMEM | Cannot allocate memory | Insufficient kernel memory was available. |
13 | EACCES | Permission denied | |
14 | EFAULT | Bad address | |
16 | EBUSY | Device or resource busy | The referenced file is busy. |
20 | ENOTDIR | Not a directory | Part of the specified path is not a directory. |
21 | EISDIR | Is a directory | |
26 | ETXTBSY | Text file busy | More than one process has the executable open for writing. |
30 | EROFS | Read-only file system | |
36 | ENAMETOOLONG | File name too long | |
40 | ELOOP | Too many levels of symbolic links | The |
67 | ENOLINK | The link has been severed | The path value references a remote system that is no longer available. |
72 | EMULTIHOP | Multihop attempted | The path value requires multiple hops to remote systems, but file system does not allow it. |
A sample compilation and run of the program is shown in Figure 4.4.
Example 4.4. Output of Program 4.1.
The program p4.1 is invoked twice. To allow the two processes to execute concurrently, the program invocations are placed in the background (via the trailing &). The first process creates the lock file and gains access to the screen. This process is responsible for generating the five values (4, 3, 2, 1, 0) that are displayed on the screen. The second process, after two tries with a two-second interval between tries, exits and produces the message unable to obtain lock file after 2 tries. When each process finishes, the operating system displays the exit/return value. The process that was unable to gain access to the resource exits with a value of 1. It is informative to run the program several times using varying settings. When doing so, you should be able to ascertain whether the lock file really does allow rudimentary communication between the processes involved.
Our example uses the creat system call as the base for its atomic file locking. Unfortunately, creat may generate race conditions on an NFS (network-mounted) filesystem. The Linux manual page for creat recommends using the link system call as the atomic file locking operation (which it indicates should not cause race conditions in an NFS setting). The link system call is used to generate a hard link to the lock file, giving it a new name. With a hard link, the link and the file being linked must reside on the same filesystem. If the stat system call for the file returns a link count of two, then the lock has been successfully implemented (acquired). See Exercise 4-1 for more on using link versus creat.
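The link-based approach can be sketched as follows. This is an illustrative outline, not a solution to Exercise 4-1; the function name acquire_by_link and the scheme of appending the PID to form a private file name are assumptions. The lock is considered held if link succeeds or, failing that, if stat reports a link count of two on the private file.

```cpp
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: try once to acquire lock_path by hard-linking a
// private, per-process file to it. Both names must reside on the same
// filesystem. Returns true if the lock was obtained.
bool acquire_by_link(const std::string &lock_path) {
    // Private name assumed unique to this process (lock name plus PID).
    const std::string unique = lock_path + "." + std::to_string(getpid());

    int fd = open(unique.c_str(), O_WRONLY | O_CREAT, 0600);
    if (fd == -1)
        return false;
    close(fd);

    // link fails if lock_path already exists, so only one process wins.
    bool locked = (link(unique.c_str(), lock_path.c_str()) == 0);
    if (!locked) {
        // Double-check with stat: a link count of 2 on the private file
        // means the lock name now refers to it after all.
        struct stat sb;
        locked = (stat(unique.c_str(), &sb) == 0 && sb.st_nlink == 2);
    }

    unlink(unique.c_str());   // the lock name, if created, keeps the file
    return locked;
}
```

Releasing the lock is then a matter of calling unlink on the lock file name, as the release function in lock_file.cxx already does.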
A second basic communication technique, similar in spirit to using lock files, can be implemented by using some of the standard file protection routines found in UNIX. UNIX allows the locking of records. As there is no real record structure imposed on a file, a record (which is sometimes called a segment or section) is considered to be a specified number of contiguous bytes of storage starting at an indicated location. If the starting location for the record is the beginning of a file, and the number of bytes equals the number found in the file, then the entire file is considered to be the record in question. Locking routines can be used to impose advisory or mandatory locking. In advisory locking the operating system keeps track of which processes have locked files. The processes that are using these files cooperate and access the record/file only when they determine the lock is in the appropriate state. When advisory locking is used, outlaw processes can still ignore the lock, and if permissions permit, modify the record/file. In mandatory locking the operating system will check lock information with every read and write call. The operating system will ensure that the proper lock protocol is being followed. While mandatory locking offers added security, it is at the expense of additional system overhead. Locks become mandatory if the file being locked is a plain file (not executable) and the set-group-ID is on and the group execution bit is off.
At a system level the chmod command can be used to specify that a file supports mandatory locking. For example, in Figure 4.5, the permissions on the data file x.dat are set to support mandatory file locking. The ls command will display the letter S in the group execution bit field of a file that supports a mandatory lock. Notice that in the example absolute mode was used with the chmod command to establish locking. The first digit of the mode value should be a 2 and the third digit a 6, 4, 2, or 0 (that is, a value that leaves the group execution bit off).
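The same permission change can be made from within a program. The sketch below is an assumption-based illustration (both function names are hypothetical): it turns the set-group-ID bit on and the group execution bit off while leaving the remaining permission bits as they were. Whether the lock is then actually enforced depends on the system and on how the filesystem is mounted.

```cpp
#include <sys/stat.h>

// Hypothetical helper: given a file's permission bits, return the bits
// with set-group-ID on and group execute off, the combination the text
// describes for mandatory locking.
mode_t with_mandatory_lock_bits(mode_t mode) {
    return (mode | S_ISGID) & ~static_cast<mode_t>(S_IXGRP);
}

// Apply the adjusted bits to an existing file; returns true on success.
bool enable_mandatory_locking(const char *path) {
    struct stat sb;
    if (stat(path, &sb) == -1)
        return false;
    // Keep only the permission and special bits, then adjust them.
    return chmod(path, with_mandatory_lock_bits(sb.st_mode & 07777)) == 0;
}
```

For a file with mode 0664, for example, the adjusted mode is 02664, which matches the rule that the first digit be a 2 and the third digit leave group execute off.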
The topic of record locking is expansive. We focus on one small aspect of it. We use file locking routines to place and remove an advisory lock on an entire file as a communication technique with cooperating processes.
There are several ways to set a lock. The two most common approaches are presented: the fcntl system call and the lockf library function. We begin with fcntl (Table 4.6).
Table 4.6. Summary of the fcntl System Call.
Include File(s) | <unistd.h> <fcntl.h> | Manual Section | 2 |
Summary | int fcntl(int fd, int cmd /* , struct flock *lock */); |
Return | Success | Failure | Sets errno |
| Value returned depends upon the | −1 | Yes |
As its first argument the fcntl system call is passed a valid integer file descriptor of an open file. The second argument, cmd, is an integer command value that specifies the action that fcntl should take. The command values for locking are specified as defined constants in the header file <bits/fcntl.h> that is included by the <fcntl.h> header file. The lock-specific constants are shown in Table 4.7.
Table 4.7. Lock-Specific Defined Constants Used with the fcntl System Call.
Defined Constant | Action Taken by fcntl |
---|---|
F_SETLK | Set or remove a lock. Specific action is based on the contents of the |
F_SETLKW | Same as F_SETLK, but block (wait) if the indicated record/segment is not available—the default is not to block. |
F_GETLK | Return lock status information via the |
The third argument for fcntl is optional for some invocations (as indicated by it being commented out in the function prototype). However, when working with locks, the third argument is specified and references a flock structure, which is defined as
struct flock {
    short int l_type;     /* Type of lock: F_RDLCK, F_WRLCK, or F_UNLCK. */
    short int l_whence;   /* Where 'l_start' is relative to. */
#ifndef __USE_FILE_OFFSET64
    __off_t l_start;      /* Offset where the lock begins. */
    __off_t l_len;        /* Size of the locked area; (0 == EOF). */
#else
    __off64_t l_start;    /* For systems with 64 bit offset. */
    __off64_t l_len;
#endif
    __pid_t l_pid;        /* PID of process holding the lock. */
};
The flock structure is used to pass information to and return information from the fcntl call. The type of lock, l_type, is indicated by using one of the defined constants shown in Table 4.8.
The l_whence, l_start, and l_len members of the flock structure are used to indicate the starting location (0, the beginning of the file; 1, the current location; and 2, the end of the file), relative offset, and size of the record (segment). If these values are set to 0, the entire file will be operated upon. The l_pid member is used to return the PID of the process that placed the lock.
Table 4.8. Defined Constants Used in the flock l_type Member.
Defined Constant | Lock Specification |
---|---|
F_RDLCK | Read lock |
F_WRLCK | Write lock |
F_UNLCK | Remove lock |
When dealing with locks, if fcntl fails to carry out an indicated command, it will return a value of −1 and set errno. Error messages associated with locking are shown in Table 4.9.
Table 4.9. fcntl Error Messages Relating to Locking.
# | Constant | perror Message | Explanation |
---|---|---|---|
4 | EINTR | Interrupted system call | A signal was caught during the system call. |
9 | EBADF | Bad file number | |
11 | EAGAIN | Resource temporarily unavailable | Lock operation is prohibited, as the file has been memory mapped by another process. |
13 | EACCES | Permission denied | Lock operation prohibited by a lock held by another process. |
14 | EFAULT | Bad address | |
22 | EINVAL | Invalid argument | |
35 | EDEADLK | Resource deadlock avoided | |
37 | ENOLCK | No locks available | System has reached the maximum number of record locks. |
Program 4.2 demonstrates the use of file locking.
Example 4.2. Using fcntl to lock a file.
File : p4.2.cxx
  |  /* Locking a file with fcntl
  |  */
  |  #include <iostream>
  |  #include <cstdio>
  +  #include <cerrno>
  |  #include <fcntl.h>
  |  #include <unistd.h>
  |  using namespace std;
  |  const int MAX = 5;
 10  int
  |  main(int argc, char *argv[ ]) {
  |    int          f_des, pass = 0;
  |    pid_t        pid = getpid();
  |    struct flock lock;                      // for fcntl info
  +    if (argc < 2) {                         // name of file to lock missing
  |      cerr << "Usage " << *argv << " lock_file_name" << endl;
  |      return 1;
  |    }
  |    sleep(1);                               // don't start immediately
 20    if ((f_des = open(argv[1], O_RDWR)) < 0){
  |      perror(argv[1]);                      // could not access file
  |      return 2;
  |    }
  |    lock.l_type   = F_WRLCK;                // set a write lock
  +    lock.l_whence = 0;                      // start at beginning
  |    lock.l_start  = 0;                      // with a 0 offset
  |    lock.l_len    = 0;                      // whole file
  |    while (fcntl(f_des, F_SETLK, &lock) < 0) {
  |      switch (errno) {
 30      case EAGAIN:
  |      case EACCES:
  |        if (++pass < MAX)
  |          sleep(1);
  |        else {                              // run out of tries
  +          fcntl(f_des, F_GETLK, &lock);
  |          cerr << "Process " << pid << " found file "
  |               << argv[1] << " locked by " << lock.l_pid << endl;
  |          return 3;
  |        }
 40        continue;
  |      }
  |      perror("fcntl");
  |      return 4;
  |    }
  +    cerr << endl << "Process " << pid << " has the file" << endl;
  |    sleep(3);                               // fake processing
  |    cerr << "Process " << pid << " is done with the file" << endl;
  |    return 0;
  |  }
In this program the name of the file to be locked is passed on the command line. A call to sleep is placed at the start of the program to slow down the processing (for demonstration purposes only). The designated file is opened for reading and writing. In lines 24 through 27 the lock structure is assigned values that indicate a write lock is to be applied to the entire file. In the while loop that follows, a call to fcntl requests the lock be placed. If fcntl fails and errno is set to either EAGAIN or EACCES (values that indicate the lock could not be applied), the process will sleep for one second and try to apply the lock again. To be safe, the EACCES constant is grouped with EAGAIN, as in some versions of UNIX this is the value that is returned when a lock cannot be applied. If the MAX number of tries (passes) has been exceeded, another call to fcntl (line 35) is made to obtain information about the process that has locked the file. In this call the address of the lock structure is passed to fcntl. The PID of the locking process is displayed, and the program exits. If an error other than EAGAIN or EACCES is encountered when attempting to set the lock, perror is called, a message is displayed, and the program exits. If the process successfully obtains the lock, the process prints an informational message, sleeps three seconds (to simulate some sort of processing), and prints a second message as it terminates. When the process terminates, the system automatically removes the lock on the file. If the process were not to terminate, the process would need to set the l_type member to F_UNLCK and reissue the fcntl call to clear the lock.
If we run three copies of Program 4.2 in rapid succession, using the file x.dat as the lock file, their output will be similar to that shown in Figure 4.6.
Example 4.6. Running multiple copies of Program 4.2—locking a file.
linux$ p4.2 x.dat & p4.2 x.dat & p4.2 x.dat &
[1] 28392
[2] 28393
[3] 28394
$
Process 28392 has the file
Process 28392 is done with the file
Process 28393 has the file
Process 28394 found file x.dat locked by 28393
Process 28393 is done with the file
[3] - Exit 3 p4.2 x.dat
[2] - Done p4.2 x.dat
[1] + Done p4.2 x.dat
Notice that the last process, PID 28394 in this example, is unable to place a lock on the file and returns the process ID of the process that currently has the lock on the file. The second process, PID 28393, through repeated retries (with intervening calls to sleep) is able to lock the file once the first process is finished with it.
The lockf library function may also be used to apply, test, or remove a lock on an open file. Beneath the covers this library function is an alternate interface for the fcntl system call. The lockf library function is summarized in Table 4.10.
Table 4.10. Summary of the lockf Library Call.
Include File(s) | <sys/file.h> <unistd.h> | Manual Section | 3 |
Summary | int lockf(int fd, int cmd, off_t len); |
Return | Success | Failure | Sets errno |
| 0 | −1 | Yes |
The fd argument is a file descriptor of a file that has been opened for either writing (O_WRONLY) or for reading and writing (O_RDWR). The cmd argument for lockf is similar to the cmd argument used with fcntl. The cmd value indicates the action to be taken. The action that lockf will take for each cmd value (as specified in the include file <unistd.h>) is summarized in Table 4.11.
Table 4.11. Defined cmd Constants.
Defined Constant | Lock Specification |
---|---|
F_ULOCK | Unlock a previously locked file. |
F_LOCK | Lock a file (or a section of a file) for exclusive use if it is available. If unavailable, the |
F_TLOCK | Test and, if successful, lock a file (or section of a file) for exclusive use. An error is returned if no lock can be applied; with this option the |
F_TEST | Test a file for the presence of a lock. A 0 is returned if the file is unlocked or locked by the current process. If locked by another process, −1 is returned and |
The len argument of lockf indicates the number of contiguous bytes to lock or unlock. A value of zero indicates the section should be from the present location to the end of the file.
If the lockf call is successful, it returns a value of 0. If the call fails, it sets errno and returns the value −1 (Table 4.12).
Table 4.12. lockf Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
9 | EBADF | Bad file number | |
11 | EAGAIN | Resource temporarily unavailable | |
13 | EACCES | Permission denied | Lock operation prohibited by a lock held by another process. |
22 | EINVAL | Invalid argument | Invalid operation specified for |
35 | EDEADLK | File locking deadlock | Requested lock operation would cause a deadlock. |
37 | ENOLCK | No locks available | Maximum number of system locks has been reached. |
Of the two techniques, lockf is simpler but less flexible than fcntl. Note that when using the lockf call, the user must issue a separate lseek system call to position the file pointer to the proper location in the file prior to the call. Also, when generating parent/child process pairs, each shares the same file pointer. If locks are to be used in both processes, it is sometimes best to close and reopen the file in question so that each process has its own separate file pointer.
A final note: Linux supports a shlock command that can be used in shell scripts. The shlock command creates a lock file that contains an identifying PID.
A second primitive interprocess communication technique involves the use of signals. As previously indicated, signals occur asynchronously (with no specified timing or sequencing). Signals, which can be sent by processes or by the kernel, serve as notification that an event has occurred. Signals are generated when the event first occurs and are considered to be delivered when the process takes action on the signal. The delivery of most signals can be blocked so the signal can be acted upon later. Blocked signals, and those sent to processes in a non-running state, are commonly called pending signals.
The symbolic name for each signal can be found in several places. Usually, the manual pages for signal (try man 7 signal) or the header file <asm/signal.h> will contain a list of each signal name. Signals, as described in Section 7 of the manual, are shown in Table 4.13. The definition of a signal (its symbolic name, the associated integer value, and the event signaled) has evolved over time. Signals defined by the POSIX.1 standard have the letter P in the Def column; those defined by SUS v2 (Single UNIX Specification, version 2) have a letter S. The letter O indicates signals not defined by either of these standards. Furthermore, keep in mind that some signals are architecture-dependent. To denote this, if three numbers are listed in the Value column for a signal, the first number is the signal for alpha and sparc platforms; the middle number is for i386 and ppc platforms; while the last number is for mips platforms. A dash (–) indicates the signal is missing for the platform. A single value indicates all platforms use the same signal number. The default action associated with the signal is defined by one or more letters in the Action column of the table. The letter A indicates the recipient process will terminate; B, the process will ignore the signal; C, the process will terminate and produce a core file; and D, the process will stop (suspend) execution. Additionally, the letter E indicates the signal cannot be caught (trapped), and the letter F, that the signal cannot be ignored.
Table 4.13. Signal Definitions.
Symbolic Name | Def | Value | Action | Description |
---|---|---|---|---|
SIGABRT | | | | Abort signal from abort. |
SIGALRM | | | | Timer signal from alarm. |
SIGBUS | | | | Bus error (bad memory access). |
SIGCHLD | | | | Sent to parent when child is stopped or terminated. |
SIGCLD | | | | A synonym for SIGCHLD. |
SIGCONT | | | | Resume if process is stopped. |
SIGEMT | | | | Emulation trap. |
SIGFPE | | | | Floating-point exception. |
SIGHUP | | | | A hangup was detected on the controlling terminal or the controlling process has died. |
SIGILL | | | | Illegal instruction. |
SIGINFO | | | | A synonym for SIGPWR. |
SIGINT | | | | Interrupt from keyboard. |
SIGIO | | | | I/O now possible. |
SIGIOT | | | | IOT trap; equivalent to SIGABRT. |
SIGKILL | | | | Kill signal; force process termination. |
SIGLOST | | | | File lock lost. |
SIGPIPE | | | | Broken pipe; write to pipe with no readers. |
SIGPOLL | | | | A pollable event has occurred; synonymous with SIGIO (also 23). |
SIGPROF | | | | Profiling timer expired. |
SIGPWR | | | | Power supply failure. |
SIGQUIT | | | | Quit from keyboard. |
SIGSEGV | | | | Invalid memory reference (segmentation violation). |
SIGSTKFLT | | | | Coprocessor stack error. |
SIGSTOP | | | | Stop process; not from tty. |
SIGSYS | | | | Bad argument to system call. |
SIGTERM | | | | Termination signal from kill. |
SIGTRAP | | | | Trace/breakpoint trap for debugging. |
SIGTSTP | | | | Stop typed at a tty. |
SIGTTIN | | | | Background process needs input. |
SIGTTOU | | | | Background process needs to output. |
SIGUNUSED | | | | Unused signal (will be SIGSYS). |
SIGURG | | | | Urgent condition on I/O channel (socket). |
SIGUSR1 | | | | User-defined signal 1. |
SIGUSR2 | | | | User-defined signal 2. |
SIGVTALRM | | | | Virtual alarm clock. |
SIGWINCH | | | | Window resize signal. |
SIGXCPU | | | | CPU time limit exceeded. |
SIGXFSZ | | | | File size limit exceeded. |
Some additional caveats to consider include the following:
For some S signals (SUS v2), the default action is listed as A (terminate), but their actual action should be C (terminate the process and generate a core file).
Signal 29 is SIGINFO/SIGPWR on an alpha platform but SIGLOST on a sparc platform.
Note that all signals begin with the prefix SIG and end with a semimnemonic suffix. For the sake of portability when referencing signals, it is usually best to use their symbolic names rather than their assigned integer values. The defined constants SIGRTMIN and SIGRTMAX are also found in <asm/signal.h> and allow the generation of additional real-time signals. Real-time signals, usually the values 32 to 63, can be queued. The queuing of signals ensures that when multiple signals are sent to a process, they will not be lost. At present, the Linux kernel does not make use of real-time signals.
For each signal, a process may take one of the following three actions:
Perform the default action. This is the action that will be taken unless otherwise specified. The default action for each signal is listed in the previous table. Specifically, these actions are
Terminate (Abort): Perform all the activities associated with the exit system call.
Core (Dump): Produce a core image (file) and then perform termination activities.
Stop: Suspend processing.
Ignore: Disregard the signal.
Ignore the signal. If the signal to be ignored is currently blocked, it is discarded. The SIGKILL and SIGSTOP signals cannot be ignored.
Catch the signal. In this case, the process supplies the address of a function (often called a signal catcher) that is to be executed when the signal is received. In most circumstances, the signal catching function will have a single integer parameter. The parameter value, which is assigned by the system, will be the numeric value of the signal caught. When the signal catcher function finishes, the interrupted process will, unless otherwise specified, resume its execution where it left off.
The implementation details for ignoring and catching signals are covered in Section 4.5.
Signals are generated in a number of ways:
By the kernel, indicating
Hardware conditions, the most common of which are SIGSEGV, when there has been an addressing violation by the process, and SIGFPE, indicating a division by zero.
Software conditions, such as SIGIO, indicating I/O is possible on a file descriptor, or SIGALRM, indicating the expiration of a timer.
By the user at a terminal:
Keyboard: The user produces keyboard sequences that will interrupt or terminate the currently executing process. For example, the interrupt signal, SIGINT, is usually mapped to the key sequence CTRL+C and the quit signal, SIGQUIT, to the key sequence CTRL+\. The command stty -a will display the current mappings of keystrokes for the interrupt and quit signals.
kill command: By using the kill command, the user, at the command line, can generate any of the previously listed signals for any process that has the same effective ID. The syntax for the kill command is
$ kill [ -signal ] pid . . .
When issued, the kill command will send the specified signal to the indicated PID. The signal can be an integer value or one of the symbolic signal names with the SIG prefix removed. If no signal number is given, the default is SIGTERM (terminate). The PID(s) (multiple PIDs are separated with whitespace) are the IDs of the processes that will be sent the signal. If needed, the ps command can be used to obtain current PIDs for the user.
It is possible for the pid value to be less than 1 and/or for the signal value to be 0. In these cases, the kill command will carry out the same actions as specified for the kill system call described in the following section. As would be expected, the kill command is just a command-line interface to the kill system call.
By other processes:
By the kill system call (Table 4.14). The kill system call is used to send a signal to a process or a group of processes.
Notice that the argument sequence for the kill system call is the reverse of that of the kill command. The value specified for the pid argument indicates which process or process group will be sent the signal. Table 4.15 summarizes how to specify a process or process group.
Table 4.14. Summary of the kill System Call.
Include File(s) | <sys/types.h> <signal.h> | Manual Section | 2 |
Summary | int kill( pid_t pid, int sig ); |
Return | Success | Failure | Sets errno |
 | 0 | −1 | Yes |
Table 4.15. Interpretation of pid values by the kill System Call.
pid | Process(es) Receiving the Signal |
---|---|
> 0 | The process whose process ID is the same as pid |
0 | All the processes in the same process group as the sender |
−1 | Not superuser: All processes whose real ID is the same as the effective ID of the sender. Superuser: All processes excluding special processes |
< −1 | All the processes whose process group is absolute_value (-pid) |
The value for sig can be any of the symbolic signal names (or the equivalent integer value) found in the signal header file. If the value of sig is set to 0, the kill system call will perform an error check of the specified PID, but will not send the process a signal. Sending a signal of 0 to a PID and checking the return value of the kill system call is sometimes used as a way of determining if a given PID is present. This technique is not foolproof, as the process may terminate on its own immediately after the call to check on it has been made. Remember that UNIX will reuse PID values once the maximum PID has been assigned. The statement
kill(getpid(),sig);
can be used by a process to send itself the signal specified by sig.[5]
If the kill system call is successful, it returns a 0; otherwise, it returns a value of −1 and sets errno as indicated in Table 4.16. In Linux, for security reasons, it is not possible to send a signal to process one, init. Signals are passed to init via telinit.
Table 4.16. kill Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
1 | EPERM | Operation not permitted | |
3 | ESRCH | No such process | No such process or process group as pid. |
22 | EINVAL | Invalid argument | Invalid signal number specified. |
By the alarm system call (Table 4.17).
The alarm system call sets a timer for the issuing process and generates a SIGALRM signal when the specified number of real-time seconds have passed.
Table 4.17. Summary of the alarm System Call.
Include File(s) | <unistd.h> | Manual Section | 2 |
Summary | unsigned int alarm(unsigned int seconds); |
Return | Success | Failure | Sets errno |
 | Amount of time remaining | | |
If the value passed to alarm is 0, any pending alarm is canceled. Processes generated by a fork have their alarm values set to 0, while a process that performs an exec keeps its alarm with its remaining time. alarm calls cannot be stacked; each new call replaces the previous alarm value. A call to alarm returns the amount of time remaining on the previous alarm clock. A “sleep” type arrangement can be implemented for a process using alarm. However, mixing calls to alarm and sleep is not a good idea.
Program 4.3 demonstrates the use of an alarm system call.
Program 4.3. Setting an alarm.
File : p4.3.cxx
    #include <iostream>
    #include <iomanip>
    #include <cstdlib>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    using namespace std;
    int
    main(int argc, char *argv[] ) {
      int w, status;
      if ( argc < 4 ) {
        cerr << "Usage: " << *argv << " value_1 value_2 value_3 "
             << endl;
        return 1;
      }
      for(int i=1; i <= 3; ++i)
        if ( fork( ) == 0 ) {
          int t = atoi(argv[i]);
          cout << "Child " << getpid( ) << " waiting to die in "
               << t << " seconds." << endl;
          alarm( t );                              // line 21: schedule SIGALRM
          pause( );                                // line 22: wait for a signal
          cout << getpid( ) << " is done." << endl;  // line 23: never reached
        }
      while (( w=wait(&status)) && w != -1)
        cout << "Wait on PID: " << dec << w << " returns status of "
             << setw(4) << setfill('0') << hex
             << setiosflags(ios::uppercase) << status << endl;
      return 0;
    }
When the program is invoked, three integer values are passed to the program. The parent process generates three child processes, using the command-line values to set the alarm in each child process. In line 22 the pause library function is called. This function causes the child process to wait for the receipt of a signal. In the example, this will be the receipt of the SIGALRM signal. When the signal is received, the child process takes the default action for the signal. The default for SIGALRM is for the process to exit and return the value of the signal to its waiting parent. The parent process waits for all of the child processes to finish. As each finishes, the parent displays the child PID and its return status information. It is important to note that the cout statement in line 23 is never executed, as the child process exits before reaching this statement. This can be verified by the output shown in Figure 4.7.
Figure 4.7. Setting an alarm in multiple child processes.
    linux$ p4.3 3 1 5
    Child 17243 waiting to die in 3 seconds.
    Child 17244 waiting to die in 1 seconds.
    Child 17245 waiting to die in 5 seconds.
    Wait on PID: 17244 returns status of 000E
    Wait on PID: 17243 returns status of 000E
    Wait on PID: 17245 returns status of 000E
A call to pause suspends a process (causing it to sleep) until it receives a signal that has not been ignored (Table 4.18).
Table 4.18. Summary of the pause Library Function.[6]
Include File(s) | <unistd.h> | Manual Section | 2 |
Summary | int pause( void ); |
Return | Success | Failure | Sets errno |
 | If the signal does not cause termination, then −1 is returned | Does not return | Yes |
[6] While in Section 2 of the manual, the manual page indicates this is a library function.
pause returns a −1 if the signal received while pausing does not cause process termination. The value in errno will be EINTR (4). If the received signal causes termination, pause will not return (which is to be expected!).
In the previous section we noted that a process can handle a signal by doing nothing (thus allowing the default action to occur), ignoring the signal, or catching the signal. Both the ignoring and catching of a signal entail the association of a signal-catching routine with a signal. In brief, when this is done the process automatically invokes the signal-catching routine when the stipulated signal is received. There are two basic system calls that can be used to modify what a process will do when a signal has been received: signal and sigaction. The signal system call has been present in all versions of UNIX and is now categorized as the ANSI C version signal-handling routine (Table 4.19). The sigaction system call (Table 4.20) is somewhat more recent and is one of a group of POSIX signal management calls.
Table 4.19. Summary of the signal System Call.
Include File(s) | <signal.h> | Manual Section | 2 |
Summary | void (*signal(int signum, void (*sighandler)(int)))(int); |
Return | Success | Failure | Sets errno |
 | Signal's previous disposition | SIG_ERR (defined as −1) | Yes |
Table 4.20. Summary of the sigaction System Call.
Include File(s) | <signal.h> | Manual Section | 2 |
Summary | int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact); |
Return | Success | Failure | Sets errno |
 | 0 | −1 | Yes |
The most difficult part of using signal is deciphering its prototype. In essence, the prototype declares signal to be a function that accepts two arguments: an integer signum value and a pointer to a function that is called when the signal is received. If the invocation of signal is successful, it returns a pointer to a function that returns nothing (void). This is the previous disposition for the signal. The mysterious (int), found at the far right of the prototype, indicates the referenced function has an integer argument. This argument is automatically filled by the system and contains the signal number. Either system call fails and returns the value −1, setting the value in errno to EINTR (4) if it is interrupted, or to EINVAL (22) if the value given for signum is not valid or is set to SIGKILL or SIGSTOP. Further, sigaction sets errno to EFAULT (14) if the act or oldact arguments reference an invalid address space.
While both signal and sigaction deal with signal handling, the functionality of each is slightly different. Let's begin with the signal system call.
The first argument to the signal system call is the signal that we intend to associate with a new action. The signal value can be an integer or a symbolic signal name. This value cannot be SIGKILL or SIGSTOP. The second argument to signal is the address of the signal-catching function. The signal-catching function can be a user-defined function or one of the defined constants SIG_DFL or SIG_IGN. Specifying SIG_DFL for a signal resets the action to be taken to its default action when the signal is received. Indicating SIG_IGN for a signal means the process will ignore the receipt of the indicated signal.
An examination of the signal header files shows that SIG_DFL and SIG_IGN are defined as integer values that have been appropriately cast to address locations that are invalid (such as −1, etc.). The declaration most commonly found for SIG_DFL and SIG_IGN is shown below. With these definitions is another defined constant that can be used, SIG_ERR. This constant is the value that is returned by signal if it fails. See Figure 4.8.
Figure 4.8. Defined constants used by signal and sigset.
    /* Fake signal functions. */
    #define SIG_ERR ((__sighandler_t) -1) /* Error return. */
    #define SIG_DFL ((__sighandler_t) 0) /* Default action. */
    #define SIG_IGN ((__sighandler_t) 1) /* Ignore signal. */
Program 4.4 uses the signal system call to demonstrate how a signal can be ignored.
Program 4.4. Pseudo nohup: ignoring a signal.
File : p4.4.cxx
    /* Using the signal system call to ignore a hangup signal
    */
    #include <iostream>
    #include <cstdio>
    #include <cstdlib>
    #include <signal.h>
    #include <fcntl.h>
    #include <unistd.h>
    using namespace std;
    const char *file_out = "nohup.out";
    int
    main(int argc, char *argv[]){
      int new_stdout;
      if (argc < 2) {
        cerr << "Usage: " << *argv << " command [arguments]" << endl;
        return 1;
      }
      if (isatty( 1 )) {
        cerr << "Sending output to " << file_out << endl;
        close( 1 );
        if ((new_stdout = open(file_out, O_WRONLY | O_CREAT |
                               O_APPEND, 0644)) == -1) {
          perror(file_out);
          return 2;
        }
      }
      if (signal(SIGHUP, SIG_IGN) == SIG_ERR) {
        perror("SIGHUP");
        return 3;
      }
      ++argv;
      execvp(*argv, argv);
      perror(*argv);                         // Should not get here unless
      return 4;                              // the exec call fails.
    }
Program 4.4 is a limited version of the /usr/bin/nohup command found on most UNIX-based systems. The nohup command can be used to run commands so they will be immune to the receipt of SIGHUP signals. If the standard output for the current process is associated with a terminal, the output from nohup will be sent to the file nohup.out. The nohup command is often used with the command-line background specifier & to allow a command to continue its execution in the background even after the user has logged out.
Like the real nohup, our pseudo nohup program (Program 4.4) will execute the command (with optional arguments) that is passed to it on the command line. After checking the number of command-line arguments, the file descriptor associated with stdout is evaluated. The assumption here is that the file descriptor associated with stdout is 1. However, if needed, there is a standard I/O function named fileno that can be used to find the integer file descriptor for a given stream. The library function isatty (Table 4.21) is used to determine if the descriptor is associated with a terminal device.
Table 4.21. Summary of the isatty Library Function.
Include File(s) | <unistd.h> | Manual Section | 3 |
Summary | int isatty( int desc ); |
Return | Success | Failure | Sets errno |
 | 1 | 0 | |
The isatty library function takes a single integer desc argument. If desc is associated with a terminal device, isatty returns a 1; otherwise, it returns a 0. In the program, if the isatty function returns a 1, an informational message is displayed to standard error to tell the user where the output from the command passed to the pseudo nohup program can be found. Next, the file descriptor for stdout is closed. The open statement that follows the close returns the lowest-numbered free file descriptor. As we have just closed stdout, the descriptor returned by the open will be that of stdout. Once this reassignment has been done, any information written to stdout (cout) by the program will in turn be appended to the file nohup.out. Notice that the call to signal to ignore the SIGHUP signal is done within an if statement. Should the signal system call fail (return SIG_ERR), a message would be displayed to standard error and the program would exit. If the signal call is successful, the argv pointer is incremented to step past the name of the current program. The remainder of the command line is then passed to the execvp system call. Should the execvp call fail, perror will be invoked and a message displayed. If execvp is successful, the current process will be overlaid by the program/command passed from the command line.
The output in Figure 4.9 shows what happens when the pseudo nohup program is run on a local system and passed a command that takes a long time to execute. In the example the long-running command is a small Korn shell script called count that counts from 1 to 100, sleeping one second after the display of each value. As written, the output from the script would normally be displayed on the screen.
Figure 4.9. Output of Program 4.4 when passed a command that takes a long time to execute.
    linux$ cat count
    #! /bin/ksh
    c=1
    while (( $c <= 100 ))                    <-- 1
    do
      echo "$c"
      sleep 1
      (( c = c + 1 ))
    done
    linux$ ./p4.4 ./count &                  <-- 2
    Sending output to nohup.out
    [1] 19481
    linux$ jobs                              <-- 3
    [1] + Running p4.4 count
    linux$ kill -HUP %1                      <-- 4
    linux$ jobs
    [1] + Running p4.4 count
    linux$ kill -KILL %1
    linux$
    [1] Killed p4.4 count
    linux$ jobs
    linux$
(1) The script counts from 1 to 100, sleeping one second in between the display of each number. If run on the command line, it will take approximately 100 seconds to count from 1 to 100.
(2) Pass the count script to our pseudo nohup program and place it in the background.
(3) The operating system returns the PID of the background process.
(4) Sending a hangup signal to the process does not cause it to terminate.
When the program was placed in the background, the system reported the job number (in this case [1]) and the PID (19481). The jobs command confirms that the process is still running. As can be seen, the kill -HUP %1 command (which sends a hangup signal to the first job in the background) did not cause the program to terminate. This is not unexpected, as the SIGHUP signal was being ignored. The command kill -KILL %1 was used to terminate the process by sending it a SIGKILL signal.
As noted, if a signal-catching function name is supplied to the signal system call, the process will automatically call this function when the process receives the signal. However, prior to calling the function, if the signal is not SIGKILL, SIGPWR, or SIGTRAP, the system will reset the signal's disposition to its default. This means that if the same signal is received twice in quick succession, it is entirely possible that the second signal will arrive before the signal-catching routine has reestablished itself and will cause the process to terminate (if that is the default action for the signal). This behavior reduces the reliability of using signals as a communication device. It is possible to reduce, but not entirely eliminate, this window of opportunity for failure by resetting the disposition for the signal in the catching routine. Program 4.5 catches signals and attempts to reduce this window of opportunity.
Program 4.5. Catching SIGINT and SIGQUIT signals.
File : p4.5.cxx
    /* Catching a signal
    */
    #include <iostream>
    #include <cstdlib>
    #include <cstdio>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;
    int
    main( ) {
      void signal_catcher(int);
      if (signal(SIGINT , signal_catcher) == SIG_ERR) {
        perror("SIGINT");
        return 1;
      }
      if (signal(SIGQUIT , signal_catcher) == SIG_ERR) {
        perror("SIGQUIT");
        return 2;
      }
      for (int i=0; ; ++i) {                 // Forever ...
        cout << i << endl;                   // display a number
        sleep(1);
      }
      return 0;
    }
    void
    signal_catcher(int the_sig){
      signal(the_sig, signal_catcher);       // line 28: reset immediately
      cout << endl << "Signal " << the_sig << " received." << endl;
      if (the_sig == SIGQUIT)
        exit(3);
    }
In an attempt to avoid taking the default action (which in this case is to terminate) for either of the two caught signals, the first statement (line 28) in the program function signal_catcher is a call to signal. This call reestablishes the association between the signal being caught and the signal-catching routine.
Figure 4.10 shows the output of the program when run on a local system.
Figure 4.10. Output of Program 4.5.
From this output we can see that each time CTRL+C was pressed, it was echoed back to the terminal as ^C. If CTRL+C was struck twice in quick succession, the program responded with the Signal 2 received message for each keyboard sequence. On this system it appears as if some of the signals were queued if they were received in rapid succession. However, this is somewhat misleading, as the mechanics of terminal I/O come into play. Say we were (via a background process) to deliver to the process, in very rapid succession, multiple copies of the same signal. In this setting we would find most often that only one copy of the signal would be delivered to the process, while the others are discarded. Most systems do not queue the signals 1 through 31. When a SIGQUIT signal was generated, a message was displayed and the program exited.
The sigaction system call, like the signal system call, can be used to associate an alternate action with the receipt of a signal. This system call has three arguments. The first is an integer value that specifies the signal. As with the signal system call, this argument can be any valid signal except SIGKILL or SIGSTOP. The second and third arguments are references to a sigaction structure. Respectively, these structures store the new and previous action for the signal. The full definition of the sigaction structure is found in the file sigaction.h. This file is automatically included by signal.h.
Basically, the sigaction structure is
    struct sigaction {
      void (*sa_handler)(int);                         // signal-catching function
      void (*sa_sigaction)(int, siginfo_t *, void *);  // three-argument alternative
      sigset_t sa_mask;                                // signals blocked while the handler runs
      int sa_flags;                                    // flags from Table 4.22
      void (*sa_restorer)(void);                       // obsolete
    };
Both sa_handler and sa_sigaction can be used to reference a signal-handling function. Only one of these should be specified at any given time, as on most systems this data is stored in a union within the sigaction structure. By definition, a union can hold only one of its members at a time. Our discussion centers on using the sa_handler member. The sa_mask member specifies the signals that should be blocked when the signal handler is executing. Each signal is represented by a bit. If the bit in the mask is on, the signal is blocked. By default the signal that triggered the handler is blocked. The sa_flags member is used to set flags that modify the behavior of the signal-handling process. Flag constants, shown in Table 4.22, can be combined using a bitwise OR.
Table 4.22. sa_flags Constants.
Flag | Action |
---|---|
SA_NOCLDSTOP | If the signal is SIGCHLD, then the calling process will not receive a SIGCHLD signal when its child processes stop. |
SA_ONESHOT or SA_RESETHAND | Restore the default action after the signal handler has been called once (similar to the default of the signal call). |
SA_RESTART | Use BSD signal semantics (certain interrupted system calls are restarted after the signal has been caught). |
SA_NOMASK or SA_NODEFER | Undo the default whereby the signal triggering the handler is automatically blocked. |
SA_SIGINFO | The signal handler has three arguments; use sa_sigaction rather than sa_handler. |
The remaining structure member, sa_restorer, is obsolete and should not be used.
Unlike signal, a signal-catching routine installed with sigaction remains installed even after it has been invoked. Program 4.6, which is similar to Program 4.5, shows the use of the sigaction system call.
Again, notice that in the program function signal_catcher, it is no longer necessary to reset the association for the signal caught to the signal-catching routine.
Program 4.6. Using the sigaction system call.
File : p4.6.cxx
    /* Catching a signal using sigaction
    */
    #define _GNU_SOURCE
    #include <iostream>
    #include <cstdlib>
    #include <cstdio>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;
    int
    main( ) {
      void signal_catcher(int);
      struct sigaction new_action;
      new_action.sa_handler = signal_catcher;
      new_action.sa_flags = 0;
      sigemptyset(&new_action.sa_mask);      // block no additional signals
      if (sigaction(SIGINT, &new_action, NULL) == -1) {
        perror("SIGINT");
        return 1;
      }
      if (sigaction(SIGQUIT, &new_action, NULL) == -1) {
        perror("SIGQUIT");
        return 2;
      }
      for (int i=0; ; ++i) {                 // Forever ...
        cout << i << endl;                   // display a number
        sleep(1);
      }
      return 0;
    }
    void
    signal_catcher(int the_sig){
      cout << endl << "Signal " << the_sig << " received." << endl;
      if (the_sig == SIGQUIT)
        exit(3);
    }
Three other POSIX signal-related system calls that can be used for signal management are shown in Table 4.23.
Table 4.23. Summary of the sigprocmask, sigpending, and sigsuspend System Calls.
Include File(s) | <signal.h> | Manual Section | 2 |
Summary | int sigprocmask(int how, const sigset_t *set, sigset_t *oldset); int sigpending(sigset_t *set); int sigsuspend(const sigset_t *mask); |
Return | Success | Failure | Sets errno |
 | 0 | −1 | Yes |
Each function returns a 0 if it is successful; otherwise, it returns a −1 and sets the value in errno (Table 4.24).
Table 4.24. sigprocmask, sigpending, and sigsuspend Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
4 | EINTR | Interrupted system call | A signal was caught during the system call. |
14 | EFAULT | Bad address | |
The process's signal mask can be manipulated with the sigprocmask system call. The first argument, how, indicates how the list of signals (referenced by the second argument, set) should be treated. The action that sigprocmask will take, based on the value of how, is summarized in Table 4.25.
Table 4.25. Defined how Constants.
Constant | Action |
---|---|
SIG_BLOCK | Block the signals specified by the union of the current set of signals with those specified by the set argument. |
SIG_UNBLOCK | Unblock the signals specified by the set argument. |
SIG_SETMASK | Block just those signals specified by the set argument. |
If the third argument, oldset, is non-null, the previous value of the signal mask is stored in the location referenced by oldset.
The use of the sigprocmask system call is shown in Program 4.7.
Program 4.7. Using sigprocmask.
File : p4.7.cxx
    /* Demonstration of the sigprocmask call */
    #define _GNU_SOURCE
    #include <iostream>
    #include <cstdio>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;
    sigset_t new_signals;
    int
    main( ) {
      void signal_catcher(int);
      struct sigaction new_action;

      sigemptyset(&new_signals);                       // line 14
      sigaddset(&new_signals,SIGUSR1);                 // line 15
      sigemptyset(&new_action.sa_mask);                // block no additional signals
      sigprocmask(SIG_BLOCK, &new_signals, NULL);      // line 17
      new_action.sa_handler = signal_catcher;
      new_action.sa_flags = 0;
      if (sigaction(SIGUSR2, &new_action, NULL) == -1) {   // line 20
        perror("SIGUSR2");
        return 1;
      }
      cout << "Waiting for signal" << endl;
      pause( );
      cout << "Done" << endl;
      return 0;
    }
    void
    signal_catcher( int n ) {
      cout << "Received signal " << n << " will release SIGUSR1" << endl;
      sigprocmask(SIG_UNBLOCK, &new_signals, NULL);
      cout << "SIGUSR1 released!" << endl;
    }
The example makes use of the SIGUSR1 and SIGUSR2 signals. These are two user-defined signals whose default action is termination of the process. In lines 14 and 15 of the example are two signal-mask manipulation library functions (sigemptyset and sigaddset) that are used to clear and then add a signal to the new signal mask. A signal mask is essentially a string of bits; each set bit represents a signal. The signal-mask manipulation library functions are covered in detail in Chapter 11, “Threads.” In Program 4.7, the sigprocmask system call in line 17 holds (blocks) incoming SIGUSR1 signals. The sigaction system call (line 20) is used to associate the receipt of SIGUSR2 with the signal-catching routine. Following this, an informational message is displayed, and a call to pause is made. In the program function signal_catcher, the sigprocmask system call is used to release the pending SIGUSR1 signal. Notice that a cout statement was placed before and after the sigprocmask call. A sample of this program run locally is shown in Figure 4.11.
When run, the program is placed in the background so the user can continue to issue commands from the keyboard. The system displays the job number for the process and the PID. The program begins by displaying the Waiting for signal message. The user, via the kill command, sends the process a SIGUSR1 signal. This signal, while received by the process, is not acted upon, as the process has been directed to block this signal. When the SIGUSR2 signal is sent to the process, the process catches the signal, and the program function signal_catcher is called. The initial cout statement in the signal-catching routine is executed, and its message about receiving signals is displayed. The following sigprocmask call then unblocks the pending SIGUSR1 signal that was issued earlier. As the default action for SIGUSR1 is termination, the process terminates and the system produces the trailing information indicating the process was terminated via user signal 1. As the process terminates abnormally, the second cout statement in the signal-catching routine and the cout in the main of the program are not executed.
Figure 4.11. Output of Program 4.7.
The sigsuspend system call is used to pause (suspend) a process. It replaces the current signal mask with the one passed as an argument. The process suspends until a signal is delivered whose action is to execute a signal-catching function or terminate the process. Program 4.8 demonstrates the use of the sigsuspend system call.
Program 4.8. Using sigsuspend.
File : p4.8.cxx
    /* Pausing with sigsuspend */
    #define _GNU_SOURCE
    #include <iostream>
    #include <cstdio>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;
    int
    main( ){
      void signal_catcher(int);
      struct sigaction new_action;
      sigset_t no_sigs, blocked_sigs, all_sigs;

      sigfillset ( &all_sigs );              // line 14: turn all bits on
      sigemptyset( &no_sigs );               // line 15: turn all bits off
      sigemptyset( &blocked_sigs );          // line 16
                                             // Associate with catcher
      new_action.sa_handler = signal_catcher;
      new_action.sa_mask = all_sigs;
      new_action.sa_flags = 0;
      if (sigaction(SIGUSR1, &new_action, NULL) == -1) {   // line 21
        perror("SIGUSR1");
        return 1;
      }
      sigaddset( &blocked_sigs, SIGUSR1 );                 // line 25
      sigprocmask( SIG_SETMASK, &blocked_sigs, NULL);      // line 26
      while ( 1 ) {
        cout << "Waiting for SIGUSR1 signal" << endl;
        sigsuspend( &no_sigs );              // Wait
      }
      cout << "Done." << endl;
      return 0;
    }
    void
    signal_catcher(int n){
      cout << "Beginning important stuff" << endl;
      sleep(10);                             // Simulate work ....
      cout << "Ending important stuff" << endl;
    }
In main, the signal-catching function is established. Lines 14 to 16 create three signal masks. The sigfillset call turns all bits on, while the sigemptyset calls turn all bits off. The filled set (all bits on, denoting all signals) becomes the signal mask for the signal-catching routine (line 19). Thus specified, this directs the system to block all signals that can be blocked while the signal-catching routine executes. In line 21 the receipt of signal SIGUSR1 is associated with the signal-catching function signal_catcher. In lines 25 and 26 the process is directed to block any SIGUSR1 signals. While at first glance this might seem superfluous, as receipt of this signal has been mapped to signal_catcher, it allows a SIGUSR1 signal that arrives while the process is outside the sigsuspend call to be held as pending rather than acted upon at once (standard signals are not queued, so at most one such signal is held). Then, in an endless loop, the program pauses when the sigsuspend statement is reached, waiting for the receipt of the SIGUSR1 signal. Once the SIGUSR1 signal is received (caught), the signal-catching function is executed. While in the signal-catching function, all signals that can be blocked are held. A set of messages indicating the beginning and end of an important section of code is displayed. When the signal-catching routine is exited, any blocked signals are released. In summary, the program defers the execution of an interrupt-protected section of code until it receives a SIGUSR1 signal. A run of the program produces the output shown in Figure 4.12.
Example 4.12. Output of Program 4.8.
linux$ p4.8 &
Waiting for SIGUSR1 signal
[1] 6277
linux$ kill -USR1 %1
Beginning important stuff
linux$ kill -INT %1
linux$ jobs
[1]  + Running                       p4.8
linux$ Ending important stuff
[1]    Interrupt                     p4.8
The process was first sent a SIGUSR1 signal that caused it to begin the program function signal_catcher. While it was in the signal_catcher function, an interrupt signal was sent to the process. This signal did not cause the process to immediately terminate, as the signal mask established for signal_catcher directed that all signals be blocked (held) while it executed. The jobs command confirms that the process is still active after the interrupt signal was sent. However, once the blocked signals are released (when the signal-catching routine is exited), the pending SIGINT signal is acted upon and the process terminates.
As we have seen, lock files, the locking of files, and signals can be used as a basic means of communication between processes. Lock files require the participating processes to agree upon file names and locations. The creation of a lock file carries with it a certain amount of system overhead characteristic of all file manipulations. In addition, the problems associated with the removal of “leftover” invalid lock files and the implementation of nonsystem-intensive polling techniques must be addressed. On the positive side, lock file techniques can be used in any UNIX environment that supports the creat system call, and cooperating processes do not need to be related.
UNIX has predefined routines that can be used to lock a file. We can use the presence of a lock on a file to indicate that a resource is unavailable. Advisory locking is less system-intensive than mandatory locking and is thus more common. As with lock files, the participating processes using advisory locking must cooperate to effectively communicate.
Signals provide us with another basic communication technique. While signals do not carry any information content, they can, as we have seen, be used to communicate from one process to another. From a system implementation standpoint, signals are more efficient than using lock files. However, participating processes must have access to each other's PIDs (in most cases the processes will be parent/child pairs). In most environments, the number of user-designated signals is limited. Cooperating processes must agree upon the “meaning” of each signal. When a signal is sent from one process to another, unless the receiving process acknowledges the receipt of the signal, there is no way for the sending process to know if its initial signal was received. Signal manipulation can be tricky, and its implementation from one version of UNIX to another may vary (this is one of the last areas of UNIX to be standardized). All of these techniques are easy to understand and simple to implement, but they are often difficult to implement well. Moreover, each has limitations that remove it from serious consideration when reliable communication between processes is needed.
aborting a process
advisory locking
alarm system call
asynchronous
atomic
consumer process
core image
creat system call
fcntl system call
file locking
flock structure
ignoring a signal
interrupt
isatty library function
kill command
kill system call
link system call
lock file
lockf library call
mandatory locking
nohup command
pause library function
polling
producer process
race condition
raise library function
real-time signals
shlock command
sigaction structure
sigaction system call
signal blocking
signal catcher
signal delivery
signal generation
signal system call
signals
sigpending system call
sigprocmask system call
sigsuspend system call
sleep library function
stopping a process
unlink system call
[1] As the superuser has special privileges, the lock file implementation shown here would not work for the superuser.
[2] At one time the open system call did not support the O_CREAT (create) option.
[3] EACCES is a defined constant found in the <sys/errno.h> header file.
[4] If smaller intervals are needed, there is a usleep (microsecond sleep) library function that suspends execution of the calling process for a specified number of microseconds.
[5] ANSI C also defines a raise library function that can be used by a process to send itself a signal.