CHAPTER 6
Invocation and Execution

This chapter discusses the runtime of the shell. The shell must be started in some way; this is discussed in the next section, "Shell Invocation." Once the shell is running, it is important to understand what the shell actually does, the order in which substitutions occur, and the way the shell interacts with other processes. It may even be necessary on rare occasions to use additional tools to debug a script that behaves unexpectedly.

Shell Invocation

This section discusses the process of starting the shell (or a shell script) and telling it what to do; this is called invocation. The shell itself is a command, like any other, which takes command-line options and arguments; similarly, each shell script becomes a command that can be invoked, and most take options, arguments, or both.

The words following a command's name on the command line are generally called parameters or arguments. I use the term arguments to avoid confusion with shell parameters (which are sometimes called variables to avoid the same confusion). Many commands take special arguments, called options, which change the way the program behaves. The UNIX convention is that options are generally introduced with a hyphen and typically have single-letter names. Multiple options can be combined; foo -ab is usually the same as foo -a -b. Some options may take an additional word as an argument, such as the -e option to grep, which takes a regular expression as an argument.

The meaning of arguments that are not options may vary. Many UNIX utilities treat all of their arguments the same, typically as a list of file names. However, there are many exceptions; it is also common for the first nonoption argument to a command to be special. In grep, for instance, if no -e option is provided, the first argument is a regular expression, and following arguments are file names.

How UNIX Runs Scripts

When a UNIX-like system tries to execute a file, the kernel checks to see what kind of file it is. If it is a regular file with execute permission, the kernel tries to execute it. The kernel starts by examining the file to see if it is a known type of executable by looking for a distinctive header; for instance, on many modern systems, the kernel checks for an Executable and Linking Format (ELF) header denoting an executable in the ELF format.

On essentially every "real" UNIX-like system (all UNIX systems and all UNIX clones), there is a common standard executable script format—a file starting with the characters #! (called a shebang, short for sharp-bang). Strictly speaking, this behavior is not mandated by POSIX, and there are subtle variances between systems; in practice, it is universal as long as you are reasonably cautious. Such a file is taken to be a script file to be run by an interpreter. The rest of the first line of the file indicates what program the script is to be used with. For instance, the common shell header #!/bin/sh indicates that the script is to be used with /bin/sh. To execute the file, the kernel executes the command /bin/sh with the name of the script as its first argument. Spaces after the ! are permitted but ignored; you can ignore the occasional rumors that it is nonportable to omit the space.

The command name in a shebang line is nearly always an absolute path. The kernel does not search $PATH for a binary; it just tries to find a file of the given name. So a script starting out #!sh is treated as a script for the sh program in the current directory. The command cannot itself be another #! script.

Traditional shells treat a file marked as executable, but lacking a header, as a shell script. This behavior is required by POSIX, but you should never rely on it. In particular, it is harder to tell which shell will be used to run a script invoked in this way, especially if it is being invoked by a shell other than the standard shell. Some shell documentation describes a script run this way as a "subshell," but the shell context (functions, aliases, shell variables, and so on) is cleared out as though it were a new shell.

Interestingly, the POSIX definition of the shell explicitly does not specify what happens if a file starts with #!; this is because a hypothetical non-UNIX system could comply with POSIX but treat all scripts as shell scripts. In fact, many UNIX systems simply treat all executable files (which are not recognized by the kernel) as shell scripts, even if they are not text files! A typical result from trying to execute an executable from another machine is a cryptic error message:

$ ./somega
./somega: 1: Syntax error: "(" unexpected

This file was an old executable compiled on an older machine and copied around with the rest of my files. Because the executable was not compatible with the hardware I tried it on, the kernel failed to execute it. Since it was marked executable, the shell tried to execute it anyway. POSIX allows the shell to print a warning and fail to execute non-text files, but many shells don't bother.


Warning The shell does not necessarily check whether an executable file is actually a shell script!


There are two places where other arguments may be passed to the new script. If there is a space after the command's name, anything else on the line is also passed to the command, before the script's name; for example, #!/bin/sh -x runs a script in trace mode. The remainder of the line may or may not be split into multiple arguments; most systems pass it as a single argument, even if it contains spaces—but do not rely on this. If a script's header is #!/bin/sh -x -y, the options are usually passed as a single argument containing a space, not two separate arguments.

Secondly, if the script command originally had arguments, those arguments are passed after the script file's name. If a script starting with #!/bin/sh -x is invoked as ./script hello, the original arguments passed to the shell are /bin/sh, -x, ./script, and hello. The shell interprets -x as an option. It then uses ./script as a script file (setting $0 to ./script, and reading commands from that file instead of standard input) and passes hello as $1. This behavior is precisely the same as you would get by explicitly invoking the shell on the file.
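
To make this concrete, here is a minimal two-line script (the file name ./script and the message are placeholders):

#!/bin/sh -x
echo "argument count: $#, first argument: $1"

Invoked as ./script hello, the result looks something like this (the exact formatting of the trace line varies between shells):

+ echo argument count: 1, first argument: hello
argument count: 1, first argument: hello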


Shell Options

Options passed to the shell control various implementation choices or settings, some of which are visible within a script as flags. Some command-line options set flags that can be changed later using the set command. You can see the current status of shell flags in the special shell parameter $-, which represents them as a string:

$ echo $-
ilms

This means that the shell has the -i, -l, -m, and -s flags set. These options may not apply to all shells, and not all shell options are portable. If you want to check for a given option, check to see whether its letter is present. For instance, a script can determine whether or not it is in trace mode:

case $- in
*x*) ;;
*)   echo "+ $cmd" >&2;;
esac

This rather quirky bit of code displays $cmd on standard error if the trace flag is not set. The trace flag displays simple commands before executing them, but it does not display shell control constructs, such as case statements; if the flag is set, the case statement selects the empty branch, so no simple commands are executed and this fragment produces no trace output of its own.

The most common flag to check for is the -i flag, which is set in an interactive shell session (discussed in more detail in the section "Shell Startup and Interactive Sessions").
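
For example, a startup file or function library might use this check to decide whether to produce chatty output; a minimal sketch:

case $- in
*i*) echo "This shell is interactive" ;;
*)   echo "This shell is not interactive" ;;
esac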

Additional settings may be available using the special -o option; for instance, in ksh or bash, set -o vi enables vi-style command-line editing. These settings are generally not portable between shells. Furthermore, some shells may abort if asked to set an unknown option. Be aware that these options exist, but avoid using them in portable scripts.

Using Positional Parameters

Any additional words after the last shell option are arguments to the shell. If no commands are provided using the -c option, the shell treats its first argument as the name of a script to run, and following arguments as arguments to that script. Otherwise, all arguments are passed on to the script.

The arguments passed to the shell are stored in special shell parameters named $1, $2, and so on. These are called the positional parameters. The name of the shell itself is stored in $0 for an interactive session, but when the shell is running a script, $0 holds the name of the script. Although the shell in the previous example actually received four arguments (the first being the path of the shell executable), it sets $0 to ./script and $1 to hello. The name of the shell, and the command-line options to the shell, are consumed by the shell and not exposed to the script program. The number of positional parameters is stored in the special parameter $#. For historical reasons, the shell's parser treats $10 as the value of $1 with the string 0 appended to it. To use parameters past $9, use ${N} in a modern shell. Older shells, including the SVR4 shell, will not accept larger values under any circumstances; in these, you must extract earlier values and use shift to move other parameters into the first nine slots.
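
As a brief sketch of both approaches (the variable names are arbitrary, and at least ten arguments are assumed to have been passed):

# In shells that support it, braces select parameters past the ninth directly:
tenth=${10}

# In an older shell, save any early values you still need, then shift:
first=$1
shift 9
tenth=$1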

Although some shells offer extensions providing for array variables, the positional parameters are the only array conveniently available to a portable shell script. Because of this, they are used for much more than just argument processing. One common idiom is to extract all options and arguments from the positional parameters at startup to free them up for later use in argument parsing. (Trickery such as using many similarly named variables to substitute for arrays, while portable, is awkward and not always efficient.)

The set Command

Unlike variables, the positional parameters cannot be directly set using variable assignment; 1=2 is just an unknown command to the shell, not an assignment into $1. The set command can be used to set the positional parameters.

The set command takes a special option (--) to indicate that you are setting something other than shell options; any following arguments are assigned to the positional parameters, with the first argument going into $1. The general syntax for this usage is set -- values. Although set is a special shell builtin, the arguments are processed normally; parameter and command substitution, globbing, and field splitting all apply.


In some scripts, this is used as a simple way to get access to the results of variable expansion and word splitting applied to one or more variables, or to add values to the positional parameters before executing something. For instance, if you want to insert a value in front of the existing arguments, you can use $@ and the set command:

set -- new "$@"

Another common idiom is to use $IFS and the set command to split a value around something other than whitespace. For instance, a classic UNIX password file entry uses colons as separators. You can read it in the shell using the following idiom:

save_ifs=$IFS
IFS=:
set -- $passwd
IFS=$save_ifs
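
Once the set command has run, the fields of the entry are available as positional parameters. As a small sketch (assuming $passwd held a traditional seven-field entry), the login name, home directory, and shell can then be picked out by position:

login=$1
home=$6
shell=$7
echo "$login has home directory $home and login shell $shell"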

The set command is not particularly complicated in and of itself, but using it effectively can be complicated. Setting all of the arguments at once can be awkward when you want to build or modify argument lists. You can also append additional arguments:

set -- "$@" "$new"

This appends $new to the argument list at the end.

Removing Positional Parameters

It is sometimes desirable to remove parameters from the shell's parameter list. This is done using the shift command, which removes positional parameters. You can use shift with or without an argument. With an argument (shift N), it removes the first N positional parameters, renumbering the later parameters to the front of the list. Without an argument, it removes the first parameter. The standard for loop that iterates through the positional parameters is nearly equivalent to the following while loop:

while test $# -gt 0; do
  echo "$1"
  shift
done

The equivalent for loop is as follows:

for i
do
  echo "$i"
done

In fact, there is a significant difference between these loops. After the for loop completes, the positional parameters are unchanged, but after the while loop completes, there are no positional parameters remaining. This can be useful. A common idiom for parsing command-line options is to consume options, leaving arguments for further processing:

opt_a=false
opt_b=false
opt_c=""
while test $# -gt 0; do
  case $1 in
  -a)  opt_a=true ;;
  -b)  opt_b=true ;;
  -c)  opt_c="$2"; shift ;;
  --)  shift; break ;;
  *)   break ;;
  esac
  shift
done
for arg
do
  # process non-option argument $arg
done

The first loop consumes any arguments that look like known options. The special option -- indicates the end of options, allowing the user to specify an argument that happens to start with a hyphen. This provides robustness in the face of arguments that might otherwise look like options. This is one of the ways to deal with problems, such as needing to remove a file named -rf.
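
For instance, with any command that follows this convention (and modern versions of rm follow the standard utility syntax guidelines), you can remove a file actually named -rf by marking the end of the options first:

rm -- -rf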

Manipulating Parameters for Fun and Profit

Individually, the tools the shell provides for argument manipulation may seem a little weak. There is no way to assign a single parameter or to insert a parameter later in the list. There are a number of shell idioms for argument list manipulation, but many of them are unreliable when confronted with arguments containing spaces. Consider the following simple loop, intended to extract options and separate them out from file arguments:

files=""
opts=""
for arg
do
  case $arg in
  -*) opts="$opts $arg";;
  *) files="$files $arg";;
  esac
done
set -- $opts -- $files

This works pretty well, as long as none of the files, or options, contain spaces. (If you want this functionality, without those bugs, you should probably use getopt or getopts, discussed in the section "Handling Options and Arguments"; I picked the example because it is tricky to get it right and interesting to think about.) There are several ways to attempt to resolve this difficulty.

If you can think of a character that you are confident cannot occur in any of your options, this is actually easy to do. Unfortunately, techniques like this are pretty limited; they rely on coincidence in many cases. For instance, very few file names contain colons; so you might use colons to separate a list of files, but then a file with a colon in its name can wreck your whole day. Here is an example of how you could use a colon to separate words:

files=""
opts=""
for arg
do
  case $arg in
    -*) opts=${opts:+$opts:}$arg ;;
    *) files=${files:+$files:}$arg ;;
  esac
done
save_IFS=$IFS
IFS=:
set -- $opts -- $files
IFS=$save_IFS

There are three major changes here. The first is the use of a different character (in this case, a colon) to separate words within the $opts and $files variables. The second is the use of a corresponding value of $IFS to split the variables again. The third, closely related to the second, is a more complicated inner assignment. Without this, the shell generates a spurious empty argument at the beginning of each list. For example, if the arguments were foo bar, $files would end up set to :foo:bar. Note the subtle difference between this behavior and what happens when $IFS is unset (or has its default value); normally, a variable with a leading space does not expand into an extra field.

You can use other values for $IFS. Some scripts use control characters for this, precisely because they are very unusual in file names. However, there may be quirks; for instance, at least one version of bash can't handle $IFS being set to control-A.

You can also use simulated arrays using eval (as explained in Chapter 5) to store arguments without worrying about separators:

filec=0
optc=0
for arg
do
  case $arg in
    -*) eval "opt_$optc=\$arg"
        optc=`expr $optc + 1`
        ;;
    *) eval "file_$filec=\$arg"
        filec=`expr $filec + 1`
        ;;
  esac
done
shift $#
while test $filec -gt 0; do
  filec=`expr $filec - 1`
  eval 'set -- "$file_'$filec'" "$@"'
done
set -- "--" "$@"
while test $optc -gt 0; do
  optc=`expr $optc - 1`
  eval 'set -- "$opt_'$optc'" "$@"'
done

The array code here is similar to what was done in Chapter 5. The script extracts the arguments, then clears the argument list and repopulates it using while loops.

Each while loop goes through pushing arguments to the front of the list. Single quotes are used to reduce escape characters. For the first file argument, the eval command string ends up as follows:

set -- "$file_0" "$@"

No matter what values the variables contain, this works—they are substituted in as plain words, not keywords or shell syntax. The "$@" expansion preserves the existing arguments as separate arguments, regardless of their contents. In fact, the same basic techniques allow you to do arbitrarily complicated things, such as replacing a specific parameter while leaving the rest alone.
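
For instance, here is one way to replace the second positional parameter while leaving the others alone. This is only a sketch of the technique; the replacement value and the param_N variable names are arbitrary:

# store each parameter in a numbered variable, replacing the second one
i=0
for arg
do
  i=`expr $i + 1`
  if test $i -eq 2; then
    arg="replacement value"
  fi
  eval "param_$i=\$arg"
done
# clear the positional parameters and rebuild them from the stored copies
count=$i
shift $#
while test $count -gt 0; do
  eval 'set -- "$param_'$count'" "$@"'
  count=`expr $count - 1`
done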

The most obvious limitation is that it does not work if you try to bundle it into a shell function. As shell functions have their own local set of positional parameters, modifications to the positional parameters within a function have no effect on the calling script.

Handling Options and Arguments

Although it is certainly possible to manually process arguments, as in the previous example, the task is common enough to have been solved repeatedly. Unfortunately, the solutions are not entirely portable. The first is the getopt command, which parses a command line and produces a new command line conveniently ordered. The syntax is getopt string parameters, and the output of the command is the parameters reordered, with options separated out and identified, according to the list of options in string. (In fact, the previous loop does most of the work of implementing getopt.) The options string lists the letters of accepted options; options that take an argument are followed by a colon.

Because the getopt command is not a shell builtin, and does everything by producing output, you can experiment with it at the command line to see how it works:

$ getopt a hello, world
 -- hello, world
$ getopt a -a hello, world
 -a -- hello, world
$ getopt a -b hello, world
getopt: illegal option -- b
 -- hello, world
$ getopt ab -ab hello, world
 -a -b -- hello, world
$ getopt ab: -ab hello, world
 -a -b hello, -- world
$ getopt ab: -ba hello, world
 -b a -- hello, world

The output of the getopt utility is options -- non-options. As each parameter beginning with a hyphen is evaluated, it is converted into a series of options. If an option that takes an argument is encountered, its argument is either the rest of the word (if there is any left) or the next word, whatever that may be. Options in clusters are separated out; -ab becomes -a -b. As with many utilities, getopt treats -- as ending options and beginning the nonoption parameters. The output of the getopt utility is intended to be used to replace the positional parameters; the canonical usage is combined with the set command:

set -- `getopt options "$@"`

This usage is portable on recent systems. You can then iterate over the positional parameters, extracting options, without having to worry about exactly what characters are part of which options. Doing this by hand is exceedingly difficult in shell and not really worth the trouble. However, the getopt utility does have one crucial limitation—it cannot gracefully handle parameters containing whitespace.

Modern shells generally provide a getopts built-in command, which is able to set shell variables, and thus provide more reliable handling of parameters. As the phrase "modern shells" suggests, this is not completely portable yet. Surprisingly, the shell in older versions of Cygwin was compiled so that it included the code for getopts, but it did not actually recognize the command. This has been fixed in modern releases.

The getopts command is used more like the read command, returning true or false depending on whether or not there is a next option, and returning one option at a time. The syntax of the command is getopts string variable parameters; if parameters are omitted, getopts uses the positional parameters. Each time getopts is invoked, it looks for another option and stores the option character in $variable. If there are no more options, getopts returns false. If there is an error, getopts returns true and sets $variable to ?. A typical usage of getopts looks like this:

while getopts ab: o; do
  case $o in
  a)  echo "received flag a";;
  b)  echo "received option b: $OPTARG";;
  esac
done
shift `expr $OPTIND - 1`

The special shell variable $OPTARG holds the argument provided for an option that requires an argument. The special shell variable $OPTIND holds the number of the first nonoption positional parameter. For example, if there are no options, $OPTIND has the value 1 after getopts has run (and returned false). Because the positional parameters number from one, executing shift $OPTIND would also remove the first nonoption parameter from the list; this is why the example shifts by $OPTIND - 1. Like getopt, getopts recognizes -- as the end of options and uses the remainder of a word as an argument if an option expects an argument.

Because getopts can handle arbitrary arguments reliably, I prefer it. While traditional shells did not provide the getopts builtin, modern shells, including the SVR4 shell, do.

Older Shells: Now What?

While nearly all modern shells support getopts (and you could write it as a function fairly portably), it may occasionally become necessary to work with a very old shell that lacks this feature. The following boilerplate code handles a broad variety of arguments fairly well. (Many of the names are placeholders used to illustrate how to handle common tasks in shell code.)

opt_boolean=false
opt_accumulator=0
opt_argument=''
# opt_list is deliberately left unset so that ${opt_list+item} will work

# sed scripts:
my_sed_single_opt='1s/^\(..\).*$/\1/;q'
my_sed_single_rest='1s/^..\(.*\)$/\1/;q'
my_sed_long_opt='1s/^\(--[^=]*\)=.*/\1/;q'
my_sed_long_arg='1s/^--[^=]*=//'

while test $# -gt 0; do
  opt=$1
  shift
  case $opt in
    # standard usage patterns:
    -a|--accumulator)   opt_accumulator=`expr 1 + $opt_accumulator` ;;
    -A|--argument)      opt_argument=$1
                        shift
                        ;;
    -b|--boolean)       opt_boolean=:
                        ;;
    --composite)        set dummy --boolean --list element ${1+"$@"}
                        shift
                        ;;
    --list)             opt_list=${opt_list+$opt_list:}$1
                        shift
                        ;;

    # Add your own long and short option branches here, and then
    # change the branch match expressions below to match the
    # appropriate options for splitting and reparsing...

    # Separate optargs to long options:
    --argument=*|--list=*)
                      arg=`echo "$opt" | $SED "$my_sed_long_arg"`
                      opt=`echo "$opt" | $SED "$my_sed_long_opt"`
                      set dummy "$opt" "$arg" ${1+"$@"}
                      shift
                      ;;

    # Separate optargs to short options:
    -a*|-p*|-q*|-r*)
                      arg=`echo "$opt" |$SED "$my_sed_single_rest"`
                      opt=`echo "$opt" |$SED "$my_sed_single_opt"`
                      set dummy "$opt" "$arg" ${1+"$@"}
                      shift
                      ;;

    # Separate non-argument short options:
    -b*|-x*|-y*|-z*)
                      rest=`echo "$opt" |$SED "$my_sed_single_rest"`
                      opt=`echo "$opt" |$SED "$my_sed_single_opt"`
                      set dummy "$opt" "-$rest" ${1+"$@"}
                      shift
                      ;;

    -?|-h)           func_usage                                     ;;
    --help)           func_help                                      ;;
    --version)        func_version                                   ;;
    --)               break                                          ;;
    -*)               func_fatal_help "unrecognized option \`$opt'"  ;;
    *)                set dummy "$opt" ${1+"$@"}; shift; break       ;;
  esac
done

While this may seem like a lot of work to avoid getopts, it is worth noting that this supports a number of helpful idioms, such as long argument names. The functions used for the last few options are left as an exercise for the reader; their behavior should be obvious from the context. Of particular interest is the code used to separate out multiple options given as a single argument. If you call this code with -bx as an option, the first pass through the loop replaces this with -b -x. You would have to define the -x) case for this to be processed correctly, though. As long as the -x case occurs before the -x* case, the first one matches and the shell processes the argument appropriately.

For extra credit, modify the preceding example to detect and warn the user if no argument is provided for an option requiring one.

Shell Startup and Interactive Sessions

There are several different kinds of shell sessions. If the shell is expecting to read commands and respond with prompts, that is called an interactive session. When the shell reads commands from a file, it generally is not an interactive session. A shell taking input from a pipe is also not an interactive session; the distinction is whether the input device is considered to be a tty (a terminal device; the name is short for "teletype"). Some shell sessions are further considered to be login sessions; a login session is normally interactive.

During startup, the shell may read (and execute) one or more startup scripts. The exact rules for this are, sadly, nonportable between shells. If your home directory contains a file named .profile, an interactive login shell will probably execute it during startup. Unfortunately, this is merely probable, not certain; as an example, bash looks for files named .bash_profile or .bash_login first, and it does not execute .profile if it finds one of the others. The intended benefit, of course, is that you can have a startup specific to bash that need not be portable to other shell variants. However, if you have a standard .profile you bring from one machine to another, it can be surprising trying to debug why it isn't being used.

Shells other than login shells may also run startup scripts. This is even less predictable and may be subject to strange rules. For example, many POSIX shells will execute the file named by the environment variable $ENV at startup. Pre-POSIX shells do not, and bash executes $ENV only if it is being run in its POSIX mode or was invoked under the name sh; otherwise, it uses $BASH_ENV instead. Contrary to its behavior with .profile, bash does not execute $ENV just because $BASH_ENV is not set. In short, you cannot rely on startup behavior in a portable script: you cannot count on such files being run at startup, but you also cannot count on them not being run.

This brings us to one of the few genuinely intractable problems of portable shell scripting: A hostile user can misconfigure the shell so that your script will not work, by creating a startup file that prevents successful execution, most commonly by creating aliases for common commands (the alias command is described in Chapter 7). You can override this somewhat by specifying full paths or quoted names for most commands, but it is very difficult to get right.

There is not very much you can do about the possibility that someone, somewhere, will end up feeding your script to a shell that is configured to alias various common commands on startup. However, you can avoid doing this to your own scripts. In any file that affects shell startup, be sure to execute aliases and similar code only when you are in an interactive shell. The safest idiom to use for this is as follows:

case $- in
*i*) alias yes=no
     echo "Do you want me to hit you?"
     ;;
*)   ;;
esac

This causes the shell to execute these initialization commands only when the shell is interactive. I have seen a different idiom for this:

case $- in
*i*) ;;
*)   return 0;;
esac

This is not safe. While there are shells in which the return command (used, in some shells, to exit from a function) can also end the execution of a file being executed by the shell using ., there are shells in which a return command outside of a shell function exits the entire shell. As it is not unheard of for a startup script to end up getting picked up by a different shell, this can cause a perfectly ordinary shell script to unexpectedly terminate without any diagnosis of errors.

When looking at startup scripts, there are three common cases. A login shell typically needs to perform additional setup to populate the environment; on many systems, this would also be the place to configure things like terminal types or start an ssh-agent process. After this has been done, other shells can simply inherit this environment. Among non-login shells, there is still a noticeable difference between interactive and noninteractive sessions. If you are working with a shell that can execute a startup script in a noninteractive session, be sure your startup scripts don't do anything time-consuming or interactive in a noninteractive session.

Execution

It is possible to program fairly effectively in shell without needing to know the exact details of how certain things are done. The shell reads and executes code. However, there is some possibility for confusion. When does the shell parse? What order do various substitutions occur in? Where is this error message coming from?

This section gives a more detailed view of the runtime behavior of the shell and introduces some of the debugging tools that may come up when the shell behaves unexpectedly.

More on Jobs and Tasks

Job control features, which allow a shell to control or manipulate multiple tasks, are mostly used on the command line, but there are cases in which a script can take advantage of the shell's ability to manage several processes at once to simplify its design. Some shells offer extensions (such as ksh's co-process feature) that make additional use of background tasks. For portable scripting, the primary thing you can do with background tasks is continue doing some other work while a long task processes. For instance, you could have a script that plays a game with the user while waiting for an archive to unpack—although most users would probably rather you didn't.

Signals and Interprocess Communication

It is often necessary to communicate between processes. UNIX provides several mechanisms for interprocess communication (IPC), of which three are available to the shell. Two of them have already been introduced: exit status and pipes. The exit status of a process is only sort of an IPC mechanism, but it allows for a child process to communicate to its parent whether or not it has succeeded. Pipes are an exceedingly flexible IPC mechanism, but the shell pipe syntax only allows one-way communication between a pair of programs.

The other IPC mechanism available to the shell is signaling. Signals are unusual in that the recipient of a signal may not have any opportunity to interact with it. Signals can simply terminate the receiving process. However, most signals may be intercepted by a program, which can define a piece of code to execute when it receives the signal. This piece of code is called a signal handler. The shell allows the user to define handlers for several of the common signals.

Signals are referred to by their names or by their numbers; the mapping of names to numbers is consistent for the most common signals, although the numbers for the job control and user-defined signals vary somewhat between systems. The signals most likely to be used in shell programming are outlined in Table 6-1.

Table 6-1. Signals by Number

Number  Name  Trap  Description
0       EXIT  Yes   Shell is exiting.
1       HUP   Yes   Session ended.
2       INT   Yes   Interrupt.
9       KILL  No    Kill.
13      PIPE  Yes   I/O error on pipe.
14      ALRM  No    Timer expired.
15      TERM  Yes   Default termination signal.
17      STOP  No    Process stopped.
18      TSTP  No    Process stop request from terminal.
19      CONT  No    Continue stopped process.
21      TTIN  No    Stopped waiting for input.
22      TTOU  No    Stopped waiting for output.
30      USR1  No    User-defined signal #1.
31      USR2  No    User-defined signal #2.

The default effect of a signal varies. For HUP, INT, TERM, ALRM, and KILL, the default behavior is for the process to terminate. If a process is killed by a signal, its exit status is generally reported as 128 plus the signal number. For instance, a program interrupted by an INT signal has an exit status of 130. The USR1 and USR2 signals are usually ignored. They exist to allow programs to define specific behaviors in response to those signals without changing handling of any of the standard signals that normally have an effect.
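
You can see this at the command line; here a child shell sends itself an INT signal, and the calling shell reports the resulting status (most shells report 130, that is, 128 plus 2, though the exact value is not guaranteed everywhere):

$ sh -c 'kill -2 $$'
$ echo $?
130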

The STOP and TSTP signals, as well as TTIN and TTOU, cause a process to cease execution but not to exit; execution resumes on a CONT signal.

Some signals are generated automatically by the UNIX kernel. Any signal can also be generated artificially. You can send any signal to any program (running with the same user ID) using the kill command. The default signal (sent if no signal is specified) is TERM. Other signals can be specified using their name or number with a leading hyphen. For example, kill -9 pid sends a KILL signal to the process with process ID pid, as does kill -KILL pid. Numbers are more portable.

Signals can be caught by a shell program using the trap built-in command, although only some signals may be trapped portably. This command specifies an action to be taken in response to a signal. The syntax for the command is trap action signals. If action is omitted or an empty string, the shell ignores the given signal or signals. If action is a hyphen (-), the shell resets the signal to its default behavior. Otherwise, action is executed as though passed as an argument to eval when the signal is received; this replaces the usual behavior for the signal. Multiple signals may be specified in a single trap command, and signals may be specified by number (portably) or name (on modern systems). However, only one action may be specified; if you want to run multiple commands, you must quote them (and separate them with semicolons or new lines) or use a shell function.

Do not assume that $? is passed into a trap handler correctly; some shells do not do this. In general, avoid starting a trap handler with a shell function call.

When a signal is generated by the kernel, it may be sent to the shell and its child processes rather than only to the shell. For instance, if you hit Ctrl-C while running a script, the shell process and its associated children all receive the INT signal. The trap command only affects the signal received by the shell itself; child processes can still receive, and be affected by, signals.

The shell defines a special signal, signal number 0 (named EXIT), that is handled when the shell exits. For instance, the following shell script greets the user:

NAME=world
trap "echo Hello, $NAME!" 0



Hello, world!

The action specified in the trap command executes automatically at the end of the script. The handler for signal 0 is frequently used for cleanup of temporary files created during the execution of a script. Note, though, that the exit handler is not invoked if the shell is terminated by another signal. The special value 0 (but not the symbolic name EXIT) may be used as a signal for the kill command, too. In this case, kill sends no signal but yields a return code indicating whether or not the process exists. A successful return indicates that the process exists, and a failed return indicates that it does not. No signal is delivered by kill -0, so a handler for signal 0 does not execute except when the script exits.
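
A typical cleanup handler looks something like the following sketch; the temporary file name is just an example, and the second trap makes sure the file is also removed if the script is interrupted or terminated:

tmpfile=/tmp/example.$$
trap 'rm -f "$tmpfile"' 0
trap 'rm -f "$tmpfile"; exit 1' 1 2 15
echo "scratch data" > "$tmpfile"
# ... work with $tmpfile; it is removed on normal exit or on HUP, INT, or TERM ...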

Run with no arguments, trap prints a list of the current signal handlers, quoted such that evaluating this output restores the signal handlers:

$ trap 'echo "you cannot defeat me so easily!"' TERM
$ trap
trap -- 'echo "you cannot defeat me so easily!"' TERM

It is not portable to attempt to save only a single signal's output from this list by scanning the list for a particular value, as the existing handler might be more than one line of code. In this case, the shell command to recreate it would also be more than one line of code, and a simple check of matching lines would fail. However, if you have full control over a script, you can resolve this by ensuring that all signal handlers are a single line of code, allowing you to save individual values. The obvious solution is to pipe the output of trap into a while loop; this does not work because signal handlers are reset to their defaults within a subshell. To store trap values, store the output in a file, then read the file:

trap 'echo "you cannot defeat me so easily!"' TERM
trap 'echo "whoops, driving under a bridge."' HUP
trap > /tmp/trap.$$
while read sig
do
  set -- $sig
  eval "signum=${$#}"
  eval "sig_$signum=$sig"
done </tmp/trap.$$
rm -f /tmp/trap.$$
set | grep ^sig_



sig_HUP='trap -- '\''echo "whoops, driving under a bridge."'\'' HUP'
sig_TERM='trap -- '\''echo "you cannot defeat me so easily!"'\'' TERM'

The output of this script may vary between shells. In bash, the signals are spelled out as SIGHUP and SIGTERM, while ksh93 uses an extension to simplify the quoting of the strings. This means you cannot reliably expect one shell to correctly read or execute the output of a trap command run in another shell. However, all the shells are internally consistent; the output of the trap command in a given shell can be evaluated by that shell. Once you have saved the current signals, you can modify them or restore them individually. After running the preceding script, you could temporarily remove the HUP handler, then restore it:

trap - HUP
echo "Doing something long and boring. Will accept SIGHUP."
sleep 5
eval $sig_HUP

There are a few conventions about the use of signals. Interactive utilities generally abort upon receiving a HUP signal. Long-running daemons, though, often use the HUP signal as a cue to refresh their configuration, possibly rereading configuration files. Some use USR1 or USR2 for related tasks, such as refreshing or reopening log files.

Understanding Background Tasks

Background tasks and subshells have unique pids. When a task is launched in the background, the parent shell gets the child's pid in the special shell parameter $!. However, if the job is running in a subshell, it does not know its own pid; it gets the parent's pid in the $$ parameter. By contrast, a job run with sh -c gets its own pid in the $$ parameter.

Shell background tasks may be distinguished by their pids. Background tasks (along with interactive control of multiple jobs, called job control) are primarily used interactively. However, it is possible to make some use of background tasks in shells.

Background jobs are always run in subshells, so they do not affect the parent shell's context. A background job cannot change the calling shell's directory, set variables, or otherwise modify the caller except by sending signals. If you wrote a loop to read values from a file and ran it in the background, it would not set variables in the calling shell. Similarly, you cannot change a directory in the background:

cd /tmp &

This creates a subshell that changes its working directory to /tmp, then exits. The parent shell is unaffected.

So what do background jobs do? Background jobs are often used when you want to run a longer command while you continue working; for instance, at the command line, it is quite common to run a long compile process or file operation in the background. In a script, you might still want to run a long task in the background. To do so, you need to be able to determine whether the task is still running, wait for it to complete, or even abort it if you change your mind. All of this can be done.

Shell scripts that wish to use background tasks can keep track of them using their pids. Immediately after launching a background task, you can obtain its pid from the $! shell parameter. This can be used to send signals to the background task (using the kill command) or to wait for it later. If you have a large file-manipulation task to run, which may take several minutes and requires no user interaction, it might make sense to start it in the background, perform other tasks, then wait for it after those tasks are finished.

The wait command waits for background tasks to complete. Without arguments, it waits for all background tasks to complete and returns a successful exit status. If you specify the pid of a specific background task, it waits for that task to complete and returns the return code of that task. If the task has already completed, or the pid in question is not the pid of a child process of this shell, the wait command returns immediately indicating failure. The following trivial script begins an operation, then waits for it to complete:

tar cf archive.tar files &
child=$!
echo "Waiting for archive..."
wait $child

While waiting for a child is easy, and killing it is also easy, it is a little harder to check whether it is still running. The command kill -0 pid might work; if it succeeds, you know that there is a process numbered pid and that you have permission to send signals to that process. However, you do not know for sure that it is the child process you started; that process could have ended, and the pid then recycled.

Making Effective Use of wait

The wait command exits immediately if you ask it to wait for a process that is not a child of the current shell. However, if the process is still a child, the wait command waits for it. There is no portable way to check reliably whether a given process is a child of the current shell. The wait command runs in the calling shell, so to interrupt it, you must send a signal to the parent shell. If the signal would normally interrupt the shell, the signal will terminate the shell unless the signal is trapped.

If you send a signal to the shell while it's waiting, and the signal is trapped, the resulting behavior is not portable. Possible outcomes include the wait command aborting immediately or continuing until the child dies. Typically, the trap executes after the wait completes, but in zsh the trap executes immediately and the wait command continues anyway. This varies not only between shell families but between systems; the ash in use on NetBSD and FreeBSD systems differs from dash on Linux.

So, once the wait is started, you can't reliably interrupt it without killing your shell. You can't run wait on a background task in a subshell because the subshell is not the parent of the background task.

In practice, you can usually get away with checking the pid with kill -0 and expect that this will give you a good guess as to whether the child process is still running. This is not perfectly reliable, but is usually pretty good.
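
A sketch of such a check, assuming $child still holds the pid saved from $! when the background task was started:

if kill -0 "$child" 2>/dev/null; then
  echo "child $child still seems to be running"
else
  echo "child $child is gone (or was never ours to signal)"
fi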

If you only need to monitor a single background task, you can solve the problem by having the background task notify the parent shell when it is done, rather than the other way around. To do this, you can have the child process send the parent shell a USR1 signal, which you have cleverly trapped. The following script prints "Nope, still waiting..." three times, but it could perform any activities you wanted while waiting; the point of the example is that you can tell when the subshell has exited:

done=false
trap 'done=true' USR1
(sleep 3; kill -USR1 $$) &
while if $done; then false; else true; fi; do
  echo "Nope, still waiting..."
  sleep 1
done

The subshell keeps the parent shell's pid as $$, so the kill command sends a USR1 signal to the parent shell after the previous command completes. It is a bit harder to use this with more than one background task; you cannot tell which process sent you a particular signal.

A similar technique can be used to once again invert the sense of the problem. Imagine that you have a task you wish to run, but you do not want to run it forever because it might hang. If it has not completed within a given amount of time, you want to kill it. The following rather ugly one-liner does fairly well at this:

sh -c 'sh -c "sleep '$delay'; kill $$" >/dev/null 2>&1 & exec sh -c "'"$*"'"'

This shell fragment runs the provided arguments ($*) in a child shell, but it terminates that shell after $delay seconds if the child shell has not already exited. The exit status is the exit status of the child shell, which reflects the abnormal exit if the kill command fires. This example shows off a variety of expansion rules, subshells, and quoting behaviors. The first thing to note is that, at the top level, this command invokes a shell (using sh -c) that actually executes a command in which some variables have been expanded. Assuming that $delay contains the number 5, and the positional parameters contain the string command, the child shell then executes this:

sh -c "sleep 5; kill $$" >/dev/null 2>&1 & exec sh -c "command"

The command line is assembled from a single-quoted string (up through sleep and the space after it), the expansion of $delay, another single-quoted string (up to the last sh -c and the following double quote), a double-quoted expansion of $*, and finally a single-quoted double quote. This brings us to the question of what this elaborate list actually does.

The child shell executes two commands. The first is another child shell, which I'll call the grandchild for clarity, running the command sleep 5; kill $$. Because $$ occurs in double quotes, it is expanded by the child shell, not by the grandchild shell; this matters because the grandchild shell is not a subshell and does not inherit the child shell's $$.

The grandchild shell's output and error streams are directed to /dev/null. So, after 5 seconds, the grandchild shell attempts to kill the child shell. Meanwhile, because the shell command that started the grandchild ends with the & separator, the child shell goes on to execute the next command in its list. This command is another shell, which runs the external command. The command is passed to a new shell to allow it to be parsed, to contain arbitrary keywords, and so on. However, to ensure that this process can be stopped, the script must know the process ID it will run under. Conveniently, the exec command runs the new command in place of the caller; thus the new shell is run using the same process ID—the one that was passed to the grandchild shell to be killed in $delay seconds.

This has a couple of weaknesses. The first is that, if the grandchild process (containing the command you are actually interested in) exits quickly, the kill command fires anyway. This could result in a new process getting sent the signal, if the pid is reused. This is uncommon, but not impossible. Also, it is often better to send more than one signal (first a polite reminder, then an actual KILL signal) so commands that need a second or so for shutdown can do it cleanly. This actually increases the window for possible problems, but it improves the reliability of execution in the common case where the child process has important cleanup work to do before exiting. The following code is based on an elegant solution suggested by Alan Barrett, used by his kind permission:

func_timeout() (
  timeout=$1
  shift
  "$@" &
  childpid=$!
  (
    trap 'kill -TERM $sleeppid 2>/dev/null ; exit 0' TERM
    sleep "$timeout" &
    sleeppid=$!
    wait $sleeppid 2>/dev/null
    kill -TERM $childpid 2>/dev/null
    sleep 2
    kill -KILL $childpid 2>/dev/null
  ) &
  alarmpid=$!
  wait $childpid 2>/dev/null
  status=$?
  kill -TERM $alarmpid 2>/dev/null

  return $status
)

This is a rather elaborate shell function and deserves some careful explanation. The first four lines are straightforward:

  timeout=$1
  shift
  "$@" &
  childpid=$!

The first two lines extract the timeout value (passed as the first argument to the function) from the positional parameters of the function, then remove it from the positional parameters. The function then executes the remaining arguments as a command. Note that they are executed as a single command, with no shell syntax (such as semicolons); if you wanted to support additional shell syntax, you would have to pass them to a new shell, probably using sh -c. The shell then obtains the pid of the background task, storing it in the shell variable $childpid.

  (
    trap 'kill -TERM $sleeppid 2>/dev/null ; exit 0' TERM
    sleep "$timeout" &
    sleeppid=$!
    wait $sleeppid 2>/dev/null
    kill -TERM $childpid 2>/dev/null
    sleep 2
    kill -KILL $childpid 2>/dev/null
  ) &
  alarmpid=$!

This is where the magic happens. This runs a second background task in a subshell. The task starts by trapping the TERM signal. The handler kills $sleeppid, then exits. The handler is specified in single quotes, so $sleeppid isn't expanded yet, which is good, because it hasn't been set yet either. (If this subshell gets killed before it gets any farther, the handler executes the command kill -TERM, with no arguments; an error message is emitted to /dev/null and nothing happens.)

The subshell now launches a background sleep task, stores its pid in $sleeppid, and waits for the sleep to complete. If the sleep command completes normally, the subshell then tries to kill the original child, first with a TERM signal, then with a KILL signal. This whole subshell is run in the background, and its pid is stored in the variable $alarmpid.

  wait $childpid 2>/dev/null
  status=$?
  kill -TERM $alarmpid 2>/dev/null

  return $status

Now the parent shell waits for the child process. If the child process has not completed when the background subshell finishes sleeping, the background subshell kills it. Either way, when the child process terminates, the parent shell extracts its status, and then tries to kill the alarm process. There are two ways this can play out. The first is that the child process might not die from the TERM signal, in which case, the alarm process tries to kill it with a KILL signal and then exits. In this case, the parent shell's attempt to end the alarm process could theoretically hit another process, although the window is very narrow. The second (more likely) possibility is that the child process dies from the TERM signal, so the parent shell kills the alarm process, which then tries to kill its sleep process (which has just exited) and then exits. In any event, the function returns the status of the child process; if it was terminated by a signal, the status usually reflects this. (Some shells may strip the high bit, which indicates that a process was terminated by a signal.)

The variables set locally in the function, such as $childpid, do not show up in the calling shell because the whole function is run in a subshell. Of course, the nested subshells and background tasks impose a noticeable performance cost, especially on a Windows system, but on the other hand, this kind of code is likely only to be run with tasks that can run for some time. Even if spawning subshells takes a noticeable fraction of a second, a 10- or 20-second runtime will dwarf that cost completely.
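
Using the function is straightforward. In this sketch, the command is given 30 seconds to finish; sleep 60 is just a stand-in for whatever long-running command you actually care about:

if func_timeout 30 sleep 60; then
  echo "command completed within the time limit"
else
  echo "command failed or was killed by the timeout"
fi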

Techniques like this can be very useful while trying to perform automated testing, but a caveat is in order: There is no safe estimate available for what $timeout should be. If you are using something like this to catch failures, be sure you have thought about the performance characteristics of the command you want to time out waiting for. For instance, retrieving a web page typically takes only a couple of seconds, so you might set a time limit of 10 seconds. However, if a DNS entry has gotten lost or misconfigured and a web server is trying to look up names, it is quite possible for a connection to a host to take over 30 seconds simply to start up. Aborting too early can give misleading results.

Understanding Runtime Behavior

Previous sections of this book have introduced a number of things the shell does to its input. Input is broken into tokens, parameters and commands are substituted, and globs are replaced. Nearly every time a shell script has really mystified me, it turned out that I had forgotten the order of operations or the special circumstances under which an operation did not occur. The first thing to know is the basic order of operations, as shown in Table 6-2.

Table 6-2. Shell Operations in Order

Order Operation Notes
1st Tokenizing Creates tokens. This is the only phase that can create keywords or special shell punctuation. Words are split on whitespace.
2nd Brace expansion Only in some shells; see Chapter 7.
3rd Tilde expansion Replaces tilde constructs with home directories. Not universal.
4th Substitution Variable and command substitution (also arithmetic substitution in some shells; see Chapter 7).
5th Field splitting Results of substitution split on $IFS.
6th Globbing Glob patterns expanded into file names, possibly producing multiple words.
7th Redirection Redirection operators processed, and removed from command line.
8th Execution Results executed.

Of course, nothing in shell is this simple. There are two contexts in which field splitting and globbing are not performed. These are the control expression of a case statement and the right-hand side of variable assignment. Quoting also changes many behaviors. In single quotes, no expansion, substituting, splitting, or globbing occurs. In double quotes, tilde expansion, field splitting, and globbing are suppressed; only substitution is performed.

In the case where the command executed is eval, the arguments are subject to all of these steps again and subject to the same rules (including quoting, if there are any quotes in the arguments to eval).

These steps are taken one command at a time. The shell does not parse a whole script before beginning execution; it parses individual lines. At the end of each line, if the shell needs more tokens to complete parsing a command structure or command, it reads another line. When the shell reaches the end of a line (or the end of the whole script file) and has one or more valid commands, it executes any valid commands it has found. The following script always executes the initial echo command, even though the line after it is a syntax error:

echo hello
case x do



hello
script: 2: Syntax error: expecting "in"

However, if the commands are joined by a semicolon, the shell tries to finish parsing the first line before running the command:

echo hello; case x do


script: 1: Syntax error: expecting "in"

Even if the command is long and complicated, such as a case statement containing nested if statements, the whole command must be parsed before anything is executed.

Behavior with subshells is more complicated. Some shells perform complete parsing (but no substitution) of code that will be executed in a subshell. Others may let the subshell do some of the parsing. Consider the following script fragment:

if false; then
  ( if then )
else
  echo hello
fi

Should this script work? We can tell by inspection that the subshell command (which is invalid) is never run. However, every shell I have tried rejects it for a syntax error. A more subtle variant may escape detection:

if false; then
  ( if then fi )
else
  echo hello
fi

This version passes muster with ash and zsh, but it is rejected by ksh93, pdksh, and bash. Replacing the subshell with command substitution makes it easier to get shells to accept such code, but even then ash rejects it if the fi is omitted.

In practice, the best strategy is the simplest—ensure that code passed to subshells is syntactically valid.

Command Substitution, Subshells, and Parameter Substitution

When commands are executed in subshells, they are not subject to any kind of expansion, substitution, field splitting, or globbing in the parent shell. This is true whether you are dealing with an explicit subshell or the implicit subshell used by command substitution.

This behavior is closely tied to the fact that nothing can ever expand to a keyword. The parent shell can always determine which tokens belong in a command to be passed to a subshell without performing any kind of substitution; it simply passes those tokens to the subshell, which performs any needed substitutions.

This is generally true even for implicit subshells used in a pipeline, although it is not true of zsh in some cases:

true | true ${foo=bar} | true

In zsh, if $foo was initially unset, it is set to bar. In other shells, it remains unset.

The previous example may seem a bit contrived. There are very few reasonable cases in which it matters at all whether it is the parent shell or a subshell performing substitutions; outside of the = form of variable assignment and special variables like $BASH_SUBSHELL, it simply never matters. However, understanding it can make it easier to see how the shell works.

Quoted and Unquoted Strings

It is easy to understand the behavior of both quoted and unquoted strings when each token is one or the other. The shell's behavior when quoted and unquoted strings are not separated by space is a bit more intricate, but you have to use it sometimes; very few interesting scripts can be written without combining quoted and unquoted text.

For the most part, quoting is predictable. Each quoted block is interpreted according to its own quoting rules, and the results are concatenated into a single string. Substitution occurs only within unquoted or double-quoted text, and field splitting occurs only outside of quotes.
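For instance (the variable name here is arbitrary), adjacent single-quoted and double-quoted blocks each follow their own rules, and the results join into a single word:

$ user=alice
$ echo 'literal $user and '"substituted $user"
literal $user and substituted alice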

The interaction of globbing and quoting, however, can be confusing. If you have quoted and unquoted glob characters in a single string, the quoted ones remain literal and the unquoted ones are available for pattern matching. Thus the pattern '*'* matches file names starting with an asterisk.

The interaction of tilde expansion with quoting is not portable; some shells will expand ~'user' the same way as ~user and others the same way as '~user'. Since tilde expansion itself is not completely portable, this has little effect on portable scripts.

Quoting in Parameter Substitution

A number of parameter substitution forms contain embedded strings. The quoting rules for these are not entirely portable. In general, omit quotes in these strings and rely on quoting around the substitution. If you need to escape a dollar sign or similar character in a literal, use a backslash. If you want to prevent globbing, quote the whole substitution, not just the right-hand side.

The examples in Table 6-3 assume two variables, $a and $e; $e is unset and $a holds an asterisk.

Table 6-3. Trying to Predict Shell Expansion

Expression        Output
${e}              Empty string
${a}              Expansion of glob *
${e:-$a}          Expansion of glob *, except in zsh, where it is literal
"${e:-$a}"        *, except in ash, which expands the glob
"${e:-*}"         *
"${e:-"*"}"       *, except in ksh93, which expands the glob
"${e:-"$a"}"      *, except in ksh93, which expands the glob
"${e:-\$a}"       $a
${e:-'$a'}        $a
"${e:-'$a'}"      '*', except in pdksh, which gives $a
'${e:-'$a'}'      ${e:-*} as a glob pattern, except in zsh, where it is literal

To make a long story short, it is hard to predict the behavior of nested quotes in variable substitution. Avoid this as much as you can. However, be aware that you may need quotes to use assignment substitution. The following code does not work in some older shells:

$ : ${a=b c}
bad substitution

To work around this, you can quote either the right-hand side of the assignment or the whole operator. Quoting the word only is more idiomatic:

$ : ${a="b c"}

In the preceding example, if $a already has a value, that value is expanded outside of quotes; if it does not have a value, the assigned value ends up inside the quotes:

$ sh echoargs ${a="b c"}
b c
$ sh echoargs ${a="b c"}
b
c
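Here, echoargs is assumed to be a small helper script that prints each of its arguments on a separate line; a minimal sketch:

#!/bin/sh
# echoargs: print each argument on its own line
for arg
do
  printf '%s\n' "$arg"
done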

Trying to predict this behavior is essentially futile; there are simply too many shell-specific bugs and special cases. In general, the interactions between assignment substitution and other quoting rules make it best to use this substitution form only as an argument to the : command, not in cases where you have any expectations about the substituted value.
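The classic safe use is supplying a default for a possibly unset variable near the top of a script and then expanding the variable normally later; the variable and path here are only examples:

# Use the caller's $TMPDIR if it is set, otherwise fall back to /tmp.
: "${TMPDIR=/tmp}"
scratch=$TMPDIR/myscript.$$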

The POSIX expansion forms using pattern matching (discussed in Chapter 7) treat the pattern as unquoted by default, so you must quote pattern characters in them. As you can see, this behavior may be hard to predict consistently. Backslashes are usually safe for escaping single characters.
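For instance, using the suffix-removal form (one of the forms covered in Chapter 7) on a variable that happens to hold a pattern character, a backslash makes the asterisk in the pattern literal:

$ f='a*b'
$ echo "${f%\*b}"
a
$ echo "${f%*b}"
a*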

A Few Brainteasers

While all of the shell's rules are individually comprehensible, it is easy to think so hard about one of the shell's quoting or substitution behaviors that you forget about another one. This section gives a handful of code fragments that have surprised me or other people I know, resulting in confusion about why a given shell fragment didn't work or even confusion about why it did.

$ echo $IFS

I am a little ashamed to admit that I've used this several times to try to debug problems with the shell's behavior. It seems perfectly sensible, and if you think $IFS is unset or contains only whitespace, it even does what you expect. The problem is that unquoted parameter substitution is subject to field splitting, and every character in the expansion of $IFS is, by definition, found in the value of $IFS, so every one of them is taken as a field separator. If a word expands to nothing but field separators, there is no word there; all this does is pass no arguments to echo, producing a blank line. You wouldn't think it surprising that the characters in $IFS are in $IFS, but the habit of using echo $var to display a value is pretty well set in many shell programmers.
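If you actually want to see what is in $IFS, quote the expansion and pass it to a program that makes whitespace visible. With the default value (a space, a tab, and a newline), the output looks roughly like this, although od's exact column layout varies between systems:

$ printf '%s' "$IFS" | od -c
0000000      \t  \n
0000003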

$ a=*
$ echo $a

This fragment clearly shows that the shell performs globbing on variable assignment; after all, $a was set to a list of file names, right? In fact, it is quite the opposite; $a was set to *, but since the substitution isn't quoted, the results are globbed.
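Quoting the expansion confirms that the variable itself holds only the asterisk:

$ echo "$a"
*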

The next example shows a case that seems surprising if you don't know that field splitting does not occur in an assignment operation. Most shell users are familiar with the problem of trying to assign multiple words to a variable:

$ a=one two
sh: two: command not found
$ echo $a

$ a="one two"
$ b=$a

The second assignment does not need quotes; there is no field splitting in the assignment. However, you will see quotes used there quite often, mostly by people who have been burned by trying to assign a list of words to a variable without quotes. This is the big difference between word splitting (tokenizing) and field splitting. An assignment must be a single word, so if it is to contain spaces, they have to be quoted. However, once the assignment is identified, the right-hand side is substituted without any field splitting or globbing.

case $var in
"*")
  echo "*";;
*" "*)
  echo "* *";;
*)
  echo "anything else";;
esac

The case statement has two interesting special cases, if you'll pardon the term. The control expression is not subject to field splitting or globbing. The pattern expressions are stranger still. Shell pattern characters in the patterns are obviously treated as pattern expressions (rather than globs) when unquoted. To get literals, you must quote them. However, other shell characters may need to be quoted; the quotes in *" "*) are needed, or the script becomes a syntax error. This is understandable if you think of the abstract form of the syntax:

case expression in
word) commands ;;
esac

All you have to do is remember that each test expression has to be a single shell word at tokenizing time; it is not subject to field splitting or to globbing.

Debugging Tools

This section is, of course, not very important. Your scripts will work on the first try because you are paying very careful attention to all the wonderful advice on how to write good code. Perhaps you even have flow charts. However, on the off chance that you might sometimes find a script's behavior a little surprising, a discussion of debugging tools is called for.

The shell's trace mode (-x) is the closest thing the shell has to a debugging tool, but it is, unfortunately, quite limited. All it can show you is simple commands as they are actually executed; control structures are not shown. The verbose flag (-v) shows the shell's input as it is read, but this doesn't show you the flow of control.
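Trace mode can be enabled for a whole script with sh -x scriptname, or from inside a script with set -x (and turned off again with set +x). A trivial, hypothetical example follows; the exact trace format varies slightly from shell to shell:

$ cat trace-demo
#!/bin/sh
set -x
msg=hello
echo "$msg"
$ sh trace-demo
+ msg=hello
+ echo hello
hello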

It is sometimes useful to display commands before executing them, but the usual mechanisms work only for simple commands or pipelines. If you have a variable containing a command, you can display it easily enough with echo "$command". However, you cannot necessarily execute it and get the results you expect. If you simply use the variable as a command line, any shell syntax characters or keywords in it are treated as ordinary words rather than as syntax; if you pass it to eval, a whole new pass of shell substitution and quoting takes effect, possibly changing the effect of the command. Either of these limitations may prevent you from using this technique generically, but in the majority of cases, it works.
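A minimal sketch of the trade-off, using an arbitrary command stored in a variable:

cmd='ls | wc -l'
echo "$cmd"    # displays the command exactly as written
$cmd           # fails: |, wc, and -l are passed to ls as ordinary arguments
eval "$cmd"    # runs the pipeline, but quoting and substitution are processed again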

To debug shell scripts, you must use a variety of tools, depending on the problem you are having. You can generally start with trace mode to see at least where in the script things are acting up. Once you have isolated the approximate location, inspection is often enough to reveal the bug. When it isn't, you will need to use additional code to figure out what the shell is doing. For instance, if you have a case statement, trace mode will not show you what branch it takes, but seeing the code executed may tell you what you need to know. If not, start by displaying the value you used to control the case statement right before executing it.

Sometimes, especially with a larger script, reproducing a problem can take a long time per run. You can copy chunks of code out of your script to see what is happening; for example, if you have a misbehaving case statement, first modify the script to display the control value, then copy the case statement into a temporary file and change the contents of the branches to display which branch is taken. The temporary file can be run as a miniature script.
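For instance, a throwaway test file for a misbehaving case statement might look like this, with the control value pasted in from the failing run and the branch bodies replaced by messages identifying which branch is taken:

#!/bin/sh
# testcase: report which branch a given value selects
var='value copied from the failing run'
case $var in
"*")
  echo "literal-asterisk branch";;
*" "*)
  echo "contains-a-space branch";;
*)
  echo "default branch";;
esac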

When you are debugging a script, be aware of enhancements or local features a given shell provides. While you should stick to portable code for the final version, sometimes an extension can be extremely useful for debugging. For instance, bash offers special traps like DEBUG, which lets you run a trap before every shell command. This can be very useful for tracking a shell variable that is getting changed unexpectedly. The DEBUG trap is also available in ksh, but not in pdksh; in ksh93, it also sets the parameter ${.sh.command} to the command that is about to be executed.
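A minimal bash-specific sketch, using the DEBUG trap and bash's $BASH_COMMAND variable to watch a parameter as each command runs:

#!/bin/bash
# Report each command before it runs, along with the current value of counter.
trap 'echo "next: $BASH_COMMAND (counter=$counter)" >&2' DEBUG
counter=0
counter=$((counter + 1))
echo "done: $counter"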

In general, debugging in the shell is not all that different from debugging in any programming language, although the tools available are generally more primitive. For a really difficult bug, you may wish to look into the bashdb debugger, which works only with bash but offers a variety of useful debugging tools for interactive debugging of scripts. A similar debugger exists for ksh and was introduced (with source code) in Learning the Korn Shell (2nd Edition) by Bill Rosenblatt and Arnold Robbins (O'Reilly, 2002).

Focus on developing a way to reproduce the bug reliably, isolating it by removing irrelevant components, and you should be able to track the bug down.

What's Next?

Chapter 7 explores the portability of shell language constructs and introduces a few common extensions that you may find useful in more recent shells. It also discusses ways to identify which shell a script is running in, and possibly find a better shell if the shell you've been given isn't good enough for you.
