CHAPTER 3
Basic Shell Scripting

This chapter introduces the basics of control flow in the shell. The shell's functionality is moderately baroque, and many shell features have elaborate interactions. This chapter glosses over the full (and rather gory) details of the shell's quoting and variable expansion features, leaving them for Chapter 4. Instead, this chapter introduces the basic programming features of the shell, showing how to control the execution of shell scripts, join programs together, and interact with files. This framework makes it much easier to provide meaningful examples while exploring the rather more complicated territory of the shell's expansion and quoting mechanisms.

Scripts presented without command prompts may be run directly on the command line or saved in a file and run as scripts.

Introducing Control Structures

By default, the shell executes commands in the order it encounters them, whether they come from a script file or are typed at the keyboard. Certain inputs, however, instead of having direct effects of their own, cause the shell to change which commands it executes and in what order; these are called control structures. Control structures are what make the shell a programming language, rather than a very simple macro expansion language.

There are several kinds of control structures. Conditional execution causes the shell to execute some code while skipping other code. This allows a script to adapt to different circumstances; for instance, a script might wish to ask a user for confirmation before taking a risky action. Iteration allows a script to run a given block of code repeatedly, as many or as few times as needed. A typical example would be a program that performs the same operations on every file in a directory or on each line of input.

Control structures are sometimes used even when their function could be obtained without them; for instance, you might write a loop to perform a given task five times, rather than simply duplicating the code for that task five times. This makes it easier to generalize later (if the number of times you want to repeat the task varies, for instance) and also makes it easier to maintain code. This can also be done using shell functions, a feature introduced in Chapter 5.

In both cases, shell control structures depend on testing conditions. To make a decision about what to do, the shell has to be able to express the concept of a yes or no question; the shell has to have a concept of truth and falsehood, whether the question is "did the user say yes?" or "are there any more files?"

What Is Truth?

In nearly every programming language, most control structures come down to tests and the concept of whether a condition "is true." In the shell, control structures are based on the exit status of commands. Every program that runs on a UNIX-like system has a numeric exit status (or return code), which indicates something about the final state of its execution. Under the hood, the return code is zero for a successful command execution and non-zero for a command reporting any kind of failure or abnormality. This is the reverse of the common convention in C-like languages, where the code inside if (1) executes and the code inside if (0) does not.

Two programs provided on all UNIX systems, true and false, are guaranteed to always yield a true or false exit status, respectively. Many shell programmers use the : built-in command instead, which is a synonym for true. I like the natural language form, but the use of : is quite common, too, and it has advantages. While only some shells provide true as a built-in, : is a built-in command in every shell, making it more efficient. It is also shorter to type and to read. (And for those of you targeting minimal embedded systems, : works when /bin/true is missing. This is less important with /bin/false; if it is missing, the execution fails, so the false command always fails.) The examples in this book are written more for clarity than for performance, in this respect.

While people often think of control structures as applying only within script files, the shell happily accepts control structures typed directly on the command line. For instance, you can verify the behavior of true and false on the command line:

$ if true
> then echo "True!"
> fi
True!
$ if false
> then echo "False!"
> fi
$

The exit status is not the output of the program; it is a separate piece of data made available to the calling program, such as the shell. The true command doesn't print the zero value, it simply makes that value available to the program calling it.
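
You can see this in an interactive session; neither true nor false prints anything, but their exit statuses differ. The special parameter $?, described in the next paragraph, expands to the exit status of the most recent command:

$ true
$ echo $?
0
$ false
$ echo $?
1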

For now, I ignore the question of whether commands are built in or external. It turns out not to matter; built-in commands produce a return code, just as external commands do, and use the same conventions. If you are curious about the return code of a command, you can echo the built-in shell parameter $? immediately after running it. After a successful command, this value will be 0. After an unsuccessful command, it will typically be non-zero. Standard POSIX shells have a feature where any command can be prefixed with !, reversing the return code of that command. For instance, the echo command usually succeeds, but ! echo hello performs the echo successfully, then yields a return code indicating failure. Unfortunately, a few shells omit this feature; in code that has to run on /bin/sh on every common system, it is best to avoid it. Here is an example of how to test whether a shell supports the ! command prefix:

if eval "! false" > /dev/null 2>&1; then
  echo "This shell supports !"
else
  echo "This shell does not support !"
fi

This shell supports !

Only a couple of shells (most notably /bin/sh on Solaris) will run into this. In code that must run there, you can replace ! command with a construct like this:

if command; then false; else true; fi
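
A related idiom uses the : command as a placeholder in the then branch, so that the interesting code runs only when a command fails, without relying on !. This is a minimal sketch, with file.txt standing in for whatever file you are searching:

if grep hello file.txt >/dev/null 2>&1; then :; else
  echo "no match found"
fi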

One command is particularly important—the test command, which can perform a variety of logical tests, such as comparing numbers or strings, or testing attributes of files (such as whether they exist, have contents, or are accessible). Unlike many commands, the test command generally produces no output at all; rather, it indicates success or failure only through its return code. For historical reasons, and also because it looks pretty, the test command has an alias of [, which expects a trailing ] after its arguments. If you have ever wondered why there is a file /bin/[, now you know. However, this variant cannot be safely used in shell code intended for use with m4sh or autoconf, so it is a good habit to use the plain test form.
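
The two spellings are interchangeable. For instance, the following commands perform the same check; note that [ is a command name and ] is an argument, so both must be surrounded by spaces:

test -f /etc/passwd
[ -f /etc/passwd ]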

When expanding variables as arguments to the test command, be careful about what could happen with variables whose expansions look like parts of the test command's argument grammar. While most of the time the test command figures out what was intended, it can be easier for everyone to ensure that arguments are unambiguous. A common idiom for this is to precede arguments with X where possible, as in the following example:

if test X"$answer" = X"42"; then
  echo "Forty-two!"
fi

There are three key points to this idiom. First, putting a letter in front of the variable ensures that, even if a user enters something like = or -f, test treats the argument as a plain string, not an operator. Second, putting quotes around the variable ensures that it will not be split into multiple words. Finally, putting the X outside the quotes on both sides makes the intent clearer; the reader can easily see that the X is the same on both sides. Another common idiom is to reverse the positions of the values, putting the constant first. This eliminates possible ambiguities, presenting test with an expression that can only be understood as intended:

if test 42 = "$answer"; then
  echo "Forty-two!"
fi

Of these two, I prefer the X form, simply because I find it easier to read "answer equals 42" than "42 equals answer." This is purely a style question; either is portable.

The test program performs string comparisons by default; the expression test 1 = 01 is considered false because the strings differ. However, it also supports numeric comparisons, which are spelled as hyphenated operators, like -eq; test 1 -eq 01 is true because both operands have the numeric value one. (The numeric operators expect whole numbers; many implementations reject an operand like 1.0 outright.) Numeric comparisons are needed because string comparisons pay no attention to magnitude; test 100 \< 2 succeeds because the digit 1 sorts before the digit 2 in standard character sets, while test 100 -lt 2 fails. (The backslash prevents the shell from treating the unquoted < as a redirection operator.) Table 3-1 shows the relational operators supported by test.

Table 3-1. Relational Operators in test

String Operator    Numeric Operator    Meaning
a = b              a -eq b             a and b are equal
a != b             a -ne b             a and b are not equal
a > b              a -gt b             a is greater than b
a < b              a -lt b             a is less than b
a >= b             a -ge b             a is greater than or equal to b
a <= b             a -le b             a is less than or equal to b

Developers from other languages should note two distinctions. The first is that the == equality operator is not portable, although some variants support it as an extension. The second is that Perl precisely reverses the sense of these operators; in Perl, == is the numeric equality test, and eq is the stringwise one.

The test command supports a number of logical operations allowing you to combine or invert tests. First, any test can be preceded by ! to reverse the sense of the test. This is portable among shells and implementations of test, even in shells that do not allow commands to be preceded by !.

Some versions of test allow combinations of multiple tests, conjoined with -a (and) or -o (or) operators. This is not fully portable; instead, use the shell's && and || operators (which are explained in Chapter 4).
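
For instance, rather than writing test -f file -a -r file, you can join two separate test commands. This is a minimal sketch, assuming a variable file set elsewhere:

if test -f "$file" && test -r "$file"; then
  echo "'$file' exists and is readable."
fi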

Introducing Conditional Execution

There are two primary mechanisms for conditional execution in the shell. The first is the if statement, which executes code if a specified condition is true. The second is the case statement, which can select among multiple sections of code based on the contents of an expression. In both cases, only one section of code is actually executed, and others are completely bypassed.

Introducing the if-then-else Statement

The if statement executes code if a specified command succeeds. The syntax of the if statement follows this basic pattern:

if command; then
  actions
fi

The use of fi, rather than something generic like end, reflects the original shell developer's fondness for ALGOL. The then part of the statement does not need to be on the same line as the if; in fact, it must be separated by a command separator (the semicolon in the previous example). Some users prefer to write if statements as follows:

if command
then actions
fi

In this book, I use the first structure, but they are equivalent. A simple program to check whether the reader can perform simple arithmetic could be implemented as follows:

printf "What do you get if you multiply 6 by 9? "
read answer
if test X"$answer" = X"42"; then
  echo "You read too much science fiction."
fi

If the user enters 42, the shell counters with a reference to a popular novel. But if the user enters anything else, the shell just says nothing. You could resolve this by checking for other values:

printf "What do you get if you multiply 6 by 9? "
read answer
if test X"$answer" = X"42"; then
  echo "You read too much science fiction."
fi

if test X"$answer" != X"42"; then
  echo "You do not read enough science fiction."
fi

As you can see, this has the potential to become large and unwieldy rather quickly. Furthermore, if some clever sort comes along and "corrects" the value used, it's quite possible that one of the statements will be changed, and the other will not, leading to inconsistent or unwanted behavior. Luckily, the shell has another keyword that may be used in if statements: else. The else clause of an if statement, if present, is executed if the specified command indicated failure. A more idiomatic implementation of the previous script would be as follows:

printf "What do you get if you multiply 6 by 9? "
read answer
if test X"$answer" = X"42"; then
  echo "You read too much science fiction."
else
  echo "You do not read enough science fiction."
fi

As with then, else may be placed on the same line as the following actions; however, this is often harder for the reader to understand. Conditional statements may be nested arbitrarily, as well:

printf "What do you get if you multiply 6 by 9? "
read answer
if test X"$answer" = X"42"; then
  echo "You read too much science fiction."
else
  if test X"$answer" = X"54"; then
    echo "Boring, but arguably correct."
  else
    echo "You do not read enough science fiction."
  fi
fi

This works well as long as there are not too many alternatives, but imagine for a moment a test to determine whether the user has entered a valid state or province name using this pattern. Clearly, something more flexible is needed. One method is to use the optional elif test:

printf "What do you get if you multiply 6 by 9? "
read answer
if test X"$answer" = X"42"; then
  echo "You read too much science fiction."
elif test X"$answer" = X"54"; then
  echo "Boring, but arguably correct."
else
  echo "You do not read enough science fiction."
fi

Any command may be used as the controlling expression for an if or elif statement. Since most UNIX commands indicate their status in their return code, this can also be used for error detection during a script's execution. Most programs print their own error messages, but sometimes the output from a program would not be informative to the user.

if grep "$user" /etc/passwd; then
  echo "$user is already in /etc/passwd."
fi

seebs:x:1000:1000:Peter Seebach,,,:/home/seebs:/bin/bash
seebs is already in /etc/passwd.

The first line of output is the matching line printed by grep, not the message the script meant to display. There are two ways to resolve this. One is to use the -q (or -s) command-line flag to grep; this suppresses output, causing grep to indicate success or failure only through its exit status. Unfortunately, these flags, while widespread, are not universal; some implementations support one, some the other, and some neither. The portable solution is to redirect the output of the command:

if grep "$user" /etc/passwd >/dev/null; then
  echo "$user is already in /etc/passwd."
fi

The output is redirected to /dev/null, preventing the user from seeing it. (Redirection is explained in the "Introducing Redirection" section later in this chapter.)

When you need to store a user preference or other decision, the simplest idiom is to store either true or false in a variable, then use the variable as a condition:

do_this=true
do_that=false
if $do_this; then
  echo "Do this."
fi
if $do_that; then
  echo "Do not do that."
fi

Do this.

This idiom is easy to read and runs efficiently. You can also store values such as Y or N in a variable and test for them using the test command, but using true and false is simpler and cleaner. Implemented with string values and the test command, the previous example becomes:

do_this=Y
do_that=N
if test "$do_this" = "Y"; then
  echo "Do this."
fi
if test "$do_that" = "Y"; then
  echo "Do not do that."
fi

Do this.

The behavior is the same, but the code is harder to read. Experienced programmers may prefer to use : and false for brevity or performance reasons.

You can test for patterns as well. While there is no portable way to match patterns or regular expressions using test (the regular expression operator is not universal), the expr command can be used to compare strings to regular expressions. Note that expr prints the result of the match (here, the number of characters matched) to standard output, so the example redirects that output to /dev/null and relies on the exit status alone:

if expr "$do_this" : "[Yy].*" >/dev/null; then
  echo "Do this."
fi

In some cases, you will find that the if and elif constructs are not as expressive as you would like for a given problem, and what you really want to do is compare a string against a series of possible patterns, not just against an individual pattern. There is a way to do just that.

Introducing the case Statement

The case statement compares a string to a series of patterns. One of the advantages of this is that you can have multiple different tests without an ever-increasing indentation spiral of doom. One of the disadvantages is that, while expr tests regular expressions, case tests only shell patterns. However, shell patterns with alternation are flexible enough to serve well. The basic layout of a case command looks like this:

case word in
  pat1) actions;;
  pat2) actions;;
esac

As with if, the case command is ended by its own name, spelled backward. There may not be spaces between the two semicolons that terminate each list of actions. The value of word is expanded but is not subject to field splitting after expansion, so if word is just a single variable, you never need quotes around it. (The only time you could need quotes would be if word contains spaces prior to expansion; the value "$a $b" needs to be quoted, as the shell takes only a single word before the in keyword.) The value is checked against each pattern in turn, and the actions from the first matching pattern are executed. Some systems also provide pattern matching in test, but this is nonportable. Use case instead.

Pattern matching is explained in more detail in Chapter 2. This section assumes some familiarity with pattern matching but uses simple patterns to illustrate how the case statement works. For instance, the following test implements a draconian user interface policy:

printf "Would you like to play a game? (please enter yes or no): "
read input
case $input in
  yes) echo "I would like to play a game too, but I am only a sample script.";;
  no) echo "I am very disappointed.";;
  *) echo "I said to please enter yes or no. Now formatting your disk...";;
esac

User input that contains "yes" or "no" plus other contents will not pass muster in this example. For instance, if the user entered "yes, please," the script would not consider this valid input. A more forgiving writer might use something similar to the following:

printf "Would you like to play a game? (please enter yes or no):"
read input
case $input in
  [Yy]*) echo "I would like to play a game too, but I am only a sample script.";;
  [Nn]*) echo "I am very disappointed.";;
  *) echo "I said to please enter yes or no. Now formatting your disk...";;
esac

Additionally, it is permissible to provide multiple patterns for a single case, separating them with pipe characters (|). For instance, the following script accepts a number of variants but is not quite as general as the preceding example:

printf "Would you like to play a game? (please enter yes or no): "
read input
case $input in
  [Yy]|[Yy][Ee][Ss]) echo "Me too, but I am only a sample script.";;
  [Nn]|[Nn][Oo]) echo "I am very disappointed.";;
  *) echo "I said to please enter yes or no. Now formatting your disk...";;
esac

This accepts y or yes in any combination of capitals, or n or no in any combination of capitals, but it will not recognize other inputs. If there is no way to manage what you want using shell patterns, you may have to fall back on if statements and expr or grep. See Chapter 2 for more information about patterns and regular expressions.

Between if and case, you can control the behavior of a great number of programs; but if you stop there, you will shortly notice that programs that need to repeat actions become very tedious to write and maintain, even with a modern text editor to handle your cut and paste needs. What you need is a way to do the same thing over and over, without getting bored; this brings us to iteration.

Introducing Iteration

The real strength of the shell (or of anything computers do) is not in doing a single thing, but in doing similar things over and over. The shell provides two primary mechanisms for iteration. The while loop (and its relative, the until loop) repeats as long as a condition remains true (or, for until, false). The for loop iterates over a fixed list of items, processing each item once.

The while Loop

The simplest loop in the shell is the while loop, which performs a series of actions as long as a condition remains true. The basic syntax is this:

while command; do
  actions
done

As with the if statement, command can be any shell command. The actions can be a shell command or a sequence of shell commands. If command indicates failure, the shell leaves the loop. For instance, if command fails the first time the shell executes it, the actions are not performed even once. Otherwise, after each time performing actions, the shell runs command again. For instance, Listing 3-1 might induce a positive frame of mind.

Listing 3-1. Positive Thinking Made Easy

while test X"$answer" != X"yes"; do
  printf "Say yes: "
  read answer
done

This loop runs until the variable $answer contains the string yes. The variable is not initialized prior to the loop; assuming it wasn't already set somewhere else in the script, it simply expands to an empty string until the user supplies a response to the read command. It is not an error to use an uninitialized variable in the shell (but you can check for a value; see the discussion of variables in Chapter 4).

You can make the code inside the loop as complicated as you want, including using other features such as conditional execution, as in the following example:

while test X"$answer" != X"YES"; do
  printf "Are you ready? "
  read answer
  if X"$answer" = X"yes"; then
    echo "I can't HEAR you!"
  elif test X"$answer" != X"YES"; then
    echo "When I ask you a question, you say YES!"
  fi
done

For convenience, the shell also offers an until loop, which is precisely like a while loop, except the sense of the condition test is reversed. For instance, Listing 3-1 would be written as follows using until:

until test X"$answer" = X"yes"; do
  printf "Say yes: "
  read answer
done

Some writers feel that the until loop adds substantial clarity, but others dislike it. I recommend that you use it when it seems clearer. If a natural language description of the process would start with "do X until ... ," then use until.

Introducing break and continue

Sometimes, you may find out early in a loop iteration that you cannot usefully continue. The shell only checks command at the top of the loop, though; the shell will not stop the sequence of commands halfway through just because command would indicate failure if run again. To escape the loop immediately, use the break command. In some cases, this is especially useful when combined with the true program, which always succeeds. For instance, you may want to ensure that a loop is run at least once (C programmers may be familiar with this as the do {} while () idiom). There is no explicit syntax for this. Idiomatically, you run an eternal loop, and break when the loop condition is no longer true.

while true; do
  printf "Say yes: "
  read answer
  if test X"$answer" != X"yes" ; then
    echo "Oh, come on now. You can do it!"
  else
    echo "You did it! Way to go!"
    break
  fi
done

Another possibility is that, while this particular iteration of the loop has lost interest for you, you wish to continue iterating. For this, you use the continue statement, which jumps back to the top of the loop. The continue statement jumps to the iteration test; if that test now fails, the loop exits.

Note that break and continue only have meaning within loops, such as while or for, not in terms of if or case statements. The break statement in this example skips out of the while loop. If you want to break out of more than one loop, the break and continue statements take an optional argument indicating how many nested loops to break out of.

while true; do
  printf "Are you bored yet?"
  answer=""
  while test X"$answer" != X"yes" && test X"$answer" != X"no"; do
    read answer
    case $answer in
      no) ;;
      yes) echo "I never liked you either."
          break 2;;
      *) echo "I am but a humble script, and only understand yes and no.";;
    esac
  done
done

The preceding example uses break 2 to leave both the inner loop (waiting for an answer it understands) and the outer loop. If the inner loop used only a plain break statement, it would jump to the end of the inner loop, but the outer loop would continue to iterate.
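
The continue statement has not yet appeared in an example, so here is a minimal sketch. It reprompts silently on empty input; the continue skips the echo at the bottom of the loop and jumps back to the top:

while true; do
  printf "Say something (or 'quit'): "
  read word
  case $word in
    quit) break;;
    '') continue;;
  esac
  echo "You said: $word"
done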

Introducing for Loops

In the shell, the for loop does only one thing: it iterates over a provided list of words. It is not analogous to the C for loop, which can express nearly arbitrary iteration; Perl users may recognize it as foreach. The basic form of the for loop is this:

for var in list; do
  actions
done

The value provided for list is subject to parameter substitution, then field splitting, and then globbing (quoted portions are protected from the splitting and globbing). The for loop runs once for each resulting word, assigning that value to var. For instance, the following script looks almost like a very simple mail-merge program:

for name in "Occupant" "Our Friends" "Current Resident" "Postal Customer"; do
  echo "Hello, $name"
  echo "Look! A personalized letter! Buy our stuff!"
done

Because the most common usage of the for loop is to iterate over the arguments given to a script, there is a special syntax to do this. If the in list is omitted, the shell iterates over the arguments given to the script. (Actually, it iterates over the positional parameters, which are usually but not always the arguments given to the script; the positional parameters are discussed at length in Chapter 6.) However, this produces a rare portability issue; some older versions of bash do not cope well with a semicolon after the variable name. When using for without in, put a newline before the do keyword. For instance, the following script identifies the first file in its arguments that contains a given string:

string="test"
for i
do
  if grep "$string" "$i"; then
    echo "$i"
    break
  fi
done

As you can see, break works the same way in for loops as it does in while loops. This script has a number of flaws, but it can be used. The most obvious flaw is the display of the unneeded output from grep. Another flaw is that, if $string expands to something that looks like a grep option, the script misbehaves; similarly, some versions of echo may behave surprisingly with file names that have hyphens or backslashes in them. The following script is a little cleaner:

string="test"
for i
do
  if grep −e "$string" "$i" >/dev/null; then
    printf "%s " "$i"
    break
  fi
done

The -e option to grep specifies that the following argument is the expression to match, even if it might otherwise be interpreted as an option. The printf command displays the file name no matter what it is, suppressing the strange and unportable behavior of echo.

Thinking About Control Structures

The preceding introduction to control structures is not a substitute for using them frequently to get comfortable with them, but it should be complete enough to let you understand the sample programs used to illustrate other features. It takes some experience to know when to use the different control structures, and the best way to develop a good sense for this is probably to look at, and write, lots of examples. If you are new to programming, this is usually one of the hardest parts to get used to.

The shell control structures have names that describe their behaviors. You can usually decide which one to use by trying out verbal descriptions such as "for each file" (a for loop), "while there is more data" (a while loop), or "if the file exists" (an if statement). The case statement is the hardest to map to idiomatic English, but if a description of what you are doing can be phrased starting with "in the first case," it is probably going to map well onto case.

Some of the most powerful uses of these constructs depend on the use of additional tools. One of the most crucial of these is the ability to change the sources of input, and the destinations of output, of shell programs.

Introducing Redirection

In most cases, when you are using a shell interactively, commands accept input from your keyboard and direct output to your screen. When you run a script from an interactive session, it works the same way. UNIX systems treat files, keyboards, and other data sources in essentially the same way, calling them all streams. A stream is simply a source of data, or a place data can be written to. Some streams can be both read from and written to. Changing the source of a program's input, or the destination of its output, is called redirection. This section provides an introduction to redirection, although there are additional features to be explored later.

The examples in this section are often presented as interactive sessions, with user input in bold and shell prompts and output in plain text.

Here's an example of redirection in action:

$ echo "hello, world" > hello
$ cat hello
hello, world

This differs from the direct echo in that a new file, named hello, has been created. The output of echo is redirected into the file. When redirecting to a file, the shell empties the file first. If you want to add on to the existing contents, use >>, as in the following example:

$ echo "goodbye, now" >> hello
$ cat hello
hello, world
goodbye, now

A particularly common redirection target is the special file /dev/null. This special file is not a regular file storing data, but a special file that simply discards anything and everything written to it. For instance, the for loop example emitted unwanted output, until it was eliminated by the use of redirection to /dev/null; here it is again:

for i
do
  if grep -e "$string" "$i" >/dev/null; then
    printf "%s\n" "$i"
    break
  fi
done

Without the redirection, the user sees all of the matching lines in each file, followed by its name. This is annoying, but there is no portable way to tell grep not to produce any output. What you can do portably is redirect that output, discarding it; then, the script produces only the file names, rather than the grep output and the file names.

Similarly, commands can be run using a file as input instead of the keyboard, using a < for redirection. For instance, you might want to run one command on the output of another, like this:

$ ls > list
$ grep hello < list
hello

In this example, a complete list of files in the current directory is stored in a new file named list. Then, the grep command is used to display lines in that file containing the string hello. This is inefficient, though; you have to remember to clean up the intermediate file, and if the output is large, it takes up a lot of space. UNIX solves this with pipes. A pipe is a single stream that provides output from one program as input to another. For instance, the following command displays every file in the current directory with hello in its name:

$ ls | grep hello

On some (non-UNIX) systems, a similar syntax is available, but the shell implements it by writing the output to a temporary file, running the second program on that file, and then deleting the file. On UNIX, both commands can run simultaneously.

A series of commands joined by pipes is called a pipeline, and in general, a pipeline can be used in any case where a single command could be used. A pipeline can have more than two commands. This command displays a count of files in the current directory with hello in their names:

ls | grep hello | wc -l

The exit status of a pipeline is the exit status of the last command in it. So the first example can be used, combined with redirection, to create a simple test:

if ls | grep hello > /dev/null; then
  echo "you have a file with hello in its name."
fi

The output from the ls command is fed into grep as input. The grep command then prints any matching lines, but its output has been redirected to /dev/null. However, the grep command's exit status is success when it finds at least one match. So, without actually looking at the output, the shell can still tell whether grep would have printed anything, and thus whether there were any matching lines. Note that it does not matter at all what exit status the ls command yields; only the exit status of grep is being used by the shell. Thus this script won't work:

if ls | grep hello | wc −l > /dev/null; then
  echo "you have a file with hello in its name."
fi

Whether grep produces any output or not, the wc (word count) command is unlikely to fail. This highlights the difference between the output of the command and its return code. If there are no matching files, wc -l prints 0 to standard output; if there are matching files, it prints the number of lines it received as input. However, its return code will be zero as long as no errors occurred. Because wc is the last program in the pipeline, it determines the return code of the whole pipeline.

Understanding File Descriptors

The discussion so far has talked about input and output streams, but it has not mentioned any other streams. UNIX programs usually start with three streams: standard input, standard output, and standard error. Standard input reflects the input to the program, whether that is a terminal, another program, or a file. Standard output is where the program's output goes, while standard error is a separate stream used for error messages. When standard output is redirected to a file or to another program, standard error is unchanged:

$ grep string nonexistent-file > /dev/null
grep: nonexistent-file: No such file or directory

If you are running in an interactive session, standard error is usually your terminal. As another example, when a CGI script is being run by a web server, it is common for standard output to be the eventual web page to be presented to the client and standard error to go into the web server's log files.

Streams have associated numbers, called descriptors. Standard input is always descriptor 0, standard output is descriptor 1, and standard error is descriptor 2. By default, output redirection redirects descriptor 1, and input redirection redirects descriptor 0. So, in this example, standard output is redirected, but standard error is not:

$ ls nonexistent-file > output
ls: nonexistent-file: No such file or directory

The file named output is created, but empty, because the ls command did not send any messages to standard output. You can specify the descriptor to redirect explicitly:

$ ls nonexistent-file 2> error
$ cat error
ls: nonexistent-file: No such file or directory

The ls command produces no messages to standard output, and its error messages are directed into the file named error. However, if the file did exist, the error file would be empty and the file name would be displayed; standard output has not been redirected. This kind of technique can be used to defer the display of an error message, or prefix it with some kind of explanation. Consider the preceding example with a loop calling grep. You might want to defer those messages or suppress them entirely:

string="test"
found=0
show_errs=true
for i
do
  if grep "$string" "$i" >/dev/null 2>error
    printf "%s " "$i"
    show_errs=false
    break
  fi
done

This writes any errors it encounters into a file named error; a more robust script would use a temporary file with a name that is not likely to clash with a user-created file. However, you need a way to report these errors. If there were errors encountered before a matching file showed up, it is undesirable to follow the file with errors. Thus the show_errs variable is created to indicate whether to display errors. It is used as follows:

if $show_errs; then
  echo "Couldn't find '$string' in any files."
  if test -s error; then
    echo "Errors were encountered:"
    cat error
  fi
fi

If the show_errs value still contains true, no matches were found, and it is useful to display an error message to the user. The if statement becomes if true; then, which executes the conditional code. On the other hand, if matches were found, show_errs has been set to false, and the conditional code is not executed; there is no reason to warn the user about possible errors reading other files when the file the user cared about was read successfully.

In the case where no matches were found, any error messages from the grep commands might be relevant, so they should be displayed. This is conditional on the test -s command, which checks that a file exists and has contents. Unfortunately, there is a subtle bug; since each call to grep is redirected separately, the file contains only any errors produced by the last file. Each run through the loop empties the error file before running grep. One solution would be to use >> redirection, but there is a simpler way: redirection can be applied to any shell command, not just individual statements. Redirecting the whole loop truncates the file only once, at the start of the loop, and accumulates all of the errors. A second improvement is to use a more robust name for the temporary file. That gives you the following improved script:

string="test"
found=0
show_errs=true
error=${TMPDIR:-/tmp}/err.$$
for i
do
  if grep "$string" "$i" >/dev/null; then
    printf "%s " "$i"
    show_errs=false
    break
  fi
done 2>"$error"
if $show_errs; then
  echo "Couldn't find '$string' in any files."
  if test −s "$error"; then
    echo "Errors were encountered:"
    cat "$error"
  fi
fi

Just as the shell restores the previous streams after redirecting a single command, it restores the previous streams after redirecting a compound command. The TMPDIR environment variable, when set, is used to hint at a location other than /tmp in which to store temporary files. By convention, temporary files usually embed the shell's PID in their names to avoid clashes. Each use of the file name error has been changed to "$error". The quotes protect the script in the event that someone has set TMPDIR to a name including spaces or new lines, which could otherwise cause the shell's field splitting to render the script syntactically invalid.
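
A script that creates such a temporary file should also remove it when it is done. This is a minimal sketch, using the trap command (not otherwise covered in this chapter) to schedule the cleanup for when the shell exits:

error=${TMPDIR:-/tmp}/err.$$
trap 'rm -f "$error"' 0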

However, this script now has a serious bug. Its output is only sometimes the name of the first file containing the string. If there were no such files, its output is an error message. This requires any program using the output from this program to be more careful. What you want is some way to distinguish between the output of a script and diagnostic messages about it. And, as it turns out, that is exactly what standard error is for. Thus the following cleaned up version does the right thing:

string="test"
found=0
show_errs=true
error=${TMPDIR:-/tmp}/err.$$
for i
do
  if grep "$string" "$i" >/dev/null; then
    printf "%s " "$i"
    show_errs=false
    break
  fi
done 2>"$error"
if $show_errs; then
  echo "Couldn't find '$string' in any files."
  if test −s "$error"; then
    echo "Errors were encountered:"
    cat "$error"
  fi
fi >&2

As with the redirection of the for loop, the entire trailing if statement can be redirected. The redirection >&2 redirects the output of the if statement to standard error; this technique is explained further in a few paragraphs. If this command is used in a pipeline with another command which expects to receive the name of a file, the second command will get either a file name or nothing; this prevents the second program from trying to find a file named Couldn't find 'test' in any files.

In most cases, this is a useful feature. However, separating output and errors is not always desirable. In some cases, such as running large software builds, it is common to want to put standard output and standard error together in a single file. For instance, if you wanted both the output of a build and any error messages stored in a log file, you might try this:

make >log 2>log

This does not work as intended. The first redirection creates a file named log, truncates it, and starts writing output to it. The second opens the same file and starts writing to it. Unfortunately for you, this means that the output and error streams can overwrite each other because they are each writing separately to the same file. Each redirection has created a separate stream going into the same file, and each stream has its own notion of where in the file it will write next. What you want, however, is to have a single stream that both standard output and standard error appear in. The shell has a special syntax for this, allowing any file descriptor to be copied (also called being cloned or duped) from any other file descriptor:

make >log 2>&1

The ampersand (&), in this context, indicates cloning of an existing descriptor rather than opening of a file by name. Thus standard output is redirected into a file named log, and then standard error is redirected to wherever standard output goes—in this case, the file named log. The two descriptors are now both attached to the same stream for the duration of the redirection. Note that, although only a single > is used, duplication of a file descriptor does not truncate anything; it is not opening the file, but copying the already open stream to a new descriptor. As with any other redirection, this is temporary, and after the command exits, the descriptors go back to their original, separate streams. There are a number of other cases where this technique can be used, but joining standard error to standard output is by far the most common. The same technique can be used for input streams using <&.

Throughout this section, redirections have always been shown at the end of a command line. In fact, redirections can occur anywhere in a command line. Redirections are processed separately from arguments and are not visible to the command being run. In general, redirections are processed from left to right. However, in the case where a command is in a pipeline, redirecting standard error to standard output has the effect you probably want—standard error is merged into the pipeline.
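
Because of the left-to-right processing, the relative order of a file redirection and a descriptor clone matters. In the first command that follows, both streams end up in log; in the second, standard error is duplicated from the terminal before standard output is redirected, so the error message still appears on the screen:

$ ls nonexistent-file >log 2>&1
$ ls nonexistent-file 2>&1 >log
ls: nonexistent-file: No such file or directory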

Redirection Using exec

One other use of redirection is common enough to be worth mentioning. It is incredibly tedious to run a large number of commands all with the same redirection appending their output to a file. The shell allows you to redirect the shell's own file descriptors, rather than just the file descriptors of a particular command, using the exec shell built-in. If you call exec with some redirections, but no other arguments, it redirects those streams within the shell itself. Be very careful to do this only within scripts; if you do it on the command line, you can quite thoroughly hose your shell session. (This is a technical term.) Redirecting the shell's descriptors means that all future commands run by the shell will be affected by these redirections. For instance, the following line in a shell script stores all errors generated by future commands within that script in the file log:

exec 2>log

The exec command can be used to open and close streams. UNIX systems do not use a special character to indicate end of file. In a pipeline, the program receiving data needs to know whether there is more data coming. To distinguish between no data available yet, and no more data coming, UNIX uses a special condition called "end of file", which is not sent as a character on the stream. This means that streams can contain completely arbitrary data; there is no chance of accidentally terminating a stream. To indicate end of file on a pipe, the writer closes the pipe.

Closing files can matter under a number of circumstances, so a discussion of redirection needs to talk about it. The first case where closing occurs is when a program terminates; all of its file descriptors close. So, for instance, in a simple pipeline like ls | grep hello, when the ls command terminates, its output stream is closed. When the grep command finishes reading the data written into the pipe, it detects the end of file on the pipe. If a command is generating data slowly but has not terminated, there is no end of file; UNIX distinguishes between "end of file" and "no data available right now." The following example shows that, even if no data are ever written to a pipeline, it remains open for the duration of a command:

$ sh -c 'sleep 3' | ( date; cat; date )
Sun Jun 15 14:53:17 CDT 2008
Sun Jun 15 14:53:20 CDT 2008

The two date commands show how long it takes for cat to execute, so you don't even need a stopwatch to see how this works.

Secondly, when a descriptor is redirected, the previous descriptor is closed, even if the program is still running. When a redirection is temporary, as with a redirection on a particular command, the original descriptor is saved and is not closed. However, when you use exec to redirect a descriptor permanently, the original descriptor can be closed. Two variants on the previous fragment illustrate the difference:

$ sh -c 'sleep 3; exec >/dev/null' | ( date; cat; date )
Sun Jun 15 14:56:01 CDT 2008
Sun Jun 15 14:56:04 CDT 2008
$ sh -c 'exec >/dev/null; sleep 3' | ( date; cat; date )
Sun Jun 15 14:56:13 CDT 2008
Sun Jun 15 14:56:13 CDT 2008

When the sleep command executes before the redirection, the output pipe does not close until after the sleep command completes. When the output stream is redirected first, the output pipe closes immediately (and there is a three second delay before the shell prints a new prompt).

Between these two rules, you very rarely need to explicitly close a descriptor in shell programming. However, both input and output streams can be closed explicitly using the cloning syntax, giving - as the name of the descriptor to clone. For instance, the redirection 2>&- closes standard error for the command being redirected, and the command exec 2>&- closes standard error for the whole script. The preceding fragments could use >&- just as well as >/dev/null because the script actually produces no output. However, many programs will malfunction if they are run with standard output closed rather than merely directed to /dev/null.

Redirections of individual commands or shell structures are carefully isolated; the shell restores the previous state of its descriptors after running them. However, when you use exec to redirect streams, these changes can have permanent effects. For instance, after the previous command, it may not be possible to restore the previous value of standard error; if standard error was attached to a pipe, there is no way to reopen the pipe.

More complicated shell programs may use a surprising number of redirections to achieve particular goals. For instance, what do you do if you want to run a number of commands, with standard error redirected, then recover the old state of standard error? If you use exec to redirect standard error to a file, the old standard error stream is closed. One solution is to run such redirections in subshells (or functions).

However, there is another way to preserve a stream. If you have more than one descriptor attached to the same stream, the stream is not closed until the last descriptor attached to it is closed. The following fragment illustrates this:

$ sh -c 'exec 5>&1; exec >/dev/null; sleep 3' | ( date; cat; date )
Sun Jun 15 15:10:13 CDT 2008
Sun Jun 15 15:10:16 CDT 2008

The first redirection redirects descriptor number 5 to a duplicate of standard output, after which standard output is closed. You may notice that I have not previously described descriptor number 5. Descriptors numbered 3 and higher are not initially defined or opened, but you can redirect them wherever you want, using the same syntax used for the first three.

This is often useful if you want to temporarily alter your shell environment, preserving the ability to restore it. Much of this can be done by running commands in subshells, but sometimes explicit control is more expressive.

The following script illustrates the use of extra descriptors to control the display of both errors and output:

exec 3>&1                     # stash standard output in descriptor 3
exec 4>&2                     # stash standard error in descriptor 4
exec 1>output.tmp             # send output to output.tmp
exec 2>error.tmp              # send errors to error.tmp
printf "Filename? " >&3       # display message on descriptor 3 (old stdout)
read file
printf "String? " >&3
read string
grep -e "$string" "$file"     # output to output.tmp, errors to error.tmp
status=$?
exec >&3                      # restore standard output
exec 2>&4                     # restore standard error
if test $status = 0; then
  echo "'$file' contained '$string'."
else
  if test -s error.tmp; then
    cat error.tmp >&2
  else
    echo "'$file' did not contain '$string'."
  fi
fi

The control structures at the bottom of the script operate in the original environment, with descriptors 1 and 2 directed wherever they were at the start of the script. While it would generally be ridiculous to do something this elaborate in such a simple case (it would have been much simpler to redirect the output and error streams of the grep command), the principles apply well to larger and more complicated scripts.

There is no real standard for how to use descriptors 3 and higher. Unfortunately, this exposes a weakness of the shell; there is no convenient way to keep track of descriptors. You can mitigate this somewhat by using variables to store the values used for a given function, as in the following example:

exec 3>/tmp/log.txt
logfd=3
log() { echo "$@" >&$logfd; }
log "Hello, world!"
log "All done."

This script emits two lines to /tmp/log.txt. However, this technique is still imperfect. For one thing, it still offers no assurance that some other piece of shell code will not redirect descriptor 3. Secondly, you simply have to be sure to use the same descriptor number in both lines. You might think to try setting the variable first:

logfd=3
exec $logfd>/tmp/log.txt

This fails because redirection is shell syntax, and a redirection operator (such as 3>) cannot result from parameter expansion. You can work around this using eval, though:

eval "exec $logfd>/tmp/log.txt"

If this seems a bit much to keep track of, the m4sh utility (part of GNU autoconf) provides a somewhat automated way to keep track of descriptors and avoid clashes.

Introducing Here Documents

Often, a program needs input that could be read from a file, but creating (and then removing) a small temporary file is awkward or inconvenient. The shell has a special syntax for this, which looks much like the syntax for redirection. A piece of text introduced using a << rather than a < for redirection is called a here document. The here document consists of every following line of input until a special string, which is called a sentinel. It is generally equivalent to creating a temporary file holding those input lines and redirecting input from that file. For instance, the mail merge program might well use a here document:

for name in "Occupant" "Our Friends" "Current Resident" "Postal Customer"
do
  cat <<EOF
Hello, $name
Look! A personalized letter! Buy our stuff! Really!

We are even expanding variables for you, $name!
EOF
done

In this, the <<EOF starts a here document, which continues until a line consisting only of the word EOF. Parameter substitution applies normally within the here document, although globbing and tilde expansion do not. You can embed a dollar sign literally by prefixing it with a backslash. A here document is subject to the same quoting rules as double-quoted text.
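
For instance, assuming a variable amount has been set earlier, a here document can mix literal and expanded dollar signs:

amount=5
cat <<EOF
Please send \$$amount to the address above.
EOF

Please send $5 to the address above.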

There are two special modifications available when using a here document. The first is that a hyphen after the << tells the shell to strip leading tabs (but not leading spaces) from the text; the indentation in the following example is a single tab:

cat <<-EOF
	Not indented!
EOF

Not indented!

The second is that, if the sentinel is quoted, no substitutions are performed on the text; it is treated as pure literal data, like a string in single quotes. This can be useful if you want to produce text that uses dollar signs. For more details on how quoting works in general, see the discussion of quoting and expansion in Chapter 4.
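
For instance, quoting the sentinel keeps the dollar sign literal:

cat <<'EOF'
The variable $PATH is not expanded here.
EOF

The variable $PATH is not expanded here.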

It is possible to provide multiple here documents in a single command line. They are processed in the order they are specified. For instance, the following script fragment concatenates two here documents:

( cat <&3; cat <&4 ) 4<<EOF 3<<EOF
world!
EOF
Hello,
EOF

Hello,
world!

Note that, because descriptor 4 is redirected first on the command line, the first here document is used as descriptor 4, which is displayed by the second cat command.

However, if there are multiple here documents for the same descriptor (including the default descriptor 0) of the same command, only the last document's contents are presented.

( cat; cat ) <<EOF <<EOF
world!
EOF
Hello,
EOF

Hello,

In this case, the second redirection replaces the first, so only the second document is available on standard input. As always, redirecting a descriptor closes whatever it previously referred to.

Redirection and Loops

Loops, such as a while loop, are themselves shell commands, and any shell command can have its input and output redirected. For instance, on an embedded system that lacks the grep binary, you can always cheat. (This script may also be faster in some cases than using an external command.) The following script is similar to a simple case of grep. Invoked as shellgrep pattern files, it shows all lines from files matching pattern, although it matches shell patterns, not regular expressions:

pattern="$1"
shift 1
cat "$@" | while read line ; do
  case $line in
    *$pattern*) printf "%s " "$line";;
  esac
done

(The "$@" construct is explained in more detail in Chapter 4, and you may also need to know about some special cases discussed in Chapter 7.) If you want to know which file each line came from, you have to make it a bit more complicated:

pattern="$1"
shift 1
for file
do
  while read line; do
    case $line in
      *$pattern*) printf "%s: %s " "$file" "$line";;
    esac
  done < $file
done

This script checks each file separately; if it finds a matching line, it echoes the name of the file before the line. If all you want is the names of matching files, you can do that, too:

pattern="$1"
shift 1
for file
do
  while read line; do
    case $line in
      *$pattern*) printf "%s " "$file"; break;;
    esac
  done < $file
done

This version jumps ahead immediately upon finding a matching line; the break in the inner loop prevents the script from repeating the names of files with multiple matches. Note the similarity to the previous examples using the external grep program to look at each file, or to the -l flag provided by many versions of grep. Note that tricks like this are not only useful on tiny little embedded systems. Because commands like grep are external to the shell, and the case command structure is built in, performance may be better using an idiom like this. On systems with particularly expensive command spawning, such as Windows, the performance difference may be quite surprising.
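
Assuming the last version is saved in an executable file named shellgrep, an invocation might look like this (the file names and output here are purely hypothetical):

$ ./shellgrep hello notes.txt todo.txt
notes.txt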

One limitation of for loops in the shell is that they always perform field splitting; if you want lines split instead of words, you can use a redirected while loop. For instance, consider this example from earlier in the chapter:

for name in "Occupant" "Our Friends" "Current Resident" "Postal Customer"; do
  echo "Hello, $name"
  echo "Look! A personalized letter! Buy our stuff!"
done

If you have a file containing the names of your close personal friends, you might try to use command substitution to adapt this:

$ cat friendslist
Occupant
Our Friends
Current Resident
Postal Customer
$ for name in $(cat friendslist); do
>   echo "Hello, $name!"
> done
Hello, Occupant!
Hello, Our!
Hello, Friends!
Hello, Current!
Hello, Resident!
Hello, Postal!
Hello, Customer!

Well, that didn't go as planned, and now you know where that really weird junk mail comes from. What you need is a way to distinguish between word breaks and line breaks. A simple way to do this is to use a while loop, with its input redirected from the friends list. The read command exits successfully every time it reads a line and fails when it has no input:

$ while read name; do
>   echo "Hello, $name!"
> done < friendslist
Hello, Occupant!
Hello, Our Friends!
Hello, Current Resident!
Hello, Postal Customer!

This idiom is extremely useful and is often used in conjunction with programs such as find, which generate lists of file names. Just be careful; UNIX allows newlines in file names, which can produce surprising results. If you do not have control over the inputs to a loop like this, do not rely on them blindly; validate or sanitize them first.
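
For instance, here is a sketch that processes every .txt file under the current directory, one name per line (assuming none of the names contain newlines):

find . -name '*.txt' | while read file; do
  echo "Found: $file"
done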

What's Next?

Chapter 4 explains the core methods by which the shell interprets its input: parsing, quoting, and substitution. I introduce tokens and explain how the shell determines what parts of a shell script are commands, what parts are control structures, and what parts are arguments to commands. I then explain the basics of quoting, the mechanism by which you control how the shell interprets words and when it performs substitutions. Finally, I'll go over the basic ways in which the shell substitutes new text, such as replacing variable names with the values of those variables.
