CHAPTER 4
Core Shell Features Explained

This chapter gives a more detailed explanation of the structure of shell programs and the interactions between some of the basic features introduced in Chapter 1. It introduces the basic grammatical structure of shell programs, then explores the interactions of the quoting, substitution, and globbing mechanisms.

There are a number of exceptions and special cases, which are explained throughout the chapter, but an overview makes it easier to follow what happens. The first thing the shell does is split input into words and special punctuation items, called tokens. After this, substitutions and expansions are performed, replacing variable references with the contents of variables, shell glob characters with file names, and so on. The order of operations is as follows:

  1. Tokenizing. The shell splits inputs into tokens. Keywords and special shell syntax characters are identified at this point, before any substitutions or expansions have occurred.
  2. Parameter and command substitution. Parameter and command substitutions are performed. Quoting may cause some strings that look like parameter or command substitutions to be ignored. (Command substitution is explained in Chapter 5.)
  3. The results of substitution are subject to field splitting.
  4. Globbing is performed on any words that have unquoted glob characters.
  5. Commands and control structures are executed.

There are some complications (for instance, some shells might perform tilde expansion prior to parameter substitution), but this basic order of operations covers what the shell really does. Most of the time, confusion about what a script will do can be resolved by thinking through these steps. Why doesn't this script work?

IF=if
$IF true; then echo hello; fi

It doesn't work because tokenizing happens before parameter substitution. The shell identifies $IF as a word, not a keyword. When it is later replaced with text, it is too late for it to try to become a keyword.

Similarly, the expansion of a glob pattern into file names occurs after parameter expansion. Thus, even if there were a file named $PATH, echo * would not produce the same output as echo $PATH.
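A minimal sketch makes the ordering concrete (the scratch directory and variable names here are invented for illustration): a glob character stored in a variable is expanded by substitution first, and only then globbed, while quoting the expansion suppresses globbing entirely.

```shell
dir=/tmp/glob.$$                # scratch directory; $$ keeps the name unique
mkdir "$dir"
touch "$dir/a.txt" "$dir/b.txt"
pat="$dir/*.txt"
unquoted=$(echo $pat)           # substitution happens first, then globbing
quoted=$(echo "$pat")           # quoting suppresses globbing: literal *.txt
rm -r "$dir"
```

Unquoted, $pat is substituted and then globbed into the two file names; quoted, it remains the literal pattern.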

The case statement provides exceptions to rules about what happens after substitution; there is neither field splitting nor globbing after substitution in the control string or the patterns of a case statement. In fact, in the patterns, quoting suppresses pattern matching rather than preventing globbing.
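A short sketch (variable names are mine) shows why this matters: an unquoted variable containing spaces is safe as the control string of a case statement, where it would be field split almost anywhere else.

```shell
value="two words"
case $value in                     # no field splitting here, so no quotes needed
  "two words") result=matched ;;   # quoting in a pattern makes it literal
  *) result=unmatched ;;
esac
```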

Parsing

When reading input, the shell begins by breaking input into a collection of symbols, called tokens. For instance, in a simple shell command such as echo hello, world!, there are four tokens. The first three are the command name and its arguments, and the fourth is a new line (see Table 4-1).

Table 4-1. What the Shell Sees

Token Description
echo Word
hello, Word
world! Word
<newline> Command separator

The spaces separating the arguments are not tokens; they just separate tokens. The meanings of tokens, and even which tokens a given string contains, are sometimes affected by context; something might have special meaning on one line of a shell script and be an ordinary word on another.

Tokens

There are several different kinds of tokens. The most common are plain words, such as command names and arguments. Some words that have special meaning to the shell, such as if or for, may be special tokens called keywords. Finally, special shell punctuation, such as redirection operators or semicolons used to separate commands, are also tokens.

The special characters are as follows:

|   &   ;   <   >         (       )           $
`      "   '   <space>   <tab>   <newline>   *
?   [   #   ~   =         %

Not all of these characters are always special; some may be special only in specific contexts. (In some traditional shells, ^ is also special and a synonym for |.)

Anything that is quoted, or which results from substitution, is always a plain word even if it looks like something else. For instance, a new line is normally a token that can end a command. However, a new line in quotes is no longer a special token. Instead, it is just another character that is part of a normal shell word. In this example, there are three tokens:

echo "hello,
world"

The first token is echo. The second is the quoted string hello,<newline>world, with a new line between the comma and the w. The third is the new line after the quoted string. Because it is outside a quoted string, that new line is a token. Similarly, any quoted characters at all in a word ensure that it is treated as a plain word, never as a shell keyword. The text \if is simply the plain word if, not the beginning of a control structure.

When forming tokens, the shell sometimes discards things; for instance, unquoted whitespace (such as spaces or tabs) separates tokens, but does not itself become a token. The process of splitting input into words around space is called word splitting. If the shell encounters a sharp (#, also called pound, hash, or octothorpe) while looking for tokens, it reads from that character to the end of the current line and discards the results as a comment. As a matter of style, many programmers prefer to only start comments at the beginning of a line, but it is often easier to read a script with short comments after individual lines.

The underlying principle of the shell's token parsing, common to shell and to many other languages, is that a token is always the longest possible series of characters. This is often called the maximal munch rule. While a # may start a comment, it can also be part of a word. Here's an example of how this works:

echo a #b
echo c# d

a
c# d

In the first line, the first argument ends at the space. The # is encountered in a place where it would have to start a new token, so it starts a comment; the #b is discarded. In the second line, the # occurs as part of a word. Since # can be part of a word, it simply is, and it does not start a comment. Thus, if there is ambiguity about whether a character is part of the current token or starts a new token, it is always part of the current token.

Similarly, these two lines are very different:

ls hello 2>error
ls hello2>error

The first line tries to list the file hello, sending any error messages to the file error. The second line, however, tries to list the file hello2, sending any output to the file error. The 2 can be part of the word, so it is treated that way. This is a quirk of redirection parsing. You do not need space before a redirection if it is of standard input or standard output, but if you are modifying one of the other descriptors, you generally need a space in front of the redirection so the shell doesn't interpret the descriptor number as part of the previous word.

The redirection operators highlight this because the whole redirection operator is a single token. Thus a number immediately followed by a greater-than or less-than sign is a redirection, but a number separated from a greater-than or less-than sign is not. However, that works only if the number could begin a token of its own; if it is the last part of the previous word, it cannot start a new token. This also shows why you cannot use a variable to create a new file descriptor:

logfd=3
exec $logfd>/tmp/log.txt

As described previously, this ends up trying to execute the command 3. You cannot expand variables into special tokens, only into plain words.

On the other hand, the target of a redirection can be quoted, can result from substitution or globbing, or even both.

exec 3>"$logfile"

This does exactly what you would expect: it expands the variable $logfile and redirects descriptor 3 to it. The redirection (3>) is a token; the thing redirected to is a separate token, which can be any word.

Words and Keywords

A token such as if or while is called a keyword, and can only be recognized in certain contexts. Tokens with no special meaning to the shell are called words. A word may have the same spelling as a keyword but is not treated specially by the shell. For instance, in the following script fragment, if is just a word, not a keyword:

echo if

The results of substitution, globbing, or quoting are always words. As an example, consider the following script fragment:

X="Y=3"
$X

Y=3: not found

While the sequence Y=3 would normally be a variable assignment, it resulted from substitution, so it became a plain word. The right-hand side of an assignment can be any word and can result from substitution or globbing. However, the variable name and equals sign must be literals. Likewise, a redirection operator must be a literal, but the name of the file to redirect to can be any word, including one resulting from substitution or globbing. (You can get around this; see the "The eval Command" section in Chapter 5.)

Context often determines the meaning of something to the shell. Context determines whether a new line terminates a command or is simply more whitespace. As with some other languages, the shell interprets a new line as ending a command when the command line so far is grammatically valid and otherwise expects additional input. Similarly, the same characters that would be a variable assignment at the beginning of a line are just another word later in a line:

echo A=B
A=B

The shell usually looks for keywords only in particular places, such as the beginning of a line. Otherwise, words are simply accepted as tokens producing a series of plain words with no special significance to the shell. In the standard shell, the keywords are as follows:

    !      {      }      case      do   done
    elif   else   esac   fi        for
    if     in     then   until     while

Command Lists

In the examples so far, simple commands and pipelines have been used as the controlling expressions for if and while statements. In fact, the controlling expressions for these have the same grammar as their bodies and are sequences of commands called lists. A list is a series of commands or pipelines, usually joined by some combination of semicolons (;), new lines, and ampersands (&), and terminated by one of these. In nearly every case, you can replace a new line with a semicolon. The shell does not distinguish between these two forms of the same command:

if test -f "$file"; then
  echo "$file exists."
fi
if test -f "$file"; then echo "$file exists." ; fi

A series of commands entered on the command line are a list, grammatically. The shell determines the end of a list to have occurred when a special keyword or token shows up that ends the list. For instance, the grammar of a simple if-then-fi statement is as follows:

if list
then list
fi

Starting from an if, the shell reads commands until it encounters a then. The set of commands read is a list. The exit status of a list is the exit status of the last pipeline within the list, just as the exit status of a pipeline is the exit status of the last command within that pipeline. The exit status of the various flow control statements is usually zero if no code was executed, or the exit status of the last code executed. The following contrived example illustrates this:

while if true; then false; fi do
  false
done

The if statement used as a conditional for the while loop always executes its body, which consists of a single false command. The overall exit status of the if statement is the exit status of the last statement executed, the false command, so the while loop terminates immediately, and the exit status of the whole chunk of code is zero (indicating success). The false command inside the while loop is never executed.

Similarly, the if statement's controlling expression can be any list, not just a single command. This list can contain a series of commands, including other conditional statements. For instance, the following example asks the user how picky it should be before asking another question:

echo "Would you like me to be picky?"
read picky
echo "So, do you have any grapes?"
read answer
if case $picky in
     [Yy]*) test X"$answer" = X"yes";;
      *) case $answer in [Yy]*) true;; *) false;; esac ;;
    esac
then
  echo "You said yes!"
else
  echo "I don't think you said yes."
fi

The condition for the if statement is a pair of nested case statements. If the user's answer to the first question begins with either a capital or lowercase Y, the program will accept only the exact text "yes" as an answer. Otherwise, the program will accept any string starting with a capital or lowercase Y as being close enough to a "yes." In each case, the exit status is simply the status of the last exiting command: either the test command, used to check for the answer, or the true or false commands used to yield a status from the second case statement.

New lines and semicolons are mostly interchangeable as command separators, with the exception that the shell will politely ignore a series of blank lines but will object to a series of semicolons. Each semicolon must follow a command. Regardless, whether you use semicolons or new lines, each command is executed sequentially, and each command completes before the following command starts.

While ampersands are syntactically command separators, their semantics are different. When a command is followed by an ampersand, the command is run asynchronously; the shell continues immediately, while the command continues running at the same time. This is called running the command in the background. While the most common usage of this on the command line is to run a single command in the background, the ampersand is simply a generic command separator; you can also write multiple commands on a line, separated by ampersands. Each command that is followed by an ampersand is run in the background.
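The following sketch (the file and variable names are mine) shows the asynchronous behavior; the wait command, which blocks until background jobs complete, is used to resynchronize:

```shell
tmp=/tmp/bg.$$                            # scratch file for the background job
sleep 1 && echo "background done" > "$tmp" &
note="the shell continued immediately"    # runs while the sleep is still going
wait                                      # block until the background job finishes
result=$(cat "$tmp")
rm -f "$tmp"
```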

Short Circuits

There are two other command separators, which perform logical tests. They are the "and" operator (&&) and the "or" operator (||). The exit status of a pair of commands joined by && is true if both commands had a true exit status, and false otherwise. Similarly, the exit status of a pair of commands joined by || is true if either command had a true exit status, and false otherwise. As in many programming languages, the shell executes the second command only if the exit status of the pair has not already been determined; this is called short-circuiting. This can express the same logic as an if statement more briefly; for simple code, it is often idiomatically better to put the operations together like this. For instance, the following idiom emits a logging message if the variable verbose has been set to true:

$verbose && echo >&2 "Processing $i..."

When the first command supplied to one of the short-circuit operators is an imperative, the meaning is reasonably easy to keep in mind. For instance, the following code fragment might be described as "remove the file or emit an error message":

rm $file || echo >&2 "Could not remove $file."

When a command line contains only the previously discussed command separators, such as semicolons, commands are simply treated in order. The logical short-circuit operators, however, are special; commands joined with these operators are treated more like a single command. For instance, the following fragment has an exit status of success:

false && false; true

The second false is not executed, but the semicolon separates the whole && operation from the true command. The way in which commands group more closely around the logical operators than around the other command separators is often described as the logical operators having higher precedence. It is, however, possible to force the shell to group the second two commands together. To do this, you must tell the shell where you want the lists to be formed.

Explicit Lists

You can join a series of commands together into a single list, which can then be joined with other lists using pipes, used as one side of a short-circuit operator, or otherwise treated as a single unit. There are two ways to do this. The first is to put a list of commands inside braces ({}); such a list is often called a compound statement. In this case, the list of commands must be terminated by a statement terminator, such as a semicolon or new line; otherwise, there is no way for the shell to recognize that the terminating brace was not simply a parameter to a command. In fact, some shells (zsh, for instance) recognize the trailing brace without an explicit terminator. Do not rely on this, but do not rely on being able to use an unquoted } as an argument part way through a list either.

Grouping makes a group of commands act like a single command. For instance, the previous example can be converted using braces to separate commands:

false && { false; true; }

This now has an exit status of false; the initial false command generates a false return code, so the compound command in braces is not executed.

The other way to group commands is to put the series of commands inside parentheses, ( and ). Parentheses have an additional effect beyond forcing commands into a single list; they create a new shell process, called a subshell. Subshells are explained in more detail in Chapter 5. In general, commands within a regular list can affect the environment of the shell, but commands within a subshell have no effect on the environment of the rest of the shell program. On many platforms, subshells are substantially more computationally expensive than compound statements. Avoid using them when you don't need to.
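A quick sketch (variable names are mine) of the difference: an assignment inside braces persists in the current shell, while the same assignment inside parentheses is lost when the subshell exits.

```shell
x=original
{ x=braces; }        # compound statement: runs in the current shell
after_braces=$x      # the assignment persisted: "braces"
( x=subshell )       # subshell: the assignment dies with the subshell
after_subshell=$x    # still "braces"
```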

Shell Quoting

Quoting is the process of suppressing the special meaning of a character. Three different kinds of quoting are provided by the shell. Backslashes, often called escapes, suppress the special meaning of a single character and work in almost every context. Single quotes are used for purely literal text, while double quotes allow some of the shell's substitution behaviors.

Experienced UNIX users looking for a prank often start by creating files in a novice's home directory that are hard to remove. The simplest way, addressed briefly in the introduction, is to put spaces in the name of a file. Each of the quoting mechanisms can overcome this.

Escaping Characters with a Backslash

The backslash is the most complex quoting mechanism because its behavior is almost, but not quite, perfectly consistent. Normally, a backslash followed by any other character is treated by the shell as that other character, deprived of any special meaning; this is called an "escaped" character. A backslash followed by a space is a space character that does not separate words. A backslash followed by a double quote is a double quote character that does not begin a quoted string. A backslash followed by a backslash is just a plain old backslash.

The first major exception is that a backslash at the end of a line does not create an escaped new line character. Instead, the backslash and the new line are both removed. Of course, it would be too simple if this were always true. If the backslash is inside a comment, it is completely ignored, but the new line has its normal effect. This is the backslash equivalent of the 400-year rule for leap years, and it comes up about as often.

The second is that, inside double quotes, most backslashes are not special; a backslash escapes only dollar signs, backticks (grave accents), double quotes, backslashes, and new lines. A backslash followed by anything else is just a backslash in this context.
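A few assignments (the variable names are mine) illustrate the rule:

```shell
a="\$HOME"    # backslash escapes the dollar sign: the literal string $HOME
b="\n"        # backslash is not special before n: two characters, \ and n
c="\\"        # backslash escapes itself: a single backslash
```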

The third exception is that backslashes are in no way special inside single quotes. No matter how many or how few backslashes you put between single quotes, or what comes after them, they are just backslashes.

There is one other major source of confusion: Many programs do special things with backslashes. For instance, consider what happens if you use echo to test the behavior of backslashes in a single-quoted string:

$ echo '\\'

You would expect this to produce \\ as output. In most shells, it will. However, if you try this in zsh, you get only a single backslash. The problem is that, while the shell has not done anything special with the backslashes, the built-in echo in zsh does, in fact, use backslashes specially. You can try to outsmart the shell by calling /bin/echo explicitly, but there is an astounding variety of ways in which the echo command can differ from one system to another. Utility portability is discussed in more detail in Chapter 7. In the meantime, be aware that people have been complaining about the complete nonportability of any but the simplest uses of echo for well over 20 years.

Escaping Characters with Single Quotes

Single quotes are very simple. Absolutely everything from a single quote to the next single quote is literal. New lines, backslashes, dollar signs, it doesn't matter. Everything is literal. This means that there is no way to include a single quote inside a single quoted string. You will occasionally see this idiom:

echo 'Peter'\''s favorite language'
Peter's favorite language

The first single quote starts a string, and the second ends it. This is followed by an unquoted backslash, which escapes the next character, which is a single quote. This results in a quoted single quote; because it is quoted, it does not start a new string. The next character after that is another single quote, starting a new single-quoted string that runs to the end of the line. Because the character between the two strings was not an unquoted word separator, the two strings, and the character between them, are joined into a single string.

Escaping Characters with Double Quotes

Double quotes suppress the meaning of many special characters, but parameter substitution (see the "Understanding Parameter Substitution" section later in this chapter) occurs normally within them. Double quotes are probably the most commonly used form of quoting, as they give the useful combination of allowing for parameter substitution while preventing field splitting. Knowing this, you now know what one of the lines in the argument printing script does:

echo "'$arg'"

The double quotes eliminate the special meaning of the single quotes, allowing the contents of the variable $arg to be expanded. However, the double quotes perform an additional function, which is to prevent globbing or field splitting from being performed on the contents of the variable $arg. Thus if the user passed a string with multiple spaces in as an argument, the string echoed back by the shell will preserve those spaces.
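A small sketch (variable names are mine) of the difference quoting makes:

```shell
arg="several   spaced   words"
unquoted=$(echo $arg)     # field splitting collapses the runs of spaces
quoted=$(echo "$arg")     # double quotes preserve the spacing exactly
```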

Quoting Examples

The interactions of the different quoting mechanisms can be fairly confusing at first. In general, use single quotes for maximal predictability, double quotes for material that needs parameter substitution, and backslashes to suppress the value of a single special character, such as $.

There are many things you may wish to write that cannot be done within a quoted string of any sort or are excessively awkward in one kind of string but easy in another. In many cases, the simplest thing to do is to use a double-quoted string and use backslashes to suppress additional special meanings. However, if you have a string that uses a great number of backslashes and special characters, you may find single quotes preferable. If you find single quotes useful, but you want to interpolate a single variable, the following idiom may prove useful:

'some text'"$VAR"'more text'

This concatenates the value of $VAR with the surrounding text, while protecting that text from all varieties of shell substitution.

Substitution and Expansion

When processing input, the shell replaces parameters with their values. This replacement is called parameter substitution, parameter expansion, or (rarely) variable interpolation. I use the term substitution because the term expansion might be taken as suggesting that the resulting text is always larger. The POSIX spec uses the term expansion. After parameter substitution, the shell expands certain patterns into file names; this is called pathname expansion, or globbing. This section reviews the basics of parameter substitution and globbing. Chapter 5 discusses command substitution, which is similar in many ways to parameter substitution. Some shells offer additional parameter substitution options that are not portable; these are discussed in Chapter 6.

Substitution and Field Splitting

Often, when parameters are substituted, the output is described as being subject to word splitting. In fact, what really happens to them is something different, called field splitting. The original splitting of input into tokens always uses the same rules; words are split around whitespace (spaces, tabs, and new lines). When a substitution is split, however, different rules may be used.

The shell defines a special variable, $IFS, which defines the field splitting rules. If $IFS is not set, the shell behaves as though it contained space, tab, and new line characters (in that order). If $IFS is set to an empty string, fields are not split at all. Finally, if $IFS is set to a string, then the characters in that string are used to split fields, just as whitespace splits words. The following example illustrates the difference:

$ IFS=:
$ a="hello:world"
$ echo hello:world
hello:world
$ echo $a
hello world

When expanding $*, the shell joins the positional parameters with the first character of $IFS; if $IFS is an empty string, the parameters are concatenated.
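For instance (the parameter values are mine), with a colon as the first character of $IFS, a quoted $* joins the positional parameters with colons:

```shell
set -- alpha beta gamma
IFS=:
joined="$*"              # joined with the first character of $IFS
unset IFS                # restore the default splitting behavior
```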

Setting $IFS allows you to parse more complicated input. You can check the components of $PATH using $IFS and a for loop:

IFS=:
for dir in $PATH; do
  echo $dir
done

A similar idiom, using the set command to reset the positional parameters (discussed in detail in Chapter 6), is as follows:

IFS=:
set -- $PATH
for dir
do
  echo $dir
done

As a side note, you cannot put the assignment to $IFS on the same line as the command. The command is parsed before the assignment takes effect, even though the command is run after the assignment takes effect.
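The following sketch demonstrates this; the count helper function is my own and simply reports how many arguments it received. The command substitutions run in subshells, which also keeps the temporary $IFS from leaking.

```shell
count() { echo $#; }           # helper: report the number of arguments
a="one:two"
same_line=$(IFS=: count $a)    # $a is split with the old IFS: one argument
IFS=:
separate=$(count $a)           # now $a is split on the colon: two arguments
unset IFS
```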

Although the name $IFS is capitalized, $IFS is not usually exported. The behavior of child shells to which $IFS has been exported is not portable. Don't do that.

Understanding Parameter Substitution

Parameter substitution occurs only in double-quoted strings or outside of any quoting and is introduced by a dollar sign. A dollar sign that has been escaped, or that occurs in a single-quoted string, has no special meaning. If the first character after the dollar sign is a punctuation mark that denotes a built-in shell parameter or a digit, it is taken as the name of a built-in shell parameter to substitute. There are a number of built-in parameters, and many shells define additional such parameters. For now, the short list in Table 4-2 of common parameters will suffice.

Table 4-2. Common Shell Parameters

Parameter Description
$0 Name of current program; usually the name of a script file, or just the shell's name.
$1 First parameter of current script or function.
$2 Second parameter of current script or function. (This pattern continues, but parameters 10 and higher require special treatment.)
$* All parameters of current script or function, separated by spaces.
$@ All parameters of current script or function. Outside of quotes, identical to $*. Inside double quotes, expands to each parameter inside separate double quotes.
$$ The process ID of the shell.
$# The number of positional parameters.
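A brief sketch (the showargs function is my own) shows several of these parameters in action; note how "$@" keeps the two-word argument intact:

```shell
showargs() {
  nargs=$#                 # number of parameters: 2
  star="$*"                # all parameters joined: "one two three"
  for a in "$@"; do        # "$@" keeps "two three" as a single word
    last=$a
  done
}
showargs one "two three"
```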

If the first character after the dollar sign is a letter or underscore, the shell takes that character, plus any following letters, numbers, or underscores, to be the name of a variable to expand. This creates an interesting problem: What do you do if you want to append some characters after the substitution of a variable? For instance, the following script might have been intended to produce "hello, world," but it actually produces only an empty line:

$ hello="hello, "
$ echo $helloworld

The output is an empty line because the shell is expanding the unset variable helloworld, not the recently set variable hello followed by the text "world." There are a number of clever or sneaky tricks to get around this, but the best solution is to use braces to delimit the variable:

$ echo ${hello}world
hello, world

When the shell sees a curly brace after the dollar sign, it searches for the next matching brace to determine which parameter to substitute. Braces are also needed to refer to positional parameters ${10} and higher. The shell replaces $10 with a literal "0" appended to the value of $1; this is the reverse of the behavior that mandates the use of braces when appending text to a variable name. Older shells do not recognize ${10}; in these shells, you must use shift to access positional parameters past $9. (See Chapter 6 for more discussion on positional parameters.)
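A sketch of the difference (the parameter values are mine); keep in mind that the braced form is POSIX and may fail in older shells:

```shell
set -- a b c d e f g h i j
unbraced="$10"     # parsed as ${1} followed by a literal 0: "a0"
braced="${10}"     # the tenth positional parameter: "j" (POSIX shells)
```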

Sometimes, you may be unsure of whether a variable will have been set or not before a given piece of code executes. The shell has a variety of features to allow for alternative substitutions in place of variables that are not set (or set to an empty string, also called a null string). The most commonly used variant is the ${parameter:-word} construct, which is equivalent to ${parameter} if it has a value, or word otherwise. In the case where the construct is substituted with word, that is subject to substitution as well. The following fragment greets the user in an even less-efficient way than usual:

foo=""
bar="world"
echo hello, ${foo:-$bar}

hello, world

The substitution rules in Table 4-3 are a common and well-supported subset of those available in standard shells (and even a number of prestandard shells).

Table 4-3. A Subset of Special Parameter Substitutions

Pattern Description
${parameter:-word} If parameter is null or unset, substitute word; otherwise, substitute parameter.
${parameter:=word} If parameter is null or unset, assign word to parameter. Then substitute parameter.
${parameter:+word} If parameter is null or unset, substitute null; otherwise, substitute word.
${parameter:?word} If parameter is null or unset, print word (or a default message if word is null) to standard error and exit the shell.

In each of these substitutions, the colon may be omitted; in this case, the shell tests only for a parameter that is unset, not an empty string (also called a null value). With the colon, an empty string is treated the same as an unset parameter. Each of these forms is useful under different circumstances.
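The following assignments (the names are mine) show all three combinations of set, empty, and unset:

```shell
empty=""
unset notset                   # guarantee that notset really is unset
no_colon=${empty-default}      # empty is set, so its (empty) value is used
with_colon=${empty:-default}   # the colon treats empty as unset: "default"
either=${notset-default}       # unset: "default", colon or no colon
```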

The hyphen form of substitution is primarily used to provide a default value, while allowing a user to override it. This is especially likely to be useful with environment variables, allowing the user to override the default behavior of a script. A typical example from a compilation script would be to provide a default value for the CFLAGS environment variable, which is used by convention to hold compiler options:

cc ${CFLAGS-"-O2"} -o hello hello.c

If the CFLAGS environment variable is set, it is passed to the compiler. Otherwise, the value -O2 is passed in as a default. The quotes around the flag are not needed but are allowed; in this case, I used them because it helps visually distinguish between the hyphen in the shell syntax and the intended replacement text. Also, it is useful to get in the habit of providing quotes in cases where they might or might not be necessary because the alternative is usually to omit them when they were necessary. Program defensively.

Of course, in a longer script, it is quite possible to imagine a lack of interest in typing that same construct over and over. One improvement is to use the equals sign substitution rule the first time and thereafter use the variable's value:

cc ${CFLAGS:="-O2"} -o hello hello.c
cc $CFLAGS -o goodbye goodbye.c

When the shell expands ${CFLAGS:="-O2"}, one of two things happens. If the CFLAGS variable was already set, it expands, and its value is unchanged. If the variable was not set, or was empty, it is replaced by the assigned value (-O2, in this case), and then expanded. Thus, whether or not the variable was set before the first line was executed, it will definitely be set after that line is executed.

This is functional but a little clumsy. It creates an unfortunate ordering dependency on the lines in the script; if you later discover that your new boss lives backward in time and requires that goodbye.c be compiled before hello.c, you cannot simply reverse the lines in the script; you have to edit both of them. (While the particular circumstance may seem unusual, being obliged to reorder operations in a script is quite common.) You have two workable options. One is to switch to a more elaborate construct, possibly using test to check the existing value of the variable before assigning it. You should not simply place the variable substitution on a line by itself; the substitution would then be executed as a command. However, you can use it as an argument to a command that does nothing:

: ${CFLAGS:="-O2"}
cc $CFLAGS -o hello hello.c
cc $CFLAGS -o goodbye goodbye.c

This is a very expressive idiom. In this case, true and : are not equivalent; some implementations of true inexplicably react to some possible combinations of parameters by doing something:

$ /bin/true --version
true (GNU coreutils) 6.10
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jim Meyering.

This kind of thing can be fairly disruptive to the output of a script. Stick with : for such usage.
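The more elaborate test-based alternative mentioned earlier might look like this sketch; for this case, it has the same effect as the : idiom:

```shell
# Assign a default only when CFLAGS is unset or empty, using test;
# equivalent in effect to : ${CFLAGS:="-O2"}.
if [ -z "${CFLAGS}" ]; then
    CFLAGS="-O2"
fi
```

After this, the cc lines can appear in either order; note that test -z treats an unset variable and an empty one alike, mirroring the colon forms.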

The plus sign substitution rule has an interesting history. One of its most powerful uses is nearly entirely obsolete now, and it involves the special shell parameter $@. In some very early shells, if there were no parameters at all, "$@" substituted a quoted empty string rather than nothing. (The more convenient behavior is specified by POSIX and is reasonably close to universal in modern shells. For details, see the discussion of shell versions in Chapter 7.) One idiom for working around this is ${1+"$@"}. This expands to "$@" if $1 is set; otherwise, it expands to nothing at all. In this case, using the colon would undermine the entire point of the exercise; it would result in an incorrect substitution for the arguments of a script whenever the first argument was an empty string. The plus sign form is useful in this and other cases where you wish to avoid substituting something unless there is something to substitute.
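A sketch of why the colon must be omitted here; count_args is a hypothetical helper that just reports how many arguments it received:

```shell
# count_args reports the number of arguments it received.
count_args() { echo $#; }

set -- "" b c              # three arguments; the first is empty
count_args ${1+"$@"}       # prints 3: all arguments are forwarded
count_args ${1:+"$@"}      # prints 0: the colon form wrongly drops them all
set --                     # no arguments at all
count_args ${1+"$@"}       # prints 0, even on historical shells
```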

The :+ form is a little harder to find really good uses for, but it has its place, too. As an example, consider appending a series of words together. You want spaces between words, but you do not want extra spaces. You can write an elaborate hunk of code to append spaces suitably, keeping everything quoted, or you can use ${var:+" $var"}. This expands to a space followed by $var, if var has a nonempty value, or to nothing at all, if var was empty or unset.
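For instance, this sketch joins words with single spaces, using the substitution to supply a separator only when the accumulator already has content (the word list is illustrative):

```shell
# Join words with single spaces and no leading or trailing space.
# ${list:+ } expands to a space only once list is nonempty.
list=
for word in alpha beta gamma; do
    list="$list${list:+ }$word"
done
echo "$list"    # prints: alpha beta gamma
```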

The question mark substitution rule is of limited utility. In most cases, you will want to write your own, more robust, error handling. On the other hand, if you really do not feel there is any sensible default, you can always use this to force people to pick one:

cc ${CFLAGS:?Cannot compile without compiler flags.} -o hello hello.c
build.sh:1: CFLAGS: Cannot compile without compiler flags.

The exact format of this error message may vary between shells.
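The failure can be observed safely in a subshell; the variable name, message, and file name here are illustrative:

```shell
# ${var:?message} makes a non-interactive shell exit, printing the
# message on standard error, when var is unset or empty.
unset FLAGS
( echo "flags: ${FLAGS:?no flags given}" ) 2>err.txt || echo "subshell failed"
cat err.txt    # the exact message format varies between shells
```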

When a parameter substitution occurs outside of double quotes, the results of the substitution are usually subjected to field splitting and globbing, but never to parameter substitution again; if a variable expands to $FOO, it does not get expanded again. Inside double quotes, nothing happens after parameter substitution. (Parameter substitution cannot occur within single quotes, making the question of what would happen if it did moot.) As a rather unusual special case, the word used as the controller for a case statement is subject to tilde expansion, and then parameter substitution, but the results of the parameter substitution are not subject to any further modifications, not even field splitting. The common habit of quoting a single variable used to control a case statement is unnecessary, although some people prefer it as a matter of style.
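For instance, this sketch behaves correctly even though the unquoted controller contains a space and a glob character (the values are illustrative):

```shell
# The case controller is not field-split or globbed, so quoting
# $value here would be optional.
value="two words *"
case $value in
    "two words *") echo matched ;;
    *)             echo "no match" ;;
esac
```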

Tilde Expansion

Tilde expansion is a special expansion that replaces certain strings starting with tildes (~) with the home directories of named users, or the current user if no user is named. An unquoted tilde at the beginning of a word may be subject to tilde expansion. If a user name is provided (consisting of everything from the tilde to the first unquoted slash, or simply the whole word), that user's home directory replaces the tilde and user name. If no user name is provided, the tilde is replaced by the current user's home directory. For instance, ~bob is replaced with the home directory of the user bob. If there is more text, it is appended to the results of the expansion. For instance, ~/bin refers to the bin subdirectory of $HOME. Tilde expansion does not check its results against the file system; it expands only based on user account information or the $HOME environment variable. The behavior if a nonexistent user is named is nonportable, although many shells simply omit any substitution. Tilde expansion can occur after colons in a variable assignment. For instance, the shell expands tildes in the following:

PATH=/bin:/usr/bin:~bob/bin:~amy/bin

Standard shells expand both ~bob and ~amy in the preceding example (assuming both users exist). Tilde expansion is universal among POSIX shells, but some older shells do not provide it.
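A quick illustration with no user name, so $HOME is used:

```shell
# An unquoted leading tilde expands to $HOME; quoting suppresses it.
echo ~/bin      # prints $HOME/bin, whether or not that directory exists
echo "~/bin"    # quoted: prints the literal ~/bin
```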

Globbing

The basic globbing rules were described in Chapter 2, along with shell patterns (which they somewhat resemble). Although multiple matching path names expand into multiple words, the individual file names are not subject to field splitting.

Globbing never occurs within quotes, because glob characters have no special meaning within quotes. Glob characters next to quoted text are expanded with the quoted text as part of the pattern. Quoting is often useful when you wish to match a path that includes a variable substitution. For instance, the following shell command has a hidden bug:

rm -rf build/$version/*.log

As long as $version is something simple, like 4.2 or 3.1415, this command behaves as expected. However, imagine your chagrin should you ever attempt this on a version with spaces in it, such as 1.2 / prerelease. The result would be the following:

rm -rf build/1.2 / prerelease/*.log

This may be one of the few cases where one might, for a brief moment, wish for the csh feature of responding "No match" when a glob fails. The shell simply performs no globbing, leaving you with a command that, if you are very smart and were not running as root, probably eventually tries to remove prerelease/*.log and fails. Worse yet, the -f flag means you do not even get a warning message. You might try to resolve this by quoting as follows:

rm -rf "build/$version/*.log"

However, glob characters have no effect inside quotes, so rm simply tries to find a file with the literal name build/1.2 / prerelease/*.log, and it probably fails. The solution is to combine quoted and unquoted text:

rm -rf build/"$version"/*.log

This causes the shell to try to find every file in build/1.2 / prerelease with a name matching the pattern *.log, and then pass their names as arguments to rm. This still may not do what you want, as it denotes a directory named " prerelease" inside a directory named "1.2 ," but at least it won't turn into a 16-hour night with the backups. You did make backups, right?
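The interaction can be checked safely with echo before committing to rm; the scratch directory and version string in this sketch are illustrative:

```shell
# Quoting protects $version from field splitting, while the glob
# outside the quotes still expands against the file system.
dir=$(mktemp -d) && cd "$dir"
version="1.2 beta"
mkdir -p "build/$version"
touch "build/$version/a.log" "build/$version/b.log"
echo build/"$version"/*.log    # lists both .log files, as two words
```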

If a glob pattern is assigned to a variable, nothing special happens; the text of the pattern is stored in the variable. However, when the variable is substituted, it will generally be subject to globbing.
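A short demonstration; the scratch directory and file names are illustrative:

```shell
# The pattern is stored as plain text; globbing happens only when
# the variable is substituted unquoted.
dir=$(mktemp -d) && cd "$dir"
touch a.log b.log notes.txt
pat='*.log'
echo $pat      # unquoted: prints a.log b.log
echo "$pat"    # quoted: prints the literal *.log
```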

What's Next?

Now that you understand the quoting and substitution rules, you can write a broad variety of very powerful shell scripts. However, there are a few things that can't be done without more powerful tools. The next chapter introduces ways to organize and reuse code, as well as how to run pieces of code as if they were separate scripts, giving you a lot of additional flexibility.
