Chapter 7. Input/Output and Command-Line Processing

The past few chapters have gone into detail about various shell programming techniques, mostly focused on the flow of data and control through shell programs. In this chapter, we switch the focus to two related topics. The first is the shell’s mechanisms for doing file-oriented input and output. We present information that expands on what you already know about the shell’s basic I/O redirectors.

Second, we’ll “zoom in” and talk about I/O at the line and word level. This is a fundamentally different topic, since it involves moving information between the domains of files/terminals and shell variables. echo and command substitution are two ways of doing this that we’ve seen so far.

Our discussion of line and word I/O will lead into a more detailed explanation of how the shell processes command lines. This information is necessary so that you can understand exactly how the shell deals with quotation, and so that you can appreciate the power of an advanced command called eval, which we will cover at the end of the chapter.

I/O Redirectors

In Chapter 1, you learned about the shell’s basic I/O redirectors: >, <, and |. Although these are enough to get you through 95% of your UNIX life, you should know that bash supports many other redirectors. Table 7-1 lists them, including the three we’ve already seen. Although some of the rest are broadly useful, others are mainly for systems programmers.

Table 7-1. I/O redirectors

Redirector        Function
cmd1 | cmd2       Pipe; take standard output of cmd1 as standard input to cmd2.
> file            Direct standard output to file.
< file            Take standard input from file.
>> file           Direct standard output to file; append to file if it already exists.
>| file           Force standard output to file even if noclobber is set.
n>| file          Force output to file from file descriptor n even if noclobber is set.
<> file           Use file as both standard input and standard output.
n<> file          Use file as both input and output for file descriptor n.
<< label          Here-document; see text.
n> file           Direct file descriptor n to file.
n< file           Take file descriptor n from file.
n>> file          Direct file descriptor n to file; append to file if it already exists.
n>&               Duplicate standard output to file descriptor n.
n<&               Duplicate standard input from file descriptor n.
n>&m              File descriptor n is made to be a copy of the output file descriptor m.
n<&m              File descriptor n is made to be a copy of the input file descriptor m.
&>file            Direct standard output and standard error to file.
<&-               Close the standard input.
>&-               Close the standard output.
n>&-              Close the output from file descriptor n.
n<&-              Close the input from file descriptor n.
n>&word           If word expands to one or more digits, file descriptor n is made to be a copy of that file descriptor. If the digits in word do not specify a file descriptor open for output, a redirection error occurs. If n is not specified, the standard output (file descriptor 1) is used. As a special case, if n is omitted and word does not expand to one or more digits, the standard output and standard error are redirected as described previously.
n<&word           If word expands to one or more digits, the file descriptor denoted by n is made to be a copy of that file descriptor. If the digits in word do not specify a file descriptor open for input, a redirection error occurs. If word evaluates to -, file descriptor n is closed. If n is not specified, the standard input (file descriptor 0) is used.
n>&digit-         Moves the file descriptor digit to file descriptor n, or the standard output (file descriptor 1) if n is not specified.
n<&digit-         Moves the file descriptor digit to file descriptor n, or the standard input (file descriptor 0) if n is not specified. digit is closed after being duplicated to n.

Notice that some of the redirectors in Table 7-1 contain a digit n, and that their descriptions contain the term file descriptor; we’ll cover that in a little while.

The first two new redirectors, >> and >|, are simple variations on the standard output redirector >. The >> appends to the output file (instead of overwriting it) if it already exists; otherwise it acts exactly like >. A common use of >> is for adding a line to an initialization file (such as .bashrc or .mailrc) when you don’t want to bother with a text editor. For example:

$ cat >> .bashrc
  alias cdmnt='mount -t iso9660 /dev/sbpcd /cdrom'
  ^D

As we saw in Chapter 1, cat without an argument uses standard input as its input. This allows you to type the input and end it with CTRL-D on its own line. The alias line will be appended to the file .bashrc if it already exists; if it doesn’t, the file is created with that one line.

Recall from Chapter 3 that you can prevent the shell from overwriting a file with > file by typing set -o noclobber. >| overrides noclobber—it’s the “Do it anyway, dammit!” redirector.
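For example, a quick interactive session might look like this (the wording of the error message can vary slightly between bash versions):

$ set -o noclobber
$ echo 'first draft' > notes
$ echo 'second draft' > notes
bash: notes: cannot overwrite existing file
$ echo 'second draft' >| notes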

The redirector <> is mainly meant for use with device files (in the /dev directory), i.e., files that correspond to hardware devices such as terminals and communication lines. Low-level systems programmers can use it to test device drivers; otherwise, it’s not very useful.

The rest of the redirectors will only be useful in special situations and you are unlikely to need them most of the time.

Here-documents

The << label redirector essentially forces the input to a command to be the shell’s standard input, which is read until there is a line that contains only label. The input in between is called a here-document. Here-documents aren’t very interesting when used from the command prompt. In fact, it’s the same as the normal use of standard input except for the label. We could use a here-document to simulate the mail facility. When you send a message to someone with the mail utility, you end the message with a dot (.). The body of the message is saved in a file, msgfile:

$ cat >> msgfile << .
  > this is the text of
  > our message.
  > .

Here-documents are meant to be used from within shell scripts; they let you specify “batch” input to programs. A common use of here-documents is with simple text editors like ed. Task 7-1 is a programming task that uses a here-document in this way.

The task is to strip the mail header lines from a message that has been saved in a file (given as the script’s first argument). We can use ed to delete the header lines. To do this, we need to know something about the syntax of mail messages; specifically, that there is always a blank line between the header lines and the message text. The ed command 1,/^[ ]*$/d does the trick: it means, “Delete from line 1 until the first blank line.” We also need the ed commands w (write the changed file) and q (quit). Here is the code that solves the task:

ed $1 << EOF
1,/^[ ]*$/d
w
q
EOF

The shell does parameter (variable) substitution and command substitution on text in a here-document, meaning that you can use shell variables and commands to customize the text. A good example of this is the bashbug script, which sends a bug report to the bash maintainer (see Chapter 11). Here is a stripped-down version:

MACHINE="i586"
OS="linux-gnu"
CC="gcc"
CFLAGS=" -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' \
    -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H   -I. \
    -I. -I./lib -g -O2"
RELEASE="2.01"
PATCHLEVEL="0"
RELSTATUS="release"
MACHTYPE="i586-pc-linux-gnu"
     
TEMP=/tmp/bbug.$$
     
case "$RELSTATUS" in
alpha*|beta*)   BUGBASH=chet@po.cwru.edu ;;
*)              BUGBASH=bug-bash@gnu.org ;;
esac
     
BUGADDR="${1-$BUGBASH}"
     
UN=
if (uname) >/dev/null 2>&1; then
        UN=`uname -a`
fi
     
cat > $TEMP <<EOF
From: ${USER}
To: ${BUGADDR}
Subject: [50 character or so descriptive subject here (for reference)]
     
Configuration Information [Automatically generated, do not change]:
Machine: $MACHINE
OS: $OS
Compiler: $CC
Compilation CFLAGS: $CFLAGS
uname output: $UN
Machine Type: $MACHTYPE
     
bash Version: $RELEASE
Patch Level: $PATCHLEVEL
Release Status: $RELSTATUS
     
Description:
        [Detailed description of the problem, suggestion, or complaint.]
     
Repeat-By:
        [Describe the sequence of events that causes the problem
        to occur.]
     
Fix:
        [Description of how to fix the problem.  If you don't know a
        fix for the problem, don't include this section.]
EOF
     
vi $TEMP
     
mail $BUGADDR < $TEMP

The first eight lines are generated when bashbug is installed. The shell will then substitute the appropriate values for the variables in the text whenever the script is run.

The redirector << has two variations. First, you can prevent the shell from doing parameter and command substitution by surrounding the label in single or double quotes. In the above example, if you used the line cat > $TEMP <<'EOF', then text like $USER and $MACHINE would remain untouched (defeating the purpose of this particular script).
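For example, assuming the user’s home directory is /home/alice, compare an unquoted label with a quoted one at the command prompt:

$ cat << EOF
> Your home directory is $HOME
> EOF
Your home directory is /home/alice
$ cat << 'EOF'
> Your home directory is $HOME
> EOF
Your home directory is $HOME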

The second variation is <<-, which deletes leading TABs (but not blanks) from the here-document and the label line. This allows you to indent the here-document’s text, making the shell script more readable:

cat > $TEMP <<-EOF
        From: ${USER}
        To: ${BUGADDR}
        Subject: [50 character or so descriptive subject here]
     
        Configuration Information [Automatically generated,
            do not change]:
        Machine: $MACHINE
        OS: $OS
        Compiler: $CC
        Compilation CFLAGS: $CFLAGS
        ...
EOF

Make sure you are careful when choosing your label so that it doesn’t appear as an actual input line.

A slight variation on this is provided by the here string. It takes the form <<<word; the word is expanded and supplied on the standard input.
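For example, a here string is a quick way to hand a short string to a filter without creating a file or an explicit pipeline:

$ tr a-z A-Z <<< 'hello world'
HELLO WORLD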

File Descriptors

The next few redirectors in Table 7-1 depend on the notion of a file descriptor. Like the device files used with <>, this is a low-level UNIX I/O concept that is of interest only to systems programmers—and then only occasionally. You can get by with a few basic facts about them; for the whole story, look at the entries for read( ), write( ), fcntl( ), and others in Section 2 of the UNIX manual. You might wish to refer to UNIX Power Tools by Shelley Powers, Jerry Peek, Tim O’Reilly, and Mike Loukides (O’Reilly).

File descriptors are integers starting at 0 that refer to particular streams of data associated with a process. When a process starts, it usually has three file descriptors open. These correspond to the three standards: standard input (file descriptor 0), standard output (1), and standard error (2). If a process opens additional files for input or output, they are assigned to the next available file descriptors, starting with 3.

By far the most common use of file descriptors with bash is in saving standard error in a file. For example, if you want to save the error messages from a long job in a file so that they don’t scroll off the screen, append 2> file to your command. If you also want to save standard output, append > file1 2> file2.
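For example (the compiler and filenames here are just placeholders), either of these will keep the error messages around for later inspection:

$ cc bigprog.c 2> error.log
$ cc bigprog.c > compile.log 2> error.log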

This leads to another programming task (Task 7-2): write a script, which we’ll call start, that runs a command in the background and saves both its standard output and standard error in a logfile. The code is very terse:

"$@" > logfile 2>&1 &

This line executes whatever command and parameters follow start. (The command cannot contain pipes or output redirectors.) It sends the command’s standard output to logfile.

Then, the redirector 2>&1 says, “send standard error (file descriptor 2) to the same place as standard output (file descriptor 1).” Since standard output is redirected to logfile, standard error will go there too. The final & puts the job in the background so that you get your shell prompt back.

As a small variation on this theme, we can send both standard output and standard error into a pipe instead of a file: command 2>&1 | ... does this. (Make sure you understand why.) Here is a script that sends both standard output and standard error to the logfile (as above) and to the terminal:

"$@" 2>&1 | tee logfile &

The command tee takes its standard input and copies it both to standard output and to the file given as an argument.

These scripts have one shortcoming: you must remain logged in until the job completes. Although you can always type jobs (see Chapter 1) to check on progress, you can’t leave your terminal until the job finishes, unless you want to risk a breach of security.[1] We’ll see how to solve this problem in the next chapter.

The other file-descriptor-oriented redirectors (e.g., <& n) are usually used for reading input from (or writing output to) more than one file at the same time. We’ll see an example later in this chapter. Otherwise, they’re mainly meant for systems programmers, as are <&- (force standard input to close) and >&- (force standard output to close).

Before we leave this topic, we should just note that 1> is the same as >, and 0< is the same as <. If you understand this, then you probably know all you need to know about file descriptors.

String I/O

Now we’ll zoom back in to the string I/O level and examine the echo and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.

echo

As we’ve seen countless times in this book, echo simply prints its arguments to standard output. Now we’ll explore the command in greater detail.

Options to echo

echo accepts a few dash options, listed in Table 7-2.

Table 7-2. echo options

Option    Function
-e        Turns on the interpretation of backslash-escaped characters
-E        Turns off the interpretation of backslash-escaped characters on systems where this mode is the default
-n        Omits the final newline (same as the \c escape sequence)

echo escape sequences

echo accepts a number of escape sequences that start with a backslash.[2] They are listed in Table 7-3.

These sequences exhibit fairly predictable behavior, except for \f: on some displays, it causes a screen clear, while on others it causes a line feed. It ejects the page on most printers. \v is somewhat obsolete; it usually causes a line feed.

Table 7-3. echo escape sequences

Sequence    Character printed
\a          ALERT or CTRL-G (bell)
\b          BACKSPACE or CTRL-H
\c          Omit final NEWLINE
\e          Escape character (same as \E)
\E          Escape character[3]
\f          FORMFEED or CTRL-L
\n          NEWLINE (not at end of command) or CTRL-J
\r          RETURN (ENTER) or CTRL-M
\t          TAB or CTRL-I
\v          VERTICAL TAB or CTRL-K
\nnn        ASCII character with octal (base-8) value nnn, where nnn is 1 to 3 digits
\0nnn       The eight-bit character whose value is the octal (base-8) value nnn, where nnn is 1 to 3 digits
\xHH        The eight-bit character whose value is the hexadecimal (base-16) value HH (one or two digits)
\\          Single backslash

[3] Not available in versions of bash prior to 2.0.

The \nnn, \0nnn, and \xHH sequences are even more device-dependent and can be used for complex I/O, such as cursor control and special graphics characters.

printf

bash’s echo command is quite powerful and for most cases entirely adequate. However, there are occasions where a more powerful and flexible approach is needed for printing information, especially when the information needs to be formatted. bash provides this by giving access to a powerful system-level printing library known as printf.[4]

The printf command can output a string similar to the echo command:

printf "hello world"

Unlike the echo command, printf does not automatically provide a newline. If we want it to do exactly the same as a standard echo, we must provide the newline ourselves by adding \n to the end:

printf "hello world\n"

You may ask why this is any better than echo. The printf command has two parts, which is what makes it so powerful.

printf format-string [arguments]

The first part is a string that describes the format specifications; this is best supplied as a string constant in quotes. The second part is an argument list, such as a list of strings or variable values that correspond to the format specifications. (The format is reused as necessary to use up all of the arguments. If the format requires more arguments than are supplied, the extra format specifications behave as if a zero value or null string, as appropriate, had been supplied). A format specification is preceded by a percent sign (%), and the specifier is one of the characters described below. Two of the main format specifiers are %s for strings and %d for decimal integers.

This sounds complicated but we can begin by re-casting the last example:

printf "%s %s\n" hello world

This prints hello world on a line of its own, just as the previous example did. The word hello has been assigned to the first format specification, %s. Likewise, world has been assigned to the second %s. printf then prints these two strings followed by the newline.

We could also achieve the same result by making hello an explicit part of the format string:

$ printf "hello %s\n" world
hello world
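As mentioned above, the format is reused as necessary to use up all of the arguments; this gives a compact way of printing one item per line:

$ printf "%s\n" alice duchess dodo
alice
duchess
dodo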

The allowed specifiers are shown in Table 7-4.

Table 7-4. printf format specifiers

Specifier   Description
%c          ASCII character (prints first character of corresponding argument)
%d          Decimal integer
%i          Same as %d
%e          Floating-point format ([-]d.precisione[+-]dd) (see following text for meaning of precision)
%E          Floating-point format ([-]d.precisionE[+-]dd)
%f          Floating-point format ([-]ddd.precision)
%g          %e or %f conversion, whichever is shorter, with trailing zeros removed
%G          %E or %f conversion, whichever is shortest, with trailing zeros removed
%o          Unsigned octal value
%s          String
%u          Unsigned decimal value
%x          Unsigned hexadecimal number; uses a-f for 10 to 15
%X          Unsigned hexadecimal number; uses A-F for 10 to 15
%%          Literal %

The printf command can be used to specify the width and alignment of output fields. A format expression can take three optional modifiers following % and preceding the format specifier:

               %flags width.precision format-specifier

The width of the output field is a numeric value. When you specify a field width, the contents of the field are right-justified by default. You must specify a flag of “-” to get left-justification. (The rest of the flags are discussed shortly.) Thus, “%-20s” outputs a left-justified string in a field 20 characters wide. If the string is less than 20 characters, the field is padded with whitespace to fill. In the following examples, a | is output to indicate the actual width of the field. The first example right-justifies the text:

printf "|%10s|\n" hello

It produces:

|     hello|

The next example left-justifies the text:

printf "|%-10s|\n" hello

It produces:

|hello     |

The precision modifier, used for decimal or floating-point values, controls the number of digits that appear in the result. For string values, it controls the maximum number of characters from the string that will be printed.
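For example, a precision of 3 with %s prints at most three characters of the argument, with or without a field width:

$ printf "|%.3s|\n" hello
|hel|
$ printf "|%10.3s|\n" hello
|       hel|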

You can specify both the width and precision dynamically, via values in the printf argument list. You do this by specifying asterisks, instead of literal values.

$ myvar=42.123456
$ printf "|%*.*G|\n" 5 6 $myvar
|42.1235|

In this example, the width is 5, the precision is 6, and the value to print comes from the value of myvar.

The precision is optional. Its exact meaning varies by control letter, as shown in Table 7-5.

Table 7-5. Meaning of precision

Conversion                  Precision means
%d, %i, %o, %u, %x, %X      The minimum number of digits to print. When the value has fewer digits, it is padded with leading zeros. The default precision is 1.
%e, %E                      The minimum number of digits to print. When the value has fewer digits, it is padded with zeros after the decimal point. The default precision is 10. A precision of 0 inhibits printing of the decimal point.
%f                          The number of digits to the right of the decimal point.
%g, %G                      The maximum number of significant digits.
%s                          The maximum number of characters to print.

Finally, one or more flags may precede the field width and the precision. We’ve already seen the “-” flag for left-justification. The rest of the flags are shown in Table 7-6.

Table 7-6. Flags for printf

Character   Description
-           Left-justify the formatted value within the field.
space       Prefix positive values with a space and negative values with a minus.
+           Always prefix numeric values with a sign, even if the value is positive.
#           Use an alternate form: %o has a preceding 0; %x and %X are prefixed with 0x and 0X, respectively; %e, %E and %f always have a decimal point in the result; and %g and %G do not have trailing zeros removed.
0           Pad output with zeros, not spaces. This only happens when the field width is wider than the converted result. In the C language, this flag applies to all output formats, even non-numeric ones. For bash, it only applies to the numeric formats.
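A few quick examples at the command prompt show the effect of these flags:

$ printf "%+d %+d\n" 42 -42
+42 -42
$ printf "%05d\n" 42
00042
$ printf "%#x %#o\n" 255 8
0xff 010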

If printf cannot perform a format conversion, it returns a non-zero exit status.

Additional bash printf specifiers

Besides the standard specifiers just described, the bash shell (and other POSIX compliant shells) accepts two additional specifiers. These provide useful features at the expense of nonportability to versions of the printf command found in some other shells and in other places in UNIX:

%b

When used instead of %s, expands echo-style escape sequences in the argument string. For example:

$ printf "%s\n" 'hello\nworld'
hello\nworld
$ printf "%b\n" 'hello\nworld'
hello
world
%q

When used instead of %s, prints the string argument in such a way that it can be used for shell input. For example:

$ printf "%q\n" "greetings to the world"
greetings\ to\ the\ world
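This is particularly handy when a script builds up a command string that will be re-read by the shell later (for instance with eval, covered at the end of this chapter). Here is a small sketch, using a hypothetical filename that contains a space:

$ file='my notes.txt'
$ cmd="touch $(printf "%q" "$file")"
$ echo "$cmd"
touch my\ notes.txt
$ eval "$cmd"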

read

The other half of the shell’s string I/O facilities is the read command, which allows you to read values into shell variables. The basic syntax is:

read var1 var2...

This statement takes a line from the standard input and breaks it down into words delimited by any of the characters in the value of the environment variable IFS (see Chapter 4; these are usually a space, a TAB, and NEWLINE). The words are assigned to variables var1, var2, etc. For example:

$ read character1 character2
alice duchess
$ echo $character1
alice
$ echo $character2
duchess

If there are more words than variables, then excess words are assigned to the last variable. If you omit the variables altogether, the entire line of input is assigned to the variable REPLY.
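For example, with three words of input but only two variables, the second variable receives the rest of the line:

$ read character1 character2
alice duchess dodo
$ echo $character2
duchess dodo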

You may have identified this as the “missing ingredient” in the shell programming capabilities we have seen thus far. It resembles input statements in conventional languages, like its namesake in Pascal. So why did we wait this long to introduce it?

Actually, read is sort of an “escape hatch” from traditional shell programming philosophy, which dictates that the most important unit of data to process is a text file, and that UNIX utilities such as cut, grep, sort, etc., should be used as building blocks for writing programs.

read, on the other hand, implies line-by-line processing. You could use it to write a shell script that does what a pipeline of utilities would normally do, but such a script would inevitably look like:

while (read a line) do
    process the line
    print the processed line
end

This type of script is usually much slower than a pipeline; furthermore, it has the same form as a program someone might write in C (or some similar language) that does the same thing much faster. In other words, if you are going to write it in this line-by-line way, there is little point in writing a shell script.

Reading lines from files

Nevertheless, shell scripts with read are useful for certain kinds of tasks. One is when you are reading data from a file small enough so that efficiency isn’t a concern (say a few hundred lines or less), and it’s really necessary to get bits of input into shell variables.

Consider the case of a UNIX machine that has terminals that are hardwired to the terminal lines of the machine. It would be nice if the TERM environment variable was set to the correct terminal type when a user logged in.

One way to do this would be to have some code that sets the terminal information when a user logs in. This code would presumably reside in /etc/profile, the system-wide initialization file that bash runs before running a user’s .bash_profile. If the terminals on the system change over time—as surely they must—then the code would have to be changed. It would be better to store the information in a file and change just the file instead.

Assume we put the information in a file whose format is typical of such UNIX “system configuration” files: each line contains a device name, a TAB, and a TERM value.

We’ll call the file /etc/terms, and it would typically look something like this:

console        console
tty01        wy60
tty03        vt100
tty04        vt100
tty07        wy85
tty08        vt100

The values on the left are terminal lines and those on the right are the terminal types that TERM can be set to. The terminals connected to this system are a Wyse 60 (wy60), three VT100s (vt100), and a Wyse 85 (wy85). The machine’s master terminal is the console, which has a TERM value of console.

We can use read to get the data from this file, but first we need to know how to test for the end-of-file condition. Simple: read’s exit status is 1 (i.e., non-zero) when there is nothing to read. This leads to a clean while loop:

TERM=vt100       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [ $dev = $line ]; then
        TERM=$termtype
        echo "TERM set to $TERM."
        break
    fi
done

The while loop reads each line of the input into the variables dev and termtype. In each pass through the loop, the if looks for a match between $dev and the user’s tty ($line, obtained by command substitution from the tty command). If a match is found, TERM is set, a message is printed, and the loop exits; otherwise TERM remains at the default setting of vt100.

We are not quite done, though: this code reads from the standard input, not from /etc/terms! We need to know how to redirect input to multiple commands. It turns out that there are a few ways of doing this.

I/O redirection and multiple commands

One way to solve the problem is with a subshell, as we’ll see in the next chapter. This involves creating a separate process to do the reading. However, it is usually more efficient to do it in the same process; bash gives us four ways of doing this.

The first, which we have seen already, is with a function:

findterm ( ) {
    TERM=vt100       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [ $dev = $line ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break;
        fi
    done
}
     
findterm < /etc/terms

A function acts like a script in that it has its own set of standard I/O descriptors, which can be redirected in the line of code that calls the function. In other words, you can think of this code as if findterm were a script and you typed findterm < /etc/terms on the command line. The read statement takes input from /etc/terms a line at a time, and the function runs correctly.

The second way is to simplify this slightly by placing the redirection at the end of the function:

findterm ( ) {
    TERM=vt100       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [ $dev = $line ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break;
        fi
    done
} < /etc/terms

Whenever findterm is called, it takes its input from /etc/terms.

The third way is by putting the I/O redirector at the end of the loop, like this:

TERM=vt100       # assume this as a default
line=$(tty)
while read dev termtype; do
    if [ $dev = $line ]; then
        TERM=$termtype
        echo "TERM set to $TERM."
        break;
    fi
done < /etc/terms

You can use this technique with any flow-control construct, including if...fi, case...esac, select...done, and until...done. This makes sense because these are all compound statements that the shell treats as single commands for these purposes. This technique works fine—the read command reads a line at a time—as long as all of the input is done within the compound statement.

Command blocks

But if you want to redirect I/O to or from an arbitrary group of commands without creating a separate process, you need to use a construct that we haven’t seen yet. If you surround some code with { and }, the code will behave like a function that has no name. This is another type of compound statement. In accordance with the equivalent concept in the C language, we’ll call this a command block.

What good is a block? In this case, it means that the code within the curly brackets ({}) will take standard I/O descriptors just as we described in the last block of code. This construct is appropriate for the current example because the code needs to be called only once, and the entire script is not really large enough to merit breaking down into functions. Here is how we use a block in the example:

{
    TERM=vt100       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [ $dev = $line ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break;
        fi
    done
} < /etc/terms

To help you understand how this works, think of the curly brackets and the code inside them as if they were one command, i.e.:

{ TERM=vt100; line=$(tty); while ... } < /etc/terms;

Configuration files for system administration tasks like this one are actually fairly common; a prominent example is /etc/hosts, which lists machines that are accessible in a TCP/IP network. We can make /etc/terms more like these standard files by allowing comment lines in the file that start with #, just as in shell scripts. This way /etc/terms can look like this:

#
# System Console is console
console        console
#
# Cameron's line has a Wyse 60
tty01        wy60
...

We can handle comment lines by modifying the while loop so that it ignores lines beginning with #. We can place a grep in the test:

if [ -z "$(echo $dev | grep ^#)" ]  && [ $dev = $line ]; then
    ...

As we saw in Chapter 5, the && combines the two conditions so that both must be true for the entire condition to be true.

As another example of command blocks, consider the case of creating a standard algebraic notation frontend to the dc command. dc is a UNIX utility that simulates a Reverse Polish Notation (RPN) calculator:[5]

{ while read line; do
    echo "$(alg2rpn $line)"
  done
} | dc

We’ll assume that the actual conversion from one notation to the other is handled by a function called alg2rpn. It takes a line of standard algebraic notation as an argument and prints the RPN equivalent on the standard output. The while loop reads lines and passes them through the conversion function, until an EOF is typed. Everything is executed inside the command block and the output is piped to the dc command for evaluation.

Reading user input

The other type of task to which read is suited is prompting a user for input. Think about it: we have hardly seen any such scripts so far in this book. In fact, the only ones were the modified solutions to Task 5-4, which involved select.

As you’ve probably figured out, read can be used to get user input into shell variables.

We can use echo to prompt the user, like this:

echo -n 'terminal? '
read TERM
echo "TERM is $TERM"

Here is what this looks like when it runs:

terminal? wy60
TERM is wy60

However, shell convention dictates that prompts should go to standard error, not standard output. (Recall that select prompts to standard error.) We could just use file descriptor 2 with the output redirector we saw earlier in this chapter:

echo -n 'terminal? ' >&2
read TERM
echo TERM is $TERM

We’ll now look at a more complex example by showing how Task 5-5 would be done if select didn’t exist. Compare this with the code in Chapter 5:

echo 'Select a directory:'
done=false
     
while [ $done = false ]; do
    do=true
    num=1
    for direc in $DIR_STACK; do
        echo "$num) $direc"
        num=$((num+1))
    done
    echo -n 'directory? '
    read REPLY
     
    if [ $REPLY -lt $num ] && [ $REPLY -gt 0 ]; then
        set - $DIR_STACK
     
        #statements that manipulate the stack...
     
        break
    else
        echo 'invalid selection.'
    fi
done

The while loop is necessary so that the code repeats if the user makes an invalid choice. select includes the ability to construct multicolumn menus if there are many choices, and better handling of null user input.

Before leaving read, we should note that it has eight options: -a, -d, -e, -n, -p, -r, -t, and -s.[6] The first of these options allows you to read values into an array. Each successive item read in is assigned to the given array starting at index 0. For example:

$ read -a people
alice duchess dodo
$ echo ${people[2]}
dodo
$

In this case, the array people now contains the items alice, duchess, and dodo.

A delimiter can be specified with the -d option. This will read a line up until the first character of the delimiter is reached. For example:

$ read -d stop aline
alice duches$
$ echo $aline
alice duche
$

The option -e can be used only with scripts run from interactive shells. It causes readline to be used to gather the input line, which means that you can use any of the readline editing features that we looked at in Chapter 2.

The -n option specifies how many characters will be read by read. For example, if we specify that it should read only ten characters in then it will return after reading that many:

$ read -n 10 aline
abcdefghij$
$ echo $aline
abcdefghij
$

The -p option followed by a string argument prints the string before reading input. We could have used this in the earlier examples of read, where we printed out a prompt before doing the read. For example, the directory selection script could have used read -p 'directory? ' REPLY.

read lets you input lines that are longer than the width of your display by providing a backslash (\) as a continuation character, just as in shell scripts. The -r option overrides this, in case your script reads from a file that may contain lines that happen to end in backslashes. read -r also preserves any other escape sequences the input might contain. For example, if the file hatter contains this line:

A line with a\n escape sequence

Then read -r aline will include the backslash in the variable aline, whereas without the -r, read will “eat” the backslash. As a result:

$ read -r aline < hatter
$ echo -e "$aline"
A line with a
 escape sequence
$

However:

$ read aline < hatter
$ echo -e "$aline"
A line with an escape sequence
$

The -s option forces read to not echo the characters that are typed to the terminal. This can be useful in cases where a shell may want to take single keystroke commands without displaying the typed characters on the terminal (e.g., moving something around with the arrow keys). In this case it could be combined with the -n option to read a single character each time in a loop: read -s -n1 key.

The last option, -t, allows a time in seconds to be specified. read will wait the specified time for input and then finish. This is useful if you want a script to wait for input but continue processing if nothing is supplied.
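For example, a script could give the user five seconds to answer and then fall back on a default; here is a sketch along the lines of the earlier terminal example:

echo -n 'terminal? ' >&2
if read -t 5 TERM && [ -n "$TERM" ]; then
    echo "TERM set to $TERM."
else
    TERM=vt100
    echo "No response; TERM set to the default of $TERM."
fi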

Command-Line Processing

We’ve seen how the shell uses read to process input lines: it deals with single quotes (`'), double quotes (“”), and backslashes (\); it separates lines into words, according to delimiters in the environment variable IFS; and it assigns the words to shell variables. We can think of this process as a subset of the things the shell does when processing command lines.

We’ve touched upon command-line processing throughout this book; now is a good time to make the whole thing explicit. Each line that the shell reads from the standard input or a script is called a pipeline; it contains one or more commands separated by zero or more pipe characters (|). For each pipeline it reads, the shell breaks it up into commands, sets up the I/O for the pipeline, then does the following for each command (Figure 7-1):

Figure 7-1. Steps in command-line processing

  1. Splits the command into tokens that are separated by the fixed set of metacharacters: SPACE, TAB, NEWLINE, ;, (, ), <, >, |, and &. Types of tokens include words, keywords, I/O redirectors, and semicolons.

  2. Checks the first token of each command to see if it is a keyword with no quotes or backslashes. If it’s an opening keyword, such as if and other control-structure openers, function, {, or (, then the command is actually a compound command. The shell sets things up internally for the compound command, reads the next command, and starts the process again. If the keyword isn’t a compound command opener (e.g., is a control-structure “middle” like then, else, or do, an “end” like fi or done, or a logical operator), the shell signals a syntax error.

  3. Checks the first word of each command against the list of aliases. If a match is found, it substitutes the alias’s definition and goes back to Step 1; otherwise, it goes on to Step 4. This scheme allows recursive aliases (see Chapter 3). It also allows aliases for keywords to be defined, e.g., alias aslongas=while or alias procedure=function.

  4. Performs brace expansion. For example, a{b,c} becomes ab ac.

  5. Substitutes the user’s home directory ($HOME) for tilde if it is at the beginning of a word. Substitutes user’s home directory for ~user.[7]

  6. Performs parameter (variable) substitution for any expression that starts with a dollar sign ($).

  7. Does command substitution for any expression of the form $(string).

  8. Evaluates arithmetic expressions of the form $((string)).

  9. Takes the parts of the line that resulted from parameter, command, and arithmetic substitution and splits them into words again. This time it uses the characters in $IFS as delimiters instead of the set of metacharacters in Step 1.

  10. Performs pathname expansion, a.k.a. wildcard expansion, for any occurrences of *, ?, and [...] pairs.

  11. Uses the first word as a command by looking up its source according to the rest of the list in Chapter 4, i.e., as a function command, then as a built-in, then as a file in any of the directories in $PATH.

  12. Runs the command after setting up I/O redirection and other such things.

That’s a lot of steps—and it’s not even the whole story! But before we go on, an example should make this process clearer. Assume that the following command has been run:

alias ll="ls -l"

Further assume that a file exists called .hist537 in user alice’s home directory, which is /home/alice, and that there is a double-dollar-sign variable $$ whose value is 2537 (we’ll see what this special variable is in the next chapter).

Now let’s see how the shell processes the following command:

ll $(type -path cc) ~alice/.*$(($$%1000))

Here is what happens to this line:

  1. ll $(type -path cc) ~alice/.*$(($$%1000)) splits the input into words.

  2. ll is not a keyword, so Step 2 does nothing.

  3. ls -l $(type -path cc) ~alice/.*$(($$%1000)) substitutes ls -l for its alias “ll”. The shell then repeats Steps 1 through 3; Step 1 splits the ls -l into two words.

  4. ls -l $(type -path cc) ~alice/.*$(($$%1000)) does nothing.

  5. ls -l $(type -path cc) /home/alice/.*$(($$%1000)) expands ~alice into /home/alice.

  6. ls -l $(type -path cc) /home/alice/.*$((2537%1000)) substitutes 2537 for $$.

  7. ls -l /usr/bin/cc /home/alice/.*$((2537%1000)) does command substitution on “type -path cc”.

  8. ls -l /usr/bin/cc /home/alice/.*537 evaluates the arithmetic expression 2537%1000.

  9. ls -l /usr/bin/cc /home/alice/.*537 does nothing.

  10. ls -l /usr/bin/cc /home/alice/.hist537 substitutes the filename for the wildcard expression .*537.

  11. The command ls is found in /usr/bin.

  12. /usr/bin/ls is run with the option -l and the two arguments.

Although this list of steps is fairly straightforward, it is not the whole story. There are still five ways to modify the process: quoting; using command, builtin, or enable; and using the advanced command eval.

Quoting

You can think of quoting as a way of getting the shell to skip some of the 12 steps above. In particular:

  • Single quotes (`') bypass everything through Step 10—including aliasing. All characters inside a pair of single quotes are untouched. You can’t have single quotes inside single quotes—not even if you precede them with backslashes.[8]

  • Double quotes (“”) bypass Steps 1 through 4, plus steps 9 and 10. That is, they ignore pipe characters, aliases, tilde substitution, wildcard expansion, and splitting into words via delimiters (e.g., blanks) inside the double quotes. Single quotes inside double quotes have no effect. But double quotes do allow parameter substitution, command substitution, and arithmetic expression evaluation. You can include a double quote inside a double-quoted string by preceding it with a backslash (\). You must also backslash-escape $, ` (the archaic command substitution delimiter), and \ itself.

Table 7-7 has simple examples to show how these work; they assume the statement person=hatter was run and user alice’s home directory is /home/alice.

If you are wondering whether to use single or double quotes in a particular shell programming situation, it is safest to use single quotes unless you specifically need parameter, command, or arithmetic substitution.

Table 7-7. Examples of quoting rules

Expression      Value
$person         hatter
"$person"       hatter
\$person        $person
'$person'       $person
"'$person'"     'hatter'
~alice          /home/alice
"~alice"        ~alice
'~alice'        ~alice

command, builtin, and enable

Before moving on to the last part of the command-line processing cycle, we’ll take a look at the command lookup order that we touched on in Chapter 4 and how it can be altered with several shell built-ins.

The default order for command lookup is functions, followed by built-ins, with scripts and executables last. There are three built-ins that you can use to override this order: command, builtin, and enable.

command removes alias and function lookup.[9] Only built-ins and commands found in the search path are executed. This is useful if you want to create functions that have the same name as a shell built-in or a command in the search path and you need to call the original command from the function. For instance, we might want to create a function called cd that replaces the standard cd command with one that does some fancy things and then executes the built-in cd:

cd ( )
{
    #Some fancy things
    command cd
}

In this case we avoid plunging the function into a recursive loop by placing command in front of cd. This ensures that the built-in cd is called and not the function.

command has some options, listed in Table 7-8.

Table 7-8. command options

Option    Description
-p        Uses a default value for PATH
-v        Prints the command or pathname used to invoke the command
-V        A more verbose description than with -v
-         Turns off further option checking

The -p option uses a default value for PATH which is guaranteed to find all of the standard UNIX utilities. In this case, command will ignore the directories in your PATH.[10]

builtin is very similar to command but is more restrictive. It looks up only built-in commands, ignoring functions and commands found in PATH. We could have replaced command with builtin in the cd example above.

The last command enables and disables shell built-ins—it is called enable. Disabling a built-in allows a shell script or executable of the same name to be run without giving a full pathname. Consider the problem many beginning UNIX shell programmers have when they name a script test. Much to their surprise, executing test usually results in nothing, because the shell is executing the built-in test, rather than the shell script. Disabling the built-in with enable overcomes this.[11]
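For example, you can watch the lookup change with type; the pathname shown in the last step will depend on your system:

$ type test
test is a shell builtin
$ enable -n test
$ type test
test is /usr/bin/test
$ enable test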

Table 7-9 lists the options available with enable.[12] Some options are for working with dynamically loadable built-ins. See Appendix C for details on these options, and how to create and load your own built-in commands.

Table 7-9. enable options

Option       Description
-a           Displays every built-in and whether it is enabled or not
-d           Deletes a built-in loaded with -f
-f filename  Loads a new built-in from the shared-object filename
-n           Disables a built-in or displays a list of disabled built-ins
-p           Displays a list of all of the built-ins
-s           Restricts the output to POSIX “special” built-ins

Of these options, -n is the most useful; it is used to disable a built-in. enable without an option enables a built-in. More than one built-in can be given as arguments to enable, so enable -n pushd popd dirs would disable the pushd, popd, and dirs built-ins.[13]

You can find out what built-ins are currently enabled and disabled by using the command on its own, or with the -p option; enable or enable -p will list all enabled built-ins, and enable -n will list all disabled built-ins. To get a complete list with their current status, you can use enable -a.

The -s option restricts the output to POSIX `special’ built-ins. These are :, ., source, break, continue, eval, exec, exit, export, readonly, return, set, shift, trap, and unset.

eval

We have seen that quoting lets you skip steps in command-line processing. Then there’s the eval command, which lets you go through the process again. Performing command-line processing twice may seem strange, but it’s actually very powerful: it lets you write scripts that create command strings on the fly and then pass them to the shell for execution. This means that you can give scripts “intelligence” to modify their own behavior as they are running.

The eval statement tells the shell to take eval’s arguments and run them through the command-line processing steps all over again. To help you understand the implications of eval, we’ll start with a trivial example and work our way up to a situation in which we’re constructing and running commands on the fly.

eval ls passes the string ls to the shell to execute; the shell prints a list of files in the current directory. Very simple; there is nothing about the string ls that needs to be sent through the command-processing steps twice. But consider this:

listpage="ls | more"
$listpage

This doesn’t produce a paginated file listing. Why not? Because the pipe character “appears” in Step 6, when the shell evaluates the variable, after it has already looked for pipe characters; the variable’s expansion isn’t even parsed until Step 9. As a result, the shell treats | and more as arguments to ls, and ls complains that no files called | or more exist in the current directory!

Now consider eval $listpage instead of just $listpage. When the shell gets to the last step, it will run the command eval with arguments ls, |, and more. This causes the shell to go back to Step 1 with a line that consists of these arguments. It finds | in Step 2 and splits the line into two commands, ls and more. Each command is processed in the normal (and in both cases trivial) way. The result is a paginated list of the files in your current directory.

Now you may start to see how powerful eval can be. It is an advanced feature that requires considerable programming cleverness to be used most effectively. It even has a bit of the flavor of artificial intelligence, in that it enables you to write programs that can “write” and execute other programs.[14] You probably won’t use eval for everyday shell programming, but it’s worth taking the time to understand what it can do.

As a more interesting example, we’ll revisit Task 4-1, the very first task in the book. In it, we constructed a simple pipeline that sorts a file and prints out the first N lines, where N defaults to 10. The resulting pipeline was:

sort -nr $1 | head -${2:-10}

The first argument specified the file to sort; $2 is the number of lines to print.

Now suppose we change the task just a bit so that the default is to print the entire file instead of 10 lines. This means that we don’t want to use head at all in the default case. We could do this in the following way:

if [ -n "$2" ]; then
    sort -nr $1 | head -$2
else
    sort -nr $1
fi

In other words, we decide which pipeline to run according to whether $2 is null. But here is a more compact solution:

eval sort -nr \$1 ${2:+"| head -\$2"}

The last expression in this line evaluates to the string | head -\$2 if $2 exists (is not null); if $2 is null, then the expression is null too. We backslash-escape dollar signs (\$) before variable names to prevent unpredictable results if the variables’ values contain special characters like > or |. The backslash effectively puts off the variables’ evaluation until the eval command itself runs. So the entire line is either:

eval sort -nr \$1 | head -\$2

if $2 is given, or:

eval sort -nr \$1

if $2 is null. Once again, we can’t just run this command without eval because the pipe is “uncovered” after the shell tries to break the line up into commands. eval causes the shell to run the correct pipeline when $2 is given.

Next, we’ll revisit Task 7-2 from earlier in this chapter, the start script that lets you start a command in the background and save its standard output and standard error in a logfile. Recall that the one-line solution to this task had the restriction that the command could not contain output redirectors or pipes. Although the former doesn’t make sense when you think about it, you certainly would want the ability to start a pipeline in this way.

eval is the obvious way to solve this problem:

eval "$@" > logfile 2>&1 &

The only restriction that this imposes on the user is that pipes and other such special characters be quoted (surrounded by quotes or preceded by backslashes).
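For example, either of these invocations would start a pipeline in the background with its output saved in logfile (mydata is just a placeholder filename):

$ start sort -nr mydata \| head -20
$ start 'sort -nr mydata | head -20'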

Here’s a way to apply eval in conjunction with various other interesting shell programming concepts.

make is known primarily as a programmer’s tool, but it seems as though someone finds a new use for it every day. Without going into too much extraneous detail, make basically keeps track of multiple files in a particular project, some of which depend on others (e.g., a document depends on its word processor input file(s)). It makes sure that when you change a file, all of the other files that depend on it are processed.

For example, assume you’re using the troff word processor to write a book. You have files for the book’s chapters called ch1.t, ch2.t, and so on; the troff output for these files are ch1.out, ch2.out, etc. You run commands like troff chN.t > chN.out to do the processing. While you’re working on the book, you tend to make changes to several files at a time.

In this situation, you can use make to keep track of which files need to be reprocessed, so that all you need to do is type make, and it will figure out what needs to be done. You don’t need to remember to reprocess the files that have changed.

How does make do this? Simple: it compares the modification times of the input and output files (called sources and targets in make terminology), and if the input file is newer, then make reprocesses it.

You tell make which files to check by building a file called makefile that has constructs like this:

               target : source1 source2 ...
                 commands to make target

This essentially says, “For target to be up to date, it must be newer than all of the sources. If it’s not, run the commands to bring it up to date.” The commands are on one or more lines that must start with TABs: e.g., to make ch7.out:

ch7.out : ch7.t
          troff ch7.t > ch7.out

Now suppose that we write a shell function called makecmd that reads and executes a single construct of this form. Assume that the makefile is read from standard input. The function would look like the following code.

makecmd ( )
{
    tab=$'\t'
    read target colon sources
    for src in $sources; do
        if [ $src -nt $target ]; then
            # Read each command line verbatim; IFS= preserves the leading TAB
            while IFS= read -r cmd && [[ $cmd == "$tab"* ]]; do
                echo "$cmd"
                eval "${cmd#$tab}"
            done
            break
        fi
    done
}

This function reads the line with the target and sources; the variable colon is just a placeholder for the :. Then it checks each source to see if it’s newer than the target, using the -nt file attribute test operator that we saw in Chapter 5. If the source is newer, it reads, prints, and executes the commands until it finds a line that doesn’t start with a TAB or it reaches end-of-file. (The real make does more than this; see the exercises at the end of this chapter.) After running the commands (which are stripped of the initial TAB), it breaks out of the for loop, so that it doesn’t run the commands more than once.

As a final example of eval, we’ll look again at procimage, the graphics utility that we developed in the last three chapters. Recall that one of the problems with the script as it stands is that it performs the process of scaling and bordering regardless of whether you want them. If no command-line options are present, a default size, border width, and border color are used. Rather than invent some if then logic to get around this, we’ll look at how you can dynamically build a pipeline of commands in the script; those commands that aren’t needed simply disappear when the time comes to execute them. As an added bonus, we’ll add another capability to our script: image enhancement.

Looking at the procimage script you’ll notice that the NetPBM commands form a nice pipeline; the output of one operation becomes the input to the next, until we end up with the final image. If it weren’t for having to use a particular conversion utility, we could reduce the script to the following pipeline (ignoring options for now):

cat $filename | convertimage | pnmscale | pnmmargin |
    pnmtojpeg > $outfile

Or, better yet:

convertimage $filename | pnmscale | pnmmargin | pnmtojpeg \
    > $outfile

As we’ve already seen, this is equivalent to:

eval convertimage $filename | pnmscale | pnmmargin |
 pnmtojpeg > $outfile

And knowing what we do about how eval operates, we can transform this into:

eval "convertimage" $filename " | pnmscale" " | pnmmargin" \
    " | pnmtojpeg " > $outfile

And thence to:

convert='convertimage'
scale=' | pnmscale'
border=' | pnmmargin'
standardise=' | pnmtojpeg'
     
eval $convert $filename $scale $border $standardise > $outfile

Now consider what happens when we don’t want to scale the image. We do this:

scale=""
     
while getopts ":s:w:c:" opt; do
    case $opt in
      s  ) scale=' | pnmscale' ;;
     
 ...
     
eval $convert $filename $scale $border $standardise > $outfile

In this code fragment, scale is set to a default of the empty string. If -s is not given on the command line, then the final line evaluates with $scale as the empty string and the pipeline will “collapse” into:

$convert $filename $border $standardise > $outfile

Using this principle, we can modify the previous version of the procimage script and produce a pipeline version. For each input file we need to construct and run a pipeline based upon the options given on the command line. Here is the new version:

# Set up the defaults
width=1
colour='-color grey'
usage="Usage: $0 [-s N] [-w N] [-c S] imagefile..."

# Initialise the pipeline components
standardise=' | pnmtojpeg -quiet'

while getopts ":s:w:c:" opt; do
    case $opt in
      s  ) size=$OPTARG
           scale=' | pnmscale -quiet -xysize $size $size' ;;
      w  ) width=$OPTARG
           border=' | pnmmargin $colour $width' ;;
      c  ) colour="-color $OPTARG"
           border=' | pnmmargin $colour $width' ;;
      \? ) echo $usage
           exit 1 ;;
    esac
done

shift $(($OPTIND - 1))

if [ -z "$@" ]; then
    echo $usage
    exit 1
fi

# Process the input files
for filename in "$@"; do
    case $filename in
        *.gif ) convert='giftopnm'  ;;

        *.tga ) convert='tgatoppm'  ;;

        *.xpm ) convert='xpmtoppm'  ;;

        *.pcx ) convert='pcxtoppm'  ;;

        *.tif ) convert='tifftopnm'  ;;

        *.jpg ) convert='jpegtopnm -quiet' ;;

            * ) echo "$0: Unknown filetype '${filename##*.}'"
                exit 1;;
    esac

    outfile=${filename%.*}.new.jpg

    eval $convert $filename $scale $border $standardise > $outfile

done

This version has been simplified somewhat from the previous one in that it no longer needs a temporary file to hold the converted file. It is also a lot easier to read and understand. To show how easy it is to add further processing to the script, we’ll now add one more NetPBM utility.

NetPBM provides a utility to enhance an image and make it sharper: pnmnlfilt. This utility is an image filter that samples the image and can enhance edges in the image (it can also smooth the image if given the appropriate values). It takes two parameters that tell it how much to enhance the image. For the purposes of our script, we’ll just choose some optimal values and provide an option to switch enhancement on and off in the script.

To put the new capability in place all we have to do is add the new option (-S) to the getopts case statement, update the usage line, and add a new variable to the pipeline. Here is the new code:

# Set up the defaults
width=1
colour='-color grey'
usage="Usage: $0 [-S] [-s N] [-w N] [-c S] imagefile..."

# Initialise the pipeline components
standardise=' | pnmtojpeg -quiet'

while getopts ":Ss:w:c:" opt; do
    case $opt in
      S  ) sharpness=' | pnmnlfilt -0.7 0.45' ;;
      s  ) size=$OPTARG
           scale=' | pnmscale -quiet -xysize $size $size' ;;
      w  ) width=$OPTARG
           border=' | pnmmargin $colour $width' ;;
      c  ) colour="-color $OPTARG"
           border=' | pnmmargin $colour $width' ;;
      ? ) echo $usage
           exit 1 ;;
    esac
done

shift $(($OPTIND - 1))

if [ $# -eq 0 ]; then
    echo $usage
    exit 1
fi

# Process the input files
for filename in "$@"; do
    case $filename in
        *.gif ) convert='giftopnm'  ;;

        *.tga ) convert='tgatoppm'  ;;

        *.xpm ) convert='xpmtoppm'  ;;

        *.pcx ) convert='pcxtoppm'  ;;

        *.tif ) convert='tifftopnm'  ;;

        *.jpg ) convert='jpegtopnm -quiet' ;;

            * ) echo "$0: Unknown filetype '${filename##*.}'"
                exit 1;;
    esac

    outfile=${filename%.*}.new.jpg

    eval $convert $filename $scale $border $sharpness $standardise > $outfile

done

We could go on forever with increasingly complex examples of eval, but we’ll settle for concluding the chapter with a few exercises. The questions in Exercise 3 are really more like items on the menu of food for thought.

  1. Here are a couple of ways to enhance procimage, the graphics utility:

    1. Add an option, -q, that allows the user to turn on and off the printing of diagnostic information from the NetPBM utilities. You’ll need to map -q to the -quiet option of the utilities. Also, add your own diagnostic output for those utilities that don’t print anything, e.g., the format conversions.

    2. Add an option that allows the user to specify the order in which the NetPBM processes take place, i.e., whether enhancing the image comes before bordering, or bordering comes before resizing. Rather than using an if construct to choose amongst hard-coded orders, construct a string dynamically which will look similar to this:

      "eval $convert $filename $scale $border $sharpness
          $standardise > $outfile"

       You'll then need eval to evaluate this string.

  2. The function makecmd in the solution to Task 7-3 represents an oversimplification of the real make’s functionality. make actually checks file dependencies recursively, meaning that a source on one line in a makefile can be a target on another line. For example, the book chapters in the example could themselves depend on some figures in separate files that were made with a graphics package.

    1. Write a function called readtargets that goes through the makefile and stores all of the targets in a variable or temporary file.

    2. makecmd merely checks to see if any of the sources are newer than the given target. It should really be a recursive routine that looks like this:

      function makecmd ( )
      {
          target=$1
          get sources for $target
          for each source src; do
              if $src is also a target in this makefile then
                  makecmd $src
              fi
              if [ $src -nt $target ]; then
                  run commands to make target
                  return
              fi
          done
      }

       Implement this.

    3. Write the “driver” script that turns the makecmd function into a full make program. This should make the target given as argument, or if none is given, the first target listed in the makefile.

    4. The above makecmd still doesn’t do one important thing that the real make does: allow for “symbolic” targets that aren’t files. These give make much of the power that makes it applicable to such an incredible variety of situations. Symbolic targets always have a modification time of 0, so that make always runs the commands to make them. Modify makecmd so that it allows for symbolic targets. (Hint: the crux of this problem is to figure out how to get a file’s modification time. This is quite difficult.)

  3. Here are some problems that really test your knowledge of eval and the shell’s command-line processing rules. Solve these and you’re a true bash hacker!

    1. Advanced shell programmers sometimes use a little trick that includes eval: using the value of a variable as the name of another variable. In other words, you can give a shell script control over the names of variables to which it assigns values. The latest version of bash has this built in, in the form of ${!varname}, where varname contains the name of another variable that will be the target of the operation. This is known as indirect expansion. How would you do this using only eval? (Hint: if $object equals “person”, and $person is “alice”, then you might think that you could type echo $$object and get the response alice. This doesn’t actually work, but it’s on the right track.) A brief illustration of the built-in form follows this list of exercises.

    2. You could use the above technique together with other eval tricks to implement new control structures for the shell. For example, see if you can write a script that emulates the behavior of a for loop in a conventional language like C or Pascal, i.e., a loop that iterates a fixed number of times, with a loop variable that steps from 1 to the number of iterations (or, for C fans, 0 to iterations-1). Call your script loop to avoid clashes with the keywords for and do.

    3. The pushd, popd, and dirs functions that we built up in previous chapters can’t handle directories with spaces in their names (because DIR_STACK uses a space as a delimiter). Use eval to overcome this limitation. (Hint: use eval to implement an array. Each array element is called array1, array2, ... arrayn, and each array element contains a directory name.)

    4. (The following doesn’t have that much to do with the material in this chapter per se, but it is a classic programming exercise:) Write the function alg2rpn used in the section on command blocks. Here’s how to do this: Arithmetic expressions in algebraic notation have the form expr op expr, where each expr is either a number or another expression (perhaps in parentheses), and op is +, -, x, /, or % (remainder). In RPN, expressions have the form expr expr op. For example: the algebraic expression 2+3 is 2 3 + in RPN; the RPN equivalent of (2+3) x (9-5) is 2 3 + 9 5 - x. The main advantage of RPN is that it obviates the need for parentheses and operator precedence rules (e.g., x is evaluated before +). The dc program accepts standard RPN, but each expression should have “p” appended to it, which tells dc to print its result; e.g., the first example above should be given to dc as 2 3 + p.

    5. You need to write a routine that converts algebraic notation to RPN. This should be (or include) a function that calls itself (a recursive function) whenever it encounters a subexpression. It is especially important that this function keep track of where it is in the input string and how much of the string it “eats up” during its processing. (Hint: make use of the pattern-matching operators discussed in Chapter 4 to ease the task of parsing input strings.) To make your life easier, don’t worry about operator precedence for now; just convert to RPN from left to right: e.g., treat 3+4x5 as (3+4)x5 and 3x4+5 as (3x4)+5. This makes it possible for you to convert the input string on the fly, i.e., without having to read in the whole thing before doing any processing.

    6. Enhance your solution to the previous exercise so that it supports operator precedence in the “usual” order: x, /, and % (remainder) before + and -. For example, treat 3+4x5 as 3+(4x5) and 3x4+5 as (3x4)+5.

    7. Here is something else to really test your skills; write a graphics utility script, index, that takes a list of image files, reduces them in size and creates an “index” image. An index image is comprised of thumbnail-sized versions of the original images, placed neatly in columns and rows, and with a caption underneath (usually the name of the original file). Besides the list of files, you’ll need some options, including the number of columns to create and the size of the thumbnail images. You might also like to include an option to specify the gap between each image.

       The new NetPBM utilities you’ll need are pbmtext and pnmcat. You’ll also need pnmscale and one or more of the conversion utilities, depending upon whether you decide to take in various formats (as we did for procimage) and what output format you decide on. pbmtext takes as an argument some text and converts the text into a PNM bitmap. pnmcat is a little more complex. Like cat, it concatenates things; in this case, images. You can specify as many PNM files as you like as arguments and pnmcat will put them together into one long image. By using the -lr and -tb options, you can specify whether you want the images to be placed one after the other going from left to right, or from top to bottom. The first option to pnmcat is the background color. It can be either -black for a black background, or -white for a white background. We suggest -white to match the pbmtext black text on a white background.

       You’ll need to take each file, run the filename through pbmtext, and use pnmcat to place it underneath a scaled down version of the original image. Then you’ll need to continue doing this for each file and use pnmcat to connect them together. In addition, you’ll have to keep tabs on how many columns you have completed and when to start a new row. Note that you’ll need to build up the rows individually and use pnmcat to connect them together. pnmcat won’t do this for you automatically.
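
As a small aid to the first question in Exercise 3, here is a minimal illustration of the built-in indirect expansion mentioned there (the variable names are ours); working out the eval equivalent is left to you, as the exercise intends:

object=person
person=alice

echo ${!object}     # bash expands $object to "person", then expands
                    # the variable of that name, printing "alice"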



[1] Don’t put it past people to come up to your unattended terminal and cause mischief!

[2] You must use a double backslash if you don’t surround the string that contains them with quotes; otherwise, the shell itself “steals” a backslash before passing the arguments to echo.

[4] printf is not available in versions of bash prior to version 2.02.

[5] If you have ever owned a Hewlett-Packard calculator you will be familiar with RPN. We’ll discuss RPN further in one of the exercises at the end of this chapter.

[6] -a, -d, -e, -n, -p, -t and -s are not available in versions of bash prior to 2.0.

[7] Two obscure variations on this: the shell substitutes the current directory ($PWD) for ~+ and the previous directory ($OLDPWD) for ~-. In bash 2.0 there are two more: ~N+ and ~N-. These are replaced by the corresponding element in the directory stack as given by the dirs command.

[8] However, as we saw in Chapter 1, '\'' (i.e., single quote, backslash, single quote, single quote) acts pretty much like a single quote in the middle of a single-quoted string; e.g., 'abc'\''def' evaluates to abc'def.

[9] command removes alias lookup as a side effect. Because the first argument of command is no longer the first word that bash parses, it is not subjected to alias lookup.

[10] Unless bash has been compiled with a brain-dead value for the default. See Chapter 11 for how to change the default value.

[11] Note that the wrong test may still be run. If your current directory is the last in PATH you’ll probably execute the system file test. test is not a good name for a program.

[12] The -d, -f, -p, and -s options are not available in versions of bash prior to 2.0.

[13] Be careful—it is possible to disable enable (enable -n enable). There is a compile-time option that allows builtin to act as an escape-hatch. For more details, see Chapter 11.

[14] You could actually do this without eval, by echoing commands to a temporary file and then “sourcing” that file with . filename. But that is much less efficient.
