The past few chapters have gone into detail about various shell programming techniques, mostly focused on the flow of data and control through shell programs. In this chapter, we switch the focus to two related topics. The first is the shell’s mechanisms for doing file-oriented input and output. We present information that expands on what you already know about the shell’s basic I/O redirectors.
Second, we’ll “zoom in” and talk about I/O at the line and word level. This is a fundamentally different topic, since it involves moving information between the domains of files/terminals and shell variables. echo and command substitution are two ways of doing this that we’ve seen so far.
Our discussion of line and word I/O will lead into a more detailed explanation of how the shell processes command lines. This information is necessary so that you can understand exactly how the shell deals with quotation, and so that you can appreciate the power of an advanced command called eval, which we will cover at the end of the chapter.
In Chapter 1, you learned about the shell’s basic I/O redirectors: >, <, and |. Although these are enough to get you through 95% of your UNIX life, you should know that bash supports many other redirectors. Table 7-1 lists them, including the three we’ve already seen. Although some of the rest are broadly useful, others are mainly for systems programmers.
Table 7-1. I/O redirectors
Notice that some of the redirectors in Table 7-1 contain a digit n, and that their descriptions contain the term file descriptor; we’ll cover that in a little while.
The first two new redirectors, >> and >|, are simple variations on the standard output redirector >. The >> appends to the output file (instead of overwriting it) if it already exists; otherwise it acts exactly like >. A common use of >> is for adding a line to an initialization file (such as .bashrc or .mailrc) when you don’t want to bother with a text editor. For example:
$ cat >> .bashrc
alias cdmnt='mount -t iso9660 /dev/sbpcd /cdrom'
^D
As we saw in Chapter 1, cat without an argument uses standard input as its input. This allows you to type the input and end it with CTRL-D on its own line. The alias line will be appended to the file .bashrc if it already exists; if it doesn’t, the file is created with that one line.
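The difference between > and >> is easy to check for yourself. Here is a minimal sketch that works in a throwaway directory; the file names demo_rc and new_rc are hypothetical stand-ins for a real initialization file:

```shell
# Work in a temporary directory so no real files are touched.
demo_dir=$(mktemp -d)
rc_file="$demo_dir/demo_rc"              # hypothetical stand-in for .bashrc

echo "alias ll='ls -l'" > "$rc_file"     # > creates (or truncates) the file
echo "alias la='ls -a'" >> "$rc_file"    # >> appends a second line
echo "alias l='ls'" >> "$demo_dir/new_rc"  # >> creates a missing file, just like >

alias_count=$(grep -c '^alias' "$rc_file")
echo "demo_rc now holds $alias_count aliases"
```

Had the second echo used > instead of >>, the first alias would have been lost.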
Recall from Chapter 3 that you can prevent the shell from overwriting a file with > file by typing set -o noclobber. The >| redirector overrides noclobber; it’s the “Do it anyway, dammit!” redirector.
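The following sketch shows noclobber and >| side by side. The file name report.txt and the variable plain_status are made up for the illustration:

```shell
demo_dir=$(mktemp -d)
target="$demo_dir/report.txt"    # hypothetical file name

echo "original" > "$target"
set -o noclobber

# With noclobber set, plain > refuses to overwrite an existing file.
# The braces let us discard the shell's own complaint about the failed redirection.
if { echo "replaced" > "$target"; } 2>/dev/null; then
    plain_status="overwrote"
else
    plain_status="refused"
fi

echo "forced" >| "$target"       # >| overrides noclobber
set +o noclobber                 # turn the option back off

echo "plain > $plain_status; file now contains: $(cat "$target")"
```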
The redirector <> is mainly meant for use with device files (in the /dev directory), i.e., files that correspond to hardware devices such as terminals and communication lines. Low-level systems programmers can use it to test device drivers; otherwise, it’s not very useful.
The rest of the redirectors will only be useful in special situations and you are unlikely to need them most of the time.
The << label redirector essentially forces the input to a command to be the shell’s standard input, which is read until there is a line that contains only label. The input in between is called a here-document. Here-documents aren’t very interesting when used from the command prompt; in fact, using one there is the same as the normal use of standard input except for the label. We could use a here-document to simulate the mail facility. When you send a message to someone with the mail utility, you end the message with a dot (.). The body of the message is saved in a file, msgfile:
$ cat >> msgfile << .
> this is the text of
> our message.
> .
Here-documents are meant to be used from within shell scripts; they let you specify “batch” input to programs. A common use of here-documents is with simple text editors like ed. Task 7-1 is a programming task that uses a here-document in this way.
We can use ed to delete the header lines. To do this, we need to know something about the syntax of mail messages; specifically, that there is always a blank line between the header lines and the message text. The ed command 1,/^[ ]*$/d does the trick: it means, “Delete from line 1 until the first blank line.” We also need the ed commands w (write the changed file) and q (quit). Here is the code that solves the task:
ed $1 << EOF
1,/^[ ]*$/d
w
q
EOF
The shell does parameter (variable) substitution and command substitution on text in a here-document, meaning that you can use shell variables and commands to customize the text. A good example of this is the bashbug script, which sends a bug report to the bash maintainer (see Chapter 11). Here is a stripped-down version:
MACHINE="i586"
OS="linux-gnu"
CC="gcc"
CFLAGS=" -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./lib -g -O2"
RELEASE="2.01"
PATCHLEVEL="0"
RELSTATUS="release"
MACHTYPE="i586-pc-linux-gnu"

TEMP=/tmp/bbug.$$

case "$RELSTATUS" in
alpha*|beta*)   [email protected] ;;
*)              [email protected] ;;
esac

BUGADDR="${1-$BUGBASH}"

UN=
if (uname) >/dev/null 2>&1; then
    UN=`uname -a`
fi

cat > $TEMP <<EOF
From: ${USER}
To: ${BUGADDR}
Subject: [50 character or so descriptive subject here (for reference)]

Configuration Information [Automatically generated, do not change]:
Machine: $MACHINE
OS: $OS
Compiler: $CC
Compilation CFLAGS: $CFLAGS
uname output: $UN
Machine Type: $MACHTYPE

bash Version: $RELEASE
Patch Level: $PATCHLEVEL
Release Status: $RELSTATUS

Description:
    [Detailed description of the problem, suggestion, or complaint.]

Repeat-By:
    [Describe the sequence of events that causes the problem to occur.]

Fix:
    [Description of how to fix the problem. If you don't know a fix for the problem, don't include this section.]
EOF

vi $TEMP

mail $BUGADDR < $TEMP
The first eight lines are generated when bashbug is installed. The shell will then substitute the appropriate values for the variables in the text whenever the script is run.
The redirector << has two variations. First, you can prevent the shell from doing parameter and command substitution by surrounding the label in single or double quotes. In the above example, if you used the line cat > $TEMP <<'EOF', then text like $USER and $MACHINE would remain untouched (defeating the purpose of this particular script).
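To see the effect of quoting the label, compare two otherwise identical here-documents. This is only a sketch; the variable greeting and the output file names are invented for the demonstration:

```shell
demo_dir=$(mktemp -d)
greeting="hello"     # hypothetical variable used only for this demo

# Unquoted label: parameter and command substitution are performed.
cat > "$demo_dir/expanded.txt" <<EOF
$greeting from $(echo a subshell)
EOF

# Quoted label: the text is passed through untouched.
cat > "$demo_dir/literal.txt" <<'EOF'
$greeting from $(echo a subshell)
EOF

cat "$demo_dir/expanded.txt"
cat "$demo_dir/literal.txt"
```

The first file contains the substituted text, while the second still contains the dollar signs exactly as typed.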
The second variation is <<-, which deletes leading TABs (but not blanks) from the here-document and the label line. This allows you to indent the here-document’s text, making the shell script more readable:
cat > $TEMP <<-EOF
	From: ${USER}
	To: ${BUGADDR}
	Subject: [50 character or so descriptive subject here]

	Configuration Information [Automatically generated, do not change]:
	Machine: $MACHINE
	OS: $OS
	Compiler: $CC
	Compilation CFLAGS: $CFLAGS
	...
	EOF
Make sure you are careful when choosing your label so that it doesn’t appear as an actual input line.
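Because TABs and blanks look alike on the page, the behavior of <<- is easiest to confirm with a script whose whitespace is spelled out. The sketch below builds a three-line script with printf (so that each \t is unmistakably a TAB) and feeds it to sh; in a real script you would simply type the TABs. The label END and the variable stripped are arbitrary:

```shell
# Line 1 of the generated script starts the here-document, line 2 is
# indented with a TAB, line 3 with four blanks, and the label line has a TAB.
stripped=$(printf 'cat <<-END\n\tindented with a TAB\n    indented with blanks\n\tEND\n' | sh)

echo "$stripped"
```

The leading TAB disappears from the first text line and from the label line, but the blanks on the second text line survive.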
A slight variation on this is provided by the here string. It takes the form <<<word; the word is expanded and supplied on the standard input.
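Here strings are handy for feeding the value of a variable to a command that reads standard input. A small sketch, in which the variable record and its contents are made up:

```shell
record="alice 42 admin"     # hypothetical input line

# The here string expands $record and supplies it on standard input.
shouted=$(tr 'a-z' 'A-Z' <<< "$record")
word_count=$(wc -w <<< "$record")

echo "$shouted"
echo "words: $((word_count))"   # $(( )) trims any padding wc may add
```

Note that the shell adds a NEWLINE to the end of the expanded word, so the command sees a complete line.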
The next few redirectors in Table 7-1 depend on the notion of a file descriptor. Like the device files used with <>, this is a low-level UNIX I/O concept that is of interest only to systems programmers, and then only occasionally. You can get by with a few basic facts about them; for the whole story, look at the entries for read(), write(), fcntl(), and others in Section 2 of the UNIX manual. You might wish to refer to UNIX Power Tools by Shelley Powers, Jerry Peek, Tim O’Reilly, and Mike Loukides (O’Reilly).
File descriptors are integers starting at 0 that refer to particular streams of data associated with a process. When a process starts, it usually has three file descriptors open. These correspond to the three standard I/O streams: standard input (file descriptor 0), standard output (1), and standard error (2). If a process opens additional files for input or output, they are assigned to the next available file descriptors, starting with 3.
By far the most common use of file descriptors with bash is in saving standard error in a file. For example, if you want to save the error messages from a long job in a file so that they don’t scroll off the screen, append 2> file to your command. If you also want to save standard output, append > file1 2> file2.
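As a quick illustration, here is a sketch in which a shell function stands in for the long job. The function noisy_job and the log file names are hypothetical:

```shell
demo_dir=$(mktemp -d)

# Stand-in for a long job: one line to each output stream.
noisy_job() {
    echo "result line"
    echo "error line" >&2
}

# Standard output goes to one file, standard error to another.
noisy_job > "$demo_dir/out.log" 2> "$demo_dir/err.log"

echo "stdout captured: $(cat "$demo_dir/out.log")"
echo "stderr captured: $(cat "$demo_dir/err.log")"
```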
This leads to another programming task.
We’ll call this script start. The code is very terse:
"$@" > logfile 2>&1 &
This line executes whatever command and parameters follow start. (The command cannot contain pipes or output redirectors.) It sends the command’s standard output to logfile.
Then, the redirector 2>&1 says, “send standard error (file descriptor 2) to the same place as standard output (file descriptor 1).” Since standard output is redirected to logfile, standard error will go there too. The final & puts the job in the background so that you get your shell prompt back.
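The shell processes redirectors from left to right, so the position of 2>&1 matters. The sketch below contrasts the two orders; noisy_job is again a hypothetical stand-in for a real command:

```shell
demo_dir=$(mktemp -d)

noisy_job() {
    echo "result line"
    echo "error line" >&2
}

# Standard output is sent to the file first; 2>&1 then points
# standard error at that same file.
noisy_job > "$demo_dir/both.log" 2>&1

# Reversed order: standard error is pointed at wherever standard output
# goes *now* (captured in $leaked), and only then does output move to the file.
leaked=$(noisy_job 2>&1 > "$demo_dir/only_out.log")

echo "both.log has $(grep -c '' "$demo_dir/both.log") lines"
echo "reversed order leaked: $leaked"
```

In the reversed form the error message escapes the log file, which is why start puts 2>&1 last.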
As a small variation on this theme, we can send both standard output and standard error into a pipe instead of a file: command 2>&1 | ... does this. (Make sure you understand why.) Here is a script that sends both standard output and standard error to the logfile (as above) and to the terminal:
"$@" 2>&1 | tee logfile &
The command tee takes its standard input and copies it to standard output and the file given as argument.
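You can convince yourself that both streams reach tee with a sketch like the following, where command substitution plays the role of the terminal and noisy_job is a hypothetical stand-in for the real command:

```shell
demo_dir=$(mktemp -d)

noisy_job() {
    echo "result line"
    echo "error line" >&2
}

# The pipe is set up first, so 2>&1 sends standard error into the pipe too.
seen=$(noisy_job 2>&1 | tee "$demo_dir/logfile")

seen_count=$(printf '%s\n' "$seen" | grep -c '')
logged_count=$(grep -c '' "$demo_dir/logfile")
echo "lines on the terminal: $seen_count, lines in logfile: $logged_count"
```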
These scripts have one shortcoming: you must remain logged in until the job completes. Although you can always type jobs (see Chapter 1) to check on progress, you can’t leave your terminal until the job finishes, unless you want to risk a breach of security.[1] We’ll see how to solve this problem in the next chapter.
The other file-descriptor-oriented redirectors (e.g., <& n) are usually used for reading input from (or writing output to) more than one file at the same time. We’ll see an example later in this chapter. Otherwise, they’re mainly meant for systems programmers, as are <&- (force standard input to close) and >&- (force standard output to close).
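For a taste of what reading from two files at once looks like, here is a minimal sketch. It assumes the exec builtin is used to open the files on descriptors 3 and 4 (arbitrary choices) and relies on the read command; the file names and variables are invented:

```shell
demo_dir=$(mktemp -d)
printf 'alice\nbob\n' > "$demo_dir/names"
printf '30\n25\n' > "$demo_dir/ages"

# Open each file on its own descriptor.
exec 3< "$demo_dir/names" 4< "$demo_dir/ages"

paired=""
# <&3 and <&4 make each read take its input from that descriptor.
while read -r name <&3 && read -r age <&4; do
    paired="$paired$name=$age "
done

exec 3<&- 4<&-    # close both descriptors when finished

echo "$paired"
```

Each pass through the loop consumes one line from each file, which would be awkward to arrange with standard input alone.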
Before we leave this topic, we should just note that 1> is the same as >, and 0< is the same as <. If you understand this, then you probably know all you need to know about file descriptors.
Now we’ll zoom back in to the string I/O level and examine the echo and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.
As we’ve seen countless times in this book, echo simply prints its arguments to standard output. Now we’ll explore the command in greater detail.
echo accepts a few dash options, listed in Table 7-2.
echo accepts a number of escape sequences that start with a backslash.[2] They are listed in Table 7-3.
These sequences exhibit fairly predictable behavior, except for \f: on some displays, it causes a screen clear, while on others it causes a line feed. It ejects the page on most printers. \v is somewhat obsolete; it usually causes a line feed.
Table 7-3. echo escape sequences
Sequence | Character printed |
---|---|
\a | ALERT or CTRL-G (bell) |
\b | BACKSPACE or CTRL-H |
\c | Omit final NEWLINE |
\e | Escape character (same as \E) |
\E | Escape character[3] |
\f | FORMFEED or CTRL-L |
\n | NEWLINE (not at end of command) or CTRL-J |
\r | RETURN (ENTER) or CTRL-M |
\t | TAB or CTRL-I |
\v | VERTICAL TAB or CTRL-K |
\n | ASCII character with octal (base-8) value n, where n is 1 to 3 digits |
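A few of these sequences in action. This sketch assumes bash’s built-in echo with the -e option, which turns on interpretation of backslash escapes; the variable tabbed exists only so the result can be inspected:

```shell
echo -e "name\tvalue"            # \t inserts a TAB
echo -e "line one\nline two"     # \n starts a new line
echo -e "no newline here\c"      # \c suppresses the final NEWLINE
echo " <- continues on the same line"

tabbed=$(echo -e "a\tb")
```

Without -e, bash’s echo prints the backslash sequences literally unless the xpg_echo shell option is set.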