This chapter discusses the relationship between the shell and the programs it calls, with a particular focus on subshells—additional shells run by a shell script in a new process. This chapter also discusses shell context and the distinction between shell variables and environment variables.
This chapter relies more heavily than previous chapters on a firm understanding of the UNIX process model. (While Windows does not use this model, UNIX-like shell environments running on Windows tend to emulate it to at least some extent.) UNIX systems can run multiple programs at once. In fact, not only can multiple programs be running at once, but multiple instances of a single program also can be running at once. Each instance of a running program is called a process and has a unique numeric process identifier, or pid. The pid of the shell is expanded in the shell parameter $$
. While a pid may be reused after a process has exited, a process keeps its assigned pid for its entire lifetime, and there can never be another process with the same pid during that lifetime. Each process has its own separate memory space, although in some cases processes may arrange to share memory. The ps
command gives a list of processes currently running. UNIX does not distinguish as some systems do between "applications" and other kinds of processes; all programs run the same way. Note that the output of the ps
command is nonportable; you cannot use it safely in a portable shell script, as the formatting of the display varies from one system to another, as do the options used to specify what to display. There is no useful portable subset. It is generally easy for humans to read, but not very useful to shell programmers.
The fundamental tool of UNIX process creation is the fork, in which a single process becomes two identical processes. In a lower-level language, such as C, this is done by using the UNIX system call fork()
. When a process invokes this fork()
successfully, the process is duplicated, and both processes then return from fork()
, differing only in the return status of the fork()
system call. In the original process (called the parent), the fork()
system call returns the pid of the child; in the child, the fork()
system call returns 0. Apart from that, each process has the exact same environment; the same objects are stored at the same addresses in memory, for instance. However, the child process has a distinct copy of these objects; modifications in the child have no effect on the parent. (The fact that two processes can have the same memory locations holding different values can be a bit of a surprise; each process has its own distinct mapping from memory addresses to physical memory.)
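In shell terms, you can glimpse this copy-on-fork behavior with a subshell (subshells are covered in detail later in this chapter); the following is a minimal sketch assuming a POSIX-style sh:

```shell
# The child process gets its own copy of the parent's data;
# changing the copy has no effect on the parent.
value=original
( value=changed; echo "child: $value" )
echo "parent: $value"
# prints:
#   child: changed
#   parent: original
```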
There is no UNIX system call to launch a new program as a subprocess. The fork()
system call does not launch a new program, but rather duplicates an already-running one. The exec()
system call (actually a family of related system calls) allows the replacement of the current process with a named program. Thus to spawn a new process, you first use fork()
, then in the child process use exec()
to launch the new command. The C library includes a wrapper function, system()
, to run a command as a subprocess; on UNIX systems, this function works by passing the provided command to the shell. There is no way to explicitly fork in a shell script; instead, you run commands, create pipelines, or run subshells. The shell offers common tasks built in terms of fork()
and exec()
, rather than giving direct access to the system calls.
In some cases, a process may have multiple simultaneous paths of execution, called threads. I mention these only to stress that the UNIX shell does not use threading; each process started by the shell is a fully separate process. Within a portable shell script, you generally do not need to even be aware of threading. If you do find yourself using the output of ps
, though, be aware that one of the least portable things is whether or not threads might show up in the output of ps
, possibly giving several lines of output for a single pid. Be cautious.
Threading is newer than the shell and is not all that heavily used in the basic UNIX environment. On UNIX systems, the cost of launching a new process is fairly low, so there is little incentive to avoid spawning new processes. One of the greatest challenges of shell programs that need to run on Windows systems in emulated UNIX-like environments is that process creation costs are extremely high on Windows. If you anticipate a need to run your code on Windows, you may want to pay extra attention to the cost of new processes; avoid anything that would imply a fork()
on UNIX, such as subshells or external commands, whenever you can.
All of this may seem rather complicated and even irrelevant, but the shell's behavior is closely tied to this underlying model. Whenever the shell runs any external command, it does so by this fork()
/exec()
pair. The one exception is the use of the exec
built-in command to replace the currently running shell with another program; in this case, the shell uses only the exec()
system call.
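The behavior of the exec builtin is easy to demonstrate; in this sketch, the inner shell is replaced outright by echo, so the command after exec never runs:

```shell
# exec replaces the running shell with echo;
# the second echo is never reached.
out=$(sh -c 'exec echo replaced; echo "never reached"')
echo "$out"
# prints: replaced
```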
So far, the discussion of variables in this book has looked at how they are used within a shell script. Some variables are available not only to the shell, but also to any child process it starts. These variables are called environment variables, and the set of environment variables in a given process is called the environment of that process. Environment variables are available to any program, not just the current script. Any programming language used on UNIX-like systems will typically offer some way to access (and possibly modify) environment variables. Processes have additional state beyond their environment variables, such as the collection of open file descriptors or current working directory. I refer generally to the set of environment variables and other per-process state as the context of a process.
The set
built-in command, called without arguments, prints all shell variables, whether or not they are in the environment. The env
utility, called without arguments, prints its environment.
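The difference is easy to see. In the following sketch (the variable names are arbitrary), only the exported variable appears in the output of env:

```shell
# shell_only is a plain shell variable; ENV_VAR is exported.
shell_only=private
ENV_VAR=public
export ENV_VAR
env | grep '^ENV_VAR='                 # found: ENV_VAR=public
env | grep '^shell_only=' || echo "shell_only is not in the environment"
```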
A common convention among shell programmers is to use capital letters exclusively in the names of environment variables (e.g., $PATH
) and use all lowercase names for unexported shell variables (e.g., $answer
). This is an excellent convention, and this book uses it. Many developers put all shell variables in all caps. However, because there is no reasonable portable way to determine whether a variable has been exported, it is generally better to use the former convention. Shell variable names should use underscores (_)
to separate words, not mixed capitals and lowercase letters. (The shell doesn't care, but future readers do.)
There are three primary changes you can make to the environment: You can add variables to it, remove variables from it, or modify variables in it.
Adding variables to the environment is sometimes called exporting them, probably because it is done using the export
command. The export
command adds its named arguments to the environment. As with assignment, you do not use a dollar sign ($)
to mark the names of the variables. For instance, the command export FOO
adds the variable FOO
to the environment. A common idiom is to assign a variable, and then immediately export it:
NAME=John
export NAME
Many recent shells allow variable assignments to be used on the export command line, providing an equivalent, but not fully portable, shorthand:
export NAME=John
If you are comfortable relying on POSIX shell features, you can use this, but it offers little advantage. There is no portable way to remove a variable from the environment. The unset
command removes a variable from both the environment and the current shell, but is not universally portable. For purposes of a shell script, it is typically enough to set a variable to an empty value, then make sure to use the colon (:
) variants of the shell's substitution rules, for instance, using ${foo:-bar}
instead of ${foo-bar}
. However, this still leaves an empty string in the environment. If you really need to remove environment variables, you will need to rely on POSIX shell features; consider using an execution preamble (see Chapter 7) and the unset
command. An unset variable that is later assigned a value does not become part of the environment without being exported again.
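A quick sketch shows the effect of exporting; the child shell sees only the exported variable (assuming secret is not already in the inherited environment):

```shell
# NAME is exported; secret is a plain shell variable.
NAME=John
export NAME
secret=hidden
sh -c 'echo "NAME=$NAME secret=$secret"'
# prints: NAME=John secret=
```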
Environment variables are modified like any other variables, using the shell's assignment operator. You cannot portably check whether a variable has been exported; this is one of the reasons a naming convention is so useful.
The environment is passed to child processes, but there is no way for children to modify the environment of the parent process. For instance, the following script does not do what its author perhaps intended:
$ cat path.sh
#!/bin/sh
PATH=$PATH:/usr/local/bin
$ echo $PATH
/bin:/usr/bin
$ ./path.sh
$ echo $PATH
/bin:/usr/bin
The user probably expected the shell assignment in path.sh
to alter the PATH
variable. In fact, it did alter the PATH
variable in the new shell that ran the script; however, this had no effect on the shell that invoked the script. Ways to modify the shell's environment are discussed in the section, "Modifying the State of the Shell," later in this chapter.
Issues like this are extremely widespread. Many UNIX systems use startup scripts with names like /etc/rc
or /etc/rc.local
. While researching shell features, I stumbled across a fascinating discussion among users trying to get an environment variable set on their system at boot time so that all users would share it. Their discussion revolved around adding the variable setting to /etc/rc.local
, a file for local system administrator additions to the system's startup scripts. Here's how that system runs its rc.local
script, if it exists:
if [ -f /etc/rc.local ]; then
    sh /etc/rc.local
fi
Since the rc.local
script was being run by a separate shell, the variables would not have propagated anyway. Of course, sometimes you do not want a chunk of code to be able to modify your environment; I suspect the preceding code was written with the conscious intent to prevent the local script from making changes to the environment of the parent script, which could have affected the rest of the boot process.
Many UNIX utilities rely on environment variables, so it is common to set variables to influence their behavior. This can lead to a cluttered environment in which future script code behaves unexpectedly because of values left in the environment. There are several ways to resolve this. Some scripts simply set an environment variable, run code depending on that setting, then unset it. This technique has a couple of flaws. One is that, if the variable had a previous value, it is lost. Another is that some scripts need to be portable to systems without unset
. What is needed is a way to restore the previous value. There are three options.
The first is to stash the value in a temporary variable. Save the old value, set the new one, then restore the previous value. As an example, running make
with a modified path might be implemented as follows:
save_PATH="$PATH"
PATH="/usr/local/bin:$PATH"
make
PATH="$save_PATH"
In this example, the make
command is run with the /usr/local/bin
directory in $PATH
, but the previous value of $PATH
is restored afterward. This works, and it may even be useful in the case where you want to run a number of commands with a temporary variable assignment. Saving previous values becomes more useful in cases where you need to change a value back and forth.
A particularly common case of this is using a similar idiom to change the $IFS
shell variable. You can iterate through $PATH
by setting $IFS
to : and using a command like for dir in $PATH
. However, you might want to restore the old value again occasionally during the loop:
save_IFS=$IFS
IFS=:
for dir in $PATH; do
    IFS=$save_IFS
    # now you can run commands with the normal value of $IFS restored
    echo "$dir"
done
IFS=$save_IFS
The second way to get a temporary change to the environment is to use the external env
command. The env
command can modify its environment and then run another program. For instance, the following script has the same behavior as the previous example:
env PATH="/usr/local/bin:$PATH" make
This has two limitations: the first is that it can run only a single command, and the second is that the command it runs must be an external program, not a shell builtin (see the "Shell Builtins" section later in the chapter for more information about builtins). One likely pitfall of this technique is that parameter substitution occurs in the calling shell, which means that it uses the existing value, not the value passed in:
X=yes
export X
env X=no echo $X
yes
Although the echo
command is run with the environment variable $X
set to no
, the argument passed to it is the already-substituted value from the parent shell. The command executed is echo yes
, and it does not matter what $X
is when this is executed. You can force the substitution to occur in the called program by using a shell with a quoted string argument:
X=yes
export X
env X=no sh -c 'echo $X'
no
The third technique for temporary variable assignments is to prefix a command with one or more variable assignments. This special syntax tells the shell to make an exported assignment only for the duration of a single command. A previous example is simplified a little further this way:
PATH="/usr/local/bin:$PATH" make
This syntax creates a temporary environment variable. The existing value (if any) of the variable assigned is not changed. If the variable assigned was not an environment variable before, it is not exported after the command runs, but only while the command is running. As with the env
technique, this works only for a single shell operation. The command must be a simple command or pipeline; you cannot use braces or parentheses to group commands used this way. As with the env
technique, the command is substituted, globbed, and subjected to field splitting before the variable assignments take effect. So, for instance, you cannot use the following to change $IFS:
IFS=: echo $PATH
This echo
command shows you $PATH
subject to field splitting using the previous value of $IFS
. The shell first substitutes and splits the arguments, then creates the environment (assigning the new value to $IFS
) and runs the echo
command. This technique has a portability limitation; it is not safe to use this with built-in commands, such as read
or cd
. In general, it is probable that a shell will keep any variable assignment made in that context. Modern (POSIX) shells will restore previous values if the built-in command is eval
or set
, but older shells may not. This topic is explored further in the section "The eval Command" later in this chapter.
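The single-command scope of these assignments can be sketched as follows (GREETING is an arbitrary variable name):

```shell
# GREETING is exported only while sh runs; the parent shell
# never has a variable by that name.
GREETING=hello sh -c 'echo "child: $GREETING"'
echo "parent: ${GREETING-unset}"
# prints:
#   child: hello
#   parent: unset
```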
The term subshell refers to a second instance of the shell program run under the control of an existing shell. A subshell is simply a shell context created by calling fork()
. The subshell does not need to load the shell's executable from disk, perform any kind of initialization, or otherwise do anything at all except execute a command or list of commands; typically, the commands have already been parsed for it by the calling shell. What this means is that, even though a subshell is another process, the performance penalty of launching one is much smaller than people typically expect for a new process (except on Windows, where it is still quite high). Subshells may be created explicitly or implicitly. When ()
is used to separate out a list, this creates a subshell. Commands in a pipeline typically run in subshells.
A subshell is a separate shell context, and like any child process, it cannot modify the state of the parent shell. Directory changes, variable assignments, and redirections within subshells do not affect the parent shell. This is often useful, and subshells are used to make temporary changes to the shell's environment or state. Note that although command-line variable assignments are temporary and do not affect the shell's environment permanently, they do not create an implicit subshell.
A subshell is not the same as running a new shell to execute a command. You can issue a command to the shell using the -c
command-line option or feed commands to another shell either through a script file or using a pipe to the shell's input. There are several major differences between an external shell and a subshell. A separate shell invocation parses the command (or commands) provided, performs word splitting, substitution, globbing, and so on. A subshell starts with material that has already been split into words but still performs substitution, globbing, and field splitting; it mostly executes the already-parsed material in a new process context. A separate invocation of the shell inherits environment variables but not unexported shell variables. By contrast, a subshell has all of the parent shell's variables accessible to it. As a special case of this, the subshell keeps the parent shell's value of the special shell parameter $$
. Finally, a separately invoked shell may (depending on the shell) run some standard initialization or startup scripts, which may cost substantial time or produce surprising behavior. For more information on shell startup, see the discussion of shell invocation in Chapter 6.
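The $$ behavior gives you a convenient way to tell the two apart; in this sketch, the first two lines print the same number and the third prints a new one:

```shell
echo "parent:    $$"
( echo "subshell:  $$" )         # same pid as the parent
sh -c 'echo "new shell: $$"'     # a different pid
```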
Subshells are used in a kind of substitution that I glossed over in the previous section on substitution: command substitution (also often called command expansion). In command substitution, the shell replaces a string of text with the output from running that string of text as a command. The command is run in a subshell, and any substitution or globbing occurs in the subshell, not in the parent shell.
The output of the subshell is treated the same way as the results of parameter substitution. For instance, the output is subject to field splitting and globbing (unless it is in a context, such as the control word for a case
statement, where these are not performed), and the substitution can be put in double quotes to prevent this. Standard error from the command is not included as part of this output; it goes to the shell's regular standard error unless explicitly redirected.
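A short sketch of the standard error behavior; only standard output ends up in the variable:

```shell
# The message on standard error is not captured by the
# substitution; it goes to the shell's own standard error.
out=$(sh -c 'echo captured; echo "this goes to standard error" >&2')
echo "out=$out"     # $out contains only "captured"
```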
Just as pipes allow you to use the output of a program as input to another program, command substitution allows you to use the output of a program as arguments to another program. There are two crucial differences beyond the difference in how these are used. The first is that argument lists may have limited length, while pipes can consistently handle gigabytes of data. The second, closely related, is that commands in a pipeline run simultaneously, but when you use command substitution, the command being substituted must run completely before its output can be used.
The shell's original syntax for command substitution, which is still universally available, uses backticks (`
, also called backquotes) to delimit command substitutions, as in `command`
. The text of command
is executed in a subshell (which performs any substitutions or globbing), and the backticks and their contents are replaced with the output of command
. As an example of usage, you can extract the name of a file using expr
and store that name using command substitution:
filename=`expr "$file" : '.*/\([^/]*\)$'`
In most modern shells, another syntax for command substitution is $(command). Unfortunately, there are a few shells left where this is not portable; most notably, the Solaris and IRIX /bin/sh. For some scripts, you may prefer to use the older form, but you may also prefer to use a preamble to get your script into a more recent shell (see Chapter 7). In newer shells, the previous example could be rewritten as:
filename=$(expr "$file" : '.*/\([^/]*\)$')
This sets the variable filename
to the file name component of a longer path. The $()
syntax may be nested:
all_files=$(find $HOME -name $(expr "$file" : '.*/\([^/]*\)$'))
There is no easy way to nest command substitution using the backtick syntax. The reason is that backticks do not have distinct left and right forms, so the shell simply treats text up to the first backtick it encounters as being a single subshelled command. For instance, imagine that you were to try to perform the preceding find
assignment using backticks:
all_files=`find $HOME -name `expr "$file" : '.*/\([^/]*\)$'``
The shell sees an opening backtick, then reads until it finds another backtick. So the first command is find $HOME -name
. The expr
command (and its arguments) show up outside of backticks, and the two backticks at the end look like substitution of an empty command. So this is treated by the shell as though you had written the following (using the other syntax):
all_files=$(find $HOME -name )expr "$file" : '.*/\([^/]*\)$'$()
The results of the empty $()
construct are simply empty strings, and $(find $HOME -name )
also produces no output. (The error message about a missing argument to -name
goes to standard error). So after substitution of the commands, this becomes the following:
all_files=expr "$file" : '.*/\([^/]*\)$'
The net result is that the shell sets $all_files
to the string expr
and tries to execute $file
as a command with the remaining arguments you had meant for expr
as its arguments. On some shells, you can obtain the expected results by escaping the inner backticks:
all_files=`find $HOME -name \`basename $file\``
Now the parent shell sees escaped backticks, which do not end the command it is constructing, and it passes them into the child shell, which executes the subcommand as expected. This is hard to read, gets harder to read if you add more nesting, and is not completely portable. Do not do it. There is a much simpler solution:
file_name=`basename $file`
all_files=`find $HOME -name "$file_name"`
In this case, the output of the first command is used as an argument to the second. The complete list of files generated is assigned to the all_files
variable. The behavior of backslashes in backticks may not be consistent between shells; avoid it. Backslashes in $()
command substitution seem to be consistently passed unaltered to the subshell.
For example, command substitution can provide the word list for a for loop:
file_name=`expr "$file" : '.*/\([^/]*\)$'`
for path in `find $HOME -name "$file_name"`; do
    echo `expr "$path" : '\(.*\)/\([^/]*\)$'`
done
The command substitution's results are subject to field splitting, providing a list of files in $HOME
with the specified name. Note that this does not behave well if some of the file names have spaces in them. If you want to prevent field splitting, you can use backticks (or the $()
syntax) inside double quotes. If you do this, you have to escape any nested quotes.
The choice of which command substitution syntax to use is more complicated than some shell portability decisions. The $()
syntax is substantially better, except for the surprise of running into a system that doesn't support it. These issues are discussed more in Chapter 7's discussion of shell language portability. If you have other reasons to require a POSIX shell, I would recommend the $()
syntax, but it is probably not in and of itself enough justification to make the additional requirement.
In general, the best way to handle nested command substitution is not to use it; use temporary variables to hold intermediate results. Nesting of command substitution is a frequent source of confusion or bugs in shell scripts. Avoid it. By the way, while the $()
syntax is more robust in the face of nesting, it has its own limitations; some shells behave surprisingly if you try to use command substitution of shell code that has mismatched parentheses, such as a case
statement. (The workaround of using (pattern)
in case
statements is also nonportable.)
Subshells can be formed implicitly under several circumstances. The most important to know about for most scripts are pipelines and background tasks (background tasks are discussed in Chapter 6). In a pipeline, every command may be run in a subshell. There is no explicit ()
to indicate where the subshells go, but there will typically be one per command or possibly one for each command but the first or last. In a portable script, you must not assume that any command in a pipeline runs in the parent shell. A common idiom to allow you to use the output of a pipeline is to use a while
loop as the last command in the pipeline; you can then access the output of the pipeline within the loop, but be aware that changes to shell variables may not affect the parent shell. (Worse yet, they may affect the parent shell, so you should not casually assume you can overwrite variables the parent shell is using.)
Here's a script I wrote once with the intent that it would list the contents of all subdirectories of the current directory:
#!/bin/sh
ls | while read file
do
    cd "$file"
    ls
done
This script has a surprisingly high density of bugs for such a tiny program. In fact, the only time it will work is when it is in a completely empty directory. If $file
is not a directory, the cd
command prints an error message, and the script runs ls
in the current directory; this is probably not what I want. If $file
is a directory, the shell changes to that directory and lists its contents as expected. So what's the bug in that case? The shell never changes back to the parent directory, so the next cd
command will probably not work as expected. Finally, it is possible (and even common) that the ls
command is subject to aliases that could cause it to behave differently or to environment variables that set default options causing it to, for instance, emit output in color. You can avoid the aliases by specifying the path to ls
. The environment variables are harder to address; for more information on the portability problems such features can create, and how to avoid them, see the discussion of utility portability in Chapter 8.
There are a number of ways to address these issues. The first thing to do is distinguish between directories and files. In the case where $file
is a directory, I want to change to it, run ls
, and change back out.
#!/bin/sh
/bin/ls | while read file
do
    if test -d "$file"; then
        cd "$file"
        ls
        cd ..
    fi
done
Now this will work in the most common cases. However, there is a new problem. If one of the directories in question has permissions such that cd "$file"
fails (or if the script writer made the extremely common mistake of not quoting $file
and one of the directories has spaces in its name), the cd ..
moves the script back up into the shell's parent directory, leaving the script once again behaving unexpectedly. You can resolve this at least in part by using &&
:
#!/bin/sh
/bin/ls | while read file
do
    if test -d "$file"; then
        cd "$file" &&
            ls &&
            cd ..
    fi
done
This now works in most cases. The only case where it will fail is where you can change your working directory to a given directory, but ls
fails in it, and this is pretty uncommon. However, there's a much simpler way; you can use an explicit subshell:
#!/bin/sh
/bin/ls | while read file
do
    if test -d "$file"; then
        ( cd "$file" && ls )
    fi
done
Because the cd
command is now in a subshell, the parent shell doesn't have to do anything; it just keeps on executing in the directory it came from, rather than trying to figure out how to get back to the right directory. Note that, unlike the {}
command group, a subshell does not need a trailing semicolon. This is because the )
character is a metacharacter, which the shell recognizes unless it has been quoted, while }
is merely a very terse keyword.
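The directory behavior makes the difference between the two groupings easy to see in a sketch:

```shell
# cd inside ( ) is confined to the subshell; the same cd inside
# { } changes the current shell's directory.
start=$(pwd)
( cd / )
echo "after ( ): $(pwd)"    # still the starting directory
{ cd /; }
echo "after { }: $(pwd)"    # now /
cd "$start"
```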
Explicit subshells are often used simply to group commands; this may be inefficient on any system, but it is especially inefficient if you need to worry about portability to Windows. If all you need is to group a few commands together, use {}
.
Sometimes, it is desirable to change the environment of the current shell. Subshells are used to prevent changes to the child shell's context, especially the environment or current directory, from affecting the parent shell. However, sometimes you want precisely the opposite effect; you want to force something to have an effect on the parent shell. Many shell builtins exist to change the shell's state. You could not implement cd
as an external program in UNIX because it would only change its own directory. The shell offers three other ways to run chunks of shell code within the current shell's environment: shell functions, the eval
command, and the dot (.
) command.
There are two major reasons for some commands to be built into the shell. The first is simple performance; for instance, many modern shells implement test
as a built-in command so conditional operations do not require a process to be spawned. When a program is a builtin for this reason, it mostly matches the behavior of an existing program that is found in the file system. For instance, the built-in test
program can generally accept any standard arguments that /bin/test
would work with. While the external utility programs and the shell builtins may both provide extensions, the standardized part of their behavior is usually the same. On the other hand, the nonstandard behaviors may vary widely. There is more discussion of utility (and built-in command) portability in Chapter 8. In general, whether something is a builtin or not, you should be careful about relying on extensions.
The second reason for a command to be a builtin is that it has to modify the shell's context. For instance, the cd
command is a builtin because a program that changed its own working directory would be useless to the shell calling it. Commands that modify or view shell variables have to be builtins. The env
command is not a builtin because it does not view unexported shell variables, and because it never changes the caller's environment. By contrast, the set
command is a builtin. The set
command can display unexported shell variables or control shell options; both of these functions require it to run as part of the shell process.
Shell functions offer an interesting compromise between running within the shell's environment and creating a new environment. A shell function is a block of code that is assigned a name and can thereafter be used just like a builtin command. This section introduces the common and portable subset of what you can do with shell functions; there is a great deal of variance between shells. (Some rare shells lack functions entirely; use a preamble to get to a real shell on those systems.) Shell functions are defined with the following syntax:
name () block
By convention, block
is nearly always a {}
-delimited list. However, you can use a ()
-delimited list, in which case the function's body runs in a subshell. The block should be one of these two lists; other options are not portable. For instance, you cannot use a plain pipeline or list as a function body using this syntax. Some shells offer other syntax for defining functions or even accept a plain pipeline as a function body. In many cases, shells that accept multiple ways to declare functions provide different semantics for different types of functions. The previous structure, whether with {}
or ()
for the body, is the only portable option.
Functions operate a little like separate scripts. For instance, during the execution of a function, the positional parameters refer to the function's arguments, not the calling script's positional parameters. ($0
may or may not be changed; do not rely on either behavior.) However, calling exit
within a function exits the whole script. If you wish to return early from a function, the special return
built-in command exits the current function with a specified return code; in portable scripts, this still has to be a small integer value, the same as any other exit status. Once the function completes, the positional parameters are restored. The function runs in the shell's environment, so code within the function can modify the shell's state; for instance, it can change the working directory or modify variables in the calling shell.
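These behaviors can be sketched with a small function (func_greet is a hypothetical name); note that return sets an exit status without leaving the script:

```shell
# Within func_greet, $1 is the function's own first argument,
# not the calling script's.
func_greet () {
    [ $# -ge 1 ] || return 1
    echo "hello, $1"
}
func_greet world
echo "status: $?"
# prints:
#   hello, world
#   status: 0
```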
The name of a function may clash with the name of a variable; because of this, it may be beneficial to use a consistent prefix, such as func_
, on function names. Some shells distinguish between function names and variable names, but older shells may not.
If you want to return a more complicated value or a string, you can store the result in a shell variable or design your function to be used with command substitution. For a shell variable, I recommend a name formed from the function's name with _result appended, as in the following example:
func_display_dpi () {
func_display_dpi_result=$(xdpyinfo | awk '/resolution:/ { print $2; exit }')
}
The typical result of this function (a string like 75x75
) would not be a possible return value in some shells, but it can be stored in a variable. Of course, it could also be simplified if the function just displays its output, and you use command substitution when calling it:
func_display_dpi () {
xdpyinfo | awk '/resolution:/ { print $2; exit }'
}
I tend to favor the command substitution path when defining functions with useful outputs. It is more terse and usually more idiomatic; on the other hand, each call to such a function has to be run in a subshell, which can impose performance costs. The uniquely named variable offers better performance in most cases. (Not in the preceding example, though, where there's a subshell anyway.)
In shells other than zsh
, redirections at the end of a function's definition are performed every time the function is called, but only for the duration of the function. For instance, the following script logs multiple lines to the /tmp/log
file:
func_log () {
echo $*
} >> /tmp/log
func_log hello
func_log goodbye
cat /tmp/log
hello
goodbye
Each invocation of the func_log
function results in output to /tmp/log;
note that >>
must be used, or each invocation of the function would truncate the file. Because the redirection affects the entire function body, individual statements within it do not need separate redirection. However, the shell's standard output is not redirected, so the cat
at the end displays the log file normally. This offers an interesting compromise between individual redirections and using exec
to redirect the whole shell. This technique may be better avoided if you may need to target a system where zsh
is otherwise the best POSIX-like shell available; it is also quirky enough that it may be better avoided if other people need to read your code—which they do.
While every modern shell provides some way to provide local variables within shell functions, there are differences between the shells, and no one method for doing this is portable. This is actually more frustrating than it would be if there were simply no way to do it at all in some shells. You can sometimes obtain results similar to local variables by using a couple of tricks.
One solution is to run a chunk of code that needs local variables in a subshell. Getting data out of such a function is hard; if you need results from it, you must use command substitution to obtain them. If your function uses a subshell, and then you always call it in another subshell for command substitution, Windows users will hate you.
Another option is to use shell variables with names that are unlikely to clash. For instance, you could extend the function_result idiom to other values you need during the execution of a function.
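For instance, a summing function (hypothetical, to illustrate the naming convention) might prefix every working variable with its own name so that nothing in the caller is likely to be disturbed:

```shell
func_sum () {
    # All "locals" carry the func_sum_ prefix to avoid clashes.
    func_sum_total=0
    for func_sum_i
    do
        func_sum_total=`expr $func_sum_total + $func_sum_i`
    done
    func_sum_result=$func_sum_total
}

func_sum 1 2 3
echo "$func_sum_result"    # prints 6
```

This is verbose, but it is portable and imposes no subshell cost.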
If you really need local variables, though, you can use a subshell for them. You can simply declare the function using a subshell as the function body; the subshell code can create or modify variables freely without worrying about affecting the parent shell environment. For instance, this script uses a subshell to avoid stomping on the parent shell's variable value
:
func_add () (
value=0
for i
do
value=$(expr $value + $i)
done
echo $value
)
value="Save me!"
func_add 3 4 5
echo "Value: $value"
12
Value: Save me!
The func_add
function stomps on the variable value
, but only in its subshell. The code outside the subshell does not stomp on any variables, so it can be called safely. If you need to modify the parent shell's environment, you can use braces for the function body, then use a subshell within the function's body. You can use command substitution to get information out of the subshell, as in this nearly equivalent example:
func_add() {
add_result=$(
value=0
for i
do
value=$(expr $value + $i)
done
echo $value
)
}
value="Save me!"
add_result="Overwrite me!"
func_add 3 4 5
echo $add_result
echo "Value: $value"
12
Value: Save me!
The variable value
is preserved, as it is modified only in the subshell. However, the add_result
variable is given a new value. You could execute other shell code from the subshell, too; it is not limited to variable assignments. This technique allows you to distinguish between "local" variables in a function and shell globals. However, it has two key limitations. The first is that it really requires nested command substitution (at least in the case where the function's core behavior involves the output of external commands). This restricts portability to relatively modern shells. The other is closely related; this technique uses a couple of subshells, and as such, may perform poorly on Windows machines.
The behavior of temporary assignments made on the command line is not quite portable when the command is a function; in pdksh
, such assignments are not reversed after function execution unless the function was declared using an alternative syntax (discussed in Chapter 7). To pass data to a function without altering the caller's environment or context, pass the data in as arguments and access them using the positional parameters ($1
, $2
, etc.) in the function. (Do not assume that $0
refers either to the function's name or the script's previous value for $0
; it might be either.)
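A short sketch of passing data in as arguments rather than through globals (the function name and message are hypothetical):

```shell
func_banner () {
    # $1 is the message; the caller's variables are untouched.
    printf '*** %s ***\n' "$1"
}

msg="local value"
func_banner "hello"   # prints "*** hello ***"
echo "$msg"           # still prints "local value"
```

Because the function reads only its positional parameters, it neither depends on nor alters the caller's context.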
Although they have their limitations, shell functions are exceptionally useful in developing larger shell programs. Functions offer a quick way to bundle up frequently used code and reuse it, generally without the expense of spawning subshells. Many users are unaware of the availability of shell functions or assume they are an extension. While many function features are extensions (and no two shells offer quite the same set of features), functions themselves are essentially universal.
The eval
command executes its arguments as shell code. It may seem odd to need a special command for this; if you have code you wish to execute, why not just write it? There are two factors that make eval
necessary. The first is code that is being generated in some way, usually through parameter substitution or command substitution. Because the results of substitution can never be keywords or other shell syntax features, such as variable assignments, anything that generates code needs to be parsed again by the shell. However, that could easily be handled by feeding the resulting code to another shell. The second factor is the desire to execute that code within the current shell. This is most obvious with variable assignments, although in some cases it is simply a matter of efficiency.
The eval
command takes its arguments and concatenates them (separated by spaces) into a string that is then parsed and executed. Because the arguments to eval
often include bits of shell syntax or metacharacters, many programmers habitually pass a single-quoted string as an argument. The quotes are not always necessary, but it can be a good habit to include them when the arguments are complicated or contain metacharacters so that you can be sure whether it is the calling environment or the eval
command performing any splitting, substitutions, or globbing.
One usage of eval
is to create a shell syntax item, such as a variable assignment, by assembling it from other components (such as an escaped dollar sign and a name). Since there are no arrays in standard shells, programmers sometimes use sequences of variable names to similar effect. For instance, instead of using an array named a
, you might use a series of variables named a_0
, a_1
, and so forth. If the variable count
holds the index of an item of the array, you can assign a value to that member like this:
eval "a_${count}=$value"
This does not work if $value
contains spaces or other special shell characters. The first step in correcting this is to use quotes:
eval "a_${count}="$value""
This works unless $value
contains double quotes or dollar signs. The trick is to prevent the shell from expanding $value
until it is inside the eval
so that it only gets expanded once. To solve this, escape the dollar sign so the shell passes the dollar sign and variable name to eval
rather than the substituted string:
eval "a_${count}="$value""
Now, no matter what string $value
contains, the eval
command executes the following code (assuming $count
was 0
):
a_0="$value"
Nothing generated by parameter substitution is a special syntax character; no matter what $value
contains, the result of the substitution is a plain string inside double quotes, and the contents of the string are reliably stored in $a_0
. In fact, you can go a little further. The shell does not perform field splitting on the right-hand side of an assignment, so you can omit the inner quotes now that the dollar sign is escaped:
eval "a_${count}=$value"
Even the outer quotes are actually unneeded. There is only one argument, and it contains no spaces or special characters that require additional protection:
eval a_${count}=\$value
The following fragment stores a collection of file names in a series of named variables, which can later be used somewhat like an array:
count=0
for file in *; do
eval a_${count}=\$file
count=`expr $count + 1`
done
On the first iteration, the shell assigns the name of the first file to a_0
. This can only be done using eval
. If you used a second shell, it would not affect variables in the parent shell, and if you didn't use eval
, the shell would fail because there is no command named a_0=file
. On the second iteration (assuming there are multiple files), $count
is 1, so the second file is assigned to the variable a_1
. This allows you to store the results of a glob separately and access them individually later. This gives you a safe way to treat a list of results as an array.
Most shell programs use a simpler idiom, simply accumulating values within a single variable:
for file in *; do
a="$a $file"
done
While this is common and idiomatic, it is not quite as reliable. There is no way after this has run to distinguish between a file name containing spaces and two separate file names. You could use a different idiom, using other characters (such as colons) as separators, but any character can exist in a path name. In the special case where you are looking only at file names guaranteed not to have directory components in them, you could use path separators safely.
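The ambiguity is easy to demonstrate. These two loops accumulate different lists (the file names are hypothetical stand-ins), yet produce identical strings:

```shell
a=""
for file in "one file" "two"; do
    a="$a $file"
done

b=""
for file in "one" "file two"; do
    b="$b $file"
done

test "$a" = "$b" && echo "indistinguishable"   # prints "indistinguishable"
```

Once the names are flattened into a single space-separated string, no amount of later parsing can recover the original boundaries.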
The eval
command is also needed to extract these variables. The shell cannot handle nested substitutions like ${a_${count}}
. Some languages, like Perl, can. For the shell, you must use eval
. You can use the same kind of expression used to create dynamically named variables to access them later:
eval value=\$a_${count}
The shell generates the string value=$a_0
, then evaluates it. The contents of $a_0
are substituted and stored in $value
. Again, the right-hand side of the assignment is not subject to field splitting, so there is no need for quotes.
The following function provides a moderately complete implementation of arrays using a shell function interface:
func_array () {
func_array_a=$1
func_array_i=$2
case $# in
2)
eval func_array_v=\$func_array_${func_array_a}_${func_array_i}
return 0
;;
3)
func_array_v=$3
eval func_array_${func_array_a}_${func_array_i}=\$func_array_v
return 0
;;
*)
echo >&2 "Usage: func_array name index [value]"
func_array_v=''
return 1
;;
esac
}
This function can be called to either set a named variable or extract its value. The values are all stored in variables using the prefix func_array_
to avoid name clashes. If you call func_array a 1 hello
, this function stores the string hello
in a variable named func_array_a_1
. If you call this as func_array a 1
, it then stores the current value of $func_array_a_1
in $func_array_v
. You could easily change this to generate an error message for access to an unset array member; as is, it honors the shell's normal convention of substituting an empty string for an unset variable. Note that the index need not be numeric; it can be any string consisting only of underscores, letters, and numbers. This function could do with more error checking for valid indexes, but it illustrates the flexibility of eval
.
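A quick illustration of the function in use; the array name colors and its members are hypothetical, and the definition is repeated here (with the dollar signs escaped for eval) so the fragment stands alone:

```shell
func_array () {
    func_array_a=$1
    func_array_i=$2
    case $# in
    2)  eval func_array_v=\$func_array_${func_array_a}_${func_array_i}
        return 0 ;;
    3)  func_array_v=$3
        eval func_array_${func_array_a}_${func_array_i}=\$func_array_v
        return 0 ;;
    *)  echo >&2 "Usage: func_array name index [value]"
        func_array_v=''
        return 1 ;;
    esac
}

func_array colors 0 red           # set member 0 of "colors"
func_array colors bg "dark blue"  # indexes need not be numeric
func_array colors 0               # fetch member 0 into $func_array_v
echo "$func_array_v"              # prints "red"
```

Note that a value containing spaces survives the round trip, because the assignment's right-hand side is not subject to field splitting.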
Another use of eval
would be displaying commands before running them for debugging or feedback purposes. The following function runs its arguments using eval
, after optionally displaying them:
func_do () {
cmd=$*
if $verbose; then
printf 'running %s\n' "$cmd"
fi
eval "$cmd"
}
The printf
command displays the command prior to any substitutions or globbing, which is usually the most informative choice. Printing out the results after substitutions is quite a bit harder; there is a working example of how to do this embedded in libtool
. In essence, you do it by using other tools (such as sed
) to generate multiple versions of the text to be used in different contexts (for instance, inside and outside of double quotes). If you need to do this, pick up the existing code rather than trying to reinvent it, as there are a number of special cases to deal with.
In the section "Introducing Redirection" in Chapter 3, I pointed out that you cannot write code that tries to pick streams to redirect out of a variable. For instance, this code doesn't work:
logfd=3
exec $logfd>/tmp/log.txt
This fails because the 3 that replaces $logfd
is not seen as part of a redirection; instead, the shell looks for a command named 3, which it would try to execute with standard output directed into /tmp/log.txt
. The eval
command makes this possible, however:
logfd=3
eval "exec $logfd>/tmp/log.txt"
echo "hello" >&$logfd
This example echoes hello
into /tmp/log.txt.
The shell substitutes $logfd
, producing the string exec 3>/tmp/log.txt
, then eval
executes that string in the current shell environment.
The string passed to eval
must be syntactically correct, or the shell reports a syntax error. The following fragment is just an elaborate syntax error:
eval "if $condition; then $action; "
fi
The eval
statement fails because the if
statement is incomplete; the following fi
is a syntax error because it does not occur at the end of an if
statement. You can use control structures within eval
, but the entire control structure has to be within the code evaluated. By contrast, break
and continue
statements can be executed from within eval
; the break
statement is not a part of the syntax of the enclosing loop, but a command that affects the shell's flow control.
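To see this in action, a break executed through eval still terminates the enclosing loop (the loop here is purely illustrative):

```shell
for i in 1 2 3 4 5; do
    # break is a command, not loop syntax, so eval can execute it:
    eval "test $i -ge 3 && break"
done
echo "$i"    # prints 3
```

By contrast, wrapping only the fi of an if statement in eval, as in the previous fragment, fails, because fi is syntax rather than a command.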
If the code passed to eval
is syntactically valid, the return status of eval
is the return status of the evaluated code. Otherwise, eval
indicates failure (and displays an error message on standard error).
In modern shells, the eval
command can be used to make a temporary assignment to $IFS:
IFS=: eval echo $PATH
The eval
command is run with $IFS
changed, so when it substitutes $PATH
, the shell uses the temporary value (a colon) for field splitting. Note that eval
and set
are special built-in commands: POSIX specifies that variable assignments preceding a special built-in persist in the current shell environment, but many shells revert them anyway. Unfortunately, shells differ on this point, so save and restore $IFS
explicitly if you depend on its value afterward.
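For instance, the temporary-$IFS idiom can split a colon-separated value into positional parameters; this sketch uses a hypothetical path value and restores $IFS explicitly, since not every shell reverts the assignment:

```shell
demo_path="/bin:/usr/bin:/usr/local/bin"   # hypothetical value

save_IFS=$IFS
IFS=: eval 'set -- $demo_path'   # $demo_path is split on colons inside eval
IFS=$save_IFS                    # restore explicitly, to be safe everywhere

echo "$#"    # prints 3
echo "$2"    # prints /usr/bin
```

The single quotes matter: they delay the expansion of $demo_path until eval runs, when the temporary $IFS is in effect.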
Another common usage for eval
is to run shell code (nearly always assignments) generated by other programs. Programs that want to generate modifications to the shell environment, such as the tset
utility (which manipulates terminal settings), often have a mode in which they emit a series of shell commands. These commands are designed to be incorporated into the shell environment using command substitution and eval
. For instance, the tset
utility can produce shell assignments as output, intended to be evaluated by the calling shell:
eval `tset -s`
This displays basic terminal setup commands to ensure that other settings (such as those controlled by stty
) are synchronized with the terminal type. (Many users also use the -Q
option to prevent tset
from overriding the choice of character used to erase the previous character typed, as this is typically idiosyncratic.) The tset
utility also makes an interesting use of standard error; in its normal usage, it sends terminal reset instructions to standard error once it has identified a terminal type. If you have inadvertently displayed binary data to a terminal, and the terminal is displaying characters incorrectly, running tset
will often correct this. The standard error stream is used so that, even when standard output is being directed to the shell (for command substitution), the special reset sequences go to the terminal anyway.
Another good example of a command with shell command output is the widely available ssh-agent
command. The ssh-agent
command provides a uniform way to handle secure shell authentication for a number of programs. When programs are run as children of ssh-agent
, or children of another program (typically a shell) that ssh-agent
started, they can get the information they need to use these authentication features from the environment. What about programs started elsewhere? To resolve this, the ssh-agent
program can produce a series of environment variable assignments on standard output. Thus running eval `ssh-agent -s`
gets variables into the current shell's environment for use by the shell and its children.
In most cases, the code generated by such programs is limited to variable assignments. In the case of ssh-agent
, there is also an echo
command to display additional information:
$ ssh-agent -s
SSH_AUTH_SOCK=/tmp/ssh-00024095aa/agent.24095; export SSH_AUTH_SOCK;
SSH_AGENT_PID=29018; export SSH_AGENT_PID;
echo Agent pid 29018;
Of course, displaying these values to standard output is useless (unless you're writing a book); the agent is now running, but no variables have been set in the calling shell. Programs expecting to be run in this manner tend to emit semicolons after every command to ensure that their output will be usable even if it has been combined into a single line by field splitting.
The purpose of this command is not just to allow programs that use an SSH agent to access it, but also to let you avoid rerunning the agent if you do not need to. For instance, this chunk of (nonportable, sadly) profile code would reuse an existing ssh-agent
process if one existed:
if test -n "$SSH_AGENT_PID" &&
ps x | grep ssh-agent | grep $SSH_AGENT_PID >/dev/null; then
echo "Existing ssh agent: $SSH_AGENT_PID"
else
eval `ssh-agent -s`
fi
The nonportability in the preceding code is the option specified to ps;
there is no universally portable set of options that will display background processes. You could also use kill -0 $SSH_AGENT_PID
to check for a process with the expected pid, but this would not prove that it was an ssh-agent
process.
There are a number of security concerns with running eval
on code generated by external utilities, as there is no way to constrain the code. When running eval
in a production script, always specify the full paths to programs whose output you will be running. Of course, you may not be able to predict those paths; ssh-agent
, for instance, might be in any of /usr/bin
, /usr/local/bin
, /opt/gnu/bin
, or any of a number of other common paths, depending on the system. You can search the common or reasonable places; beyond that, you have to make a security policy decision about how much to trust the user.
If you know a fair bit about the output you are expecting, you may be able to perform some sanity checks on it before executing it. In the fairly common case where you are substituting only a small portion of a piece of code, such as the name of a variable, you can check to make sure that the substitution is reasonable before executing it.
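For instance, if you are about to assign to a dynamically chosen variable, you can verify that the name is a valid shell identifier before handing it to eval. The names here are hypothetical:

```shell
value="some data"
varname="user_input"    # imagine this arrived from an untrusted source

case $varname in
    *[!a-zA-Z0-9_]*|[0-9]*)
        # Anything other than letters, digits, and underscores
        # (or a leading digit) could smuggle in arbitrary shell code.
        echo >&2 "unsafe name: $varname"
        ;;
    *)
        eval "$varname=\$value"
        ;;
esac
```

The escaped dollar sign keeps $value from being expanded before eval sees it, so the value itself cannot inject code either.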
The dot (.
) command reads a named file in and executes it in the current shell. (The name "dot" is not the actual name of the command; you cannot invoke dot
at a shell prompt, but people often refer to the command as "the dot command" for clarity in English, where a single period on its own is not a word.) This is often called sourcing the file, and bash
accepts source
as a synonym for .
, although this is not portable to other shells (in fact, it's a csh
feature). The named file is searched for in the current execution path; if you want to execute a program in the current directory, you must specify its path explicitly (unless you have the current directory in your path, in which case you should change that, as it is a very bad idea). Apart from the use of a search path, . file
is generally equivalent to eval "$(cat file)"
.
The .
command is mostly used for setup scripts that configure the shell's environment. For instance, the previous script intending to modify the user's path can be sourced by the shell, in which case it works:
$ cat path.sh
#!/bin/sh
PATH=$PATH:/usr/local/bin
$ echo $PATH
/bin:/usr/bin
$ . ./path.sh
$ echo $PATH
/bin:/usr/bin:/usr/local/bin
Sourcing can also be used in cases similar to those where you would use eval
, but where it is convenient or desirable to create a file to store a series of commands you wish to run in the current shell.
The various ways of spawning subshells have some overlapping functionality, but there are significant differences between them. The primary differentiations between ways of running subshells (or external shells) are whether the code can affect the parent shell's environment, whether the parent shell performs substitution on the code, and whether the child shell performs substitution (see Table 5-1).
Table 5-1. Shells Calling Shells
Shell Type     Affects Caller Context    Parent Substitutes    Child Substitutes
sh, sh -c      No                        Yes                   Yes
eval, .        Yes                       Yes                   Yes
()             No                        No                    Yes
fn ()          Yes                       No                    Yes
``, $()        No                        No                    Yes
When arguments are passed to eval
or to sh -c
, they are plain strings to the parent shell and subject to normal shell substitution rules. However, when the parent shell creates a sub-shell, whether for command substitution or not, the code passed to the subshell is not subject to substitution in the parent. Similarly, the bodies of shell functions are subject to substitution and globbing each time the function is called, not when it is defined. If the body of a function consists of a subshell, it cannot modify the parent shell's context.
A full external shell should be used when you want to run code in a completely separate context and want the shell to parse that code. External shells have the highest cost of any of the shell execution mechanisms, but they give the cleanest behavior least affected by the current shell's context. External shells have a comparatively high cost, however. In most cases, wrapping an eval
in a subshell is an acceptable substitute for launching a command with sh -c
and may perform marginally better.
The external shell's arguments are potentially subject to parameter or command substitution before they are created, but the external shell will not have any local shell variables you have set. Similarly, it will not have any shell functions or other unusual local environment setup. Use an external shell instead of eval
when you want to run a command that might affect the shell's environment or be affected by the shell's context. Similarly, use an external shell instead of .
when you want to run an external shell script in a separate context.
External shells do have one very significant portability weakness: If the standard system shell lacks features, and you've used an execution preamble to get into a more modern shell, sh -c
will probably call back to the old-fashioned shell. Pay attention to which shell you are calling when you use external shells.
Idiomatically, external shells are often used to express self-containment of a command; in many cases, the external shell command could have been run in a subshell quite easily. One other thing external shells can do is completely detach a child process from the parent shell. A job run in the background is still affiliated with the shell that started it. By contrast, a grandchild process can be completely disconnected. Processes designed to run as daemons often do this internally, but some lightweight programs expect the caller to do it. So one use of an external shell is to start a completely independent background task:
sh -c "background_task
>/dev/null 2>&1 </dev/null &"
The external shell runs background_task
disconnected from all the standard streams, then exits. After the external shell has exited, background_task
is not connected to the parent shell in any way. Redirection of the standard streams is important; otherwise, background_task
might still have the parent shell's input or output streams open, preventing those streams from closing when the shell exits.
Other typical uses of an external shell might look like this:
sh -c "tar cf - $dir | bizp2 | ssh user
@remote
"bzip2 -dc | tar xf -" &
This copies a directory tree to a remote host (and only works if ssh
has been set up to allow passwordless access to that site).
bash installer $source $target > install.log 2>&1
This runs an external script explicitly. You would not want to use .
to run the script; it might change the shell's context radically. The choice of a specific shell suggests that perhaps the installer script depends on bash extensions.
for shell in ksh ksh93 sh bash; do
$shell test > test.$shell
done
ok=true
echo "---output---"
cat test.sh
echo "------------"
for shell in ksh ksh93 bash; do
diff -u test.$shell test.sh || ok=false
done
$ok && echo "All shells matched!"
This scriptlet would provide a very minimal start on testing whether a test script's behavior is consistent across a small range of shells. In this case, the explicit choice of which shell to execute is very much intentional.
The eval
command is used when you want to assemble a chunk of shell code and evaluate it within the current shell context. You should use eval
instead of an external shell primarily if you need to modify the current shell's context. In some cases, though, the performance advantage of not starting a new shell may be worth it as long as executing code in the current context does not cause problems. The eval
command is useful when the generated code involves shell syntax or when you need to perform another pass of substitution on the results of a parameter or command substitution. If you need to interact with a variable, but which one must be determined dynamically, you need eval
. Likewise, because substitution cannot create shell syntax features, such as control structures, you need eval
to generate control structures.
The .
command is used mostly for existing code rather than dynamically generated code. Larger projects written in shell may use .
to incorporate a set of shared shell commands written as functions. The system startup scripts on several Linux systems, as well as many of the BSDs, use shell functions to provide consistent and reliable implementations of common tasks used in startup scripts. For instance, nearly every startup script on NetBSD starts with the following:
$_rc_subr_loaded . /etc/rc.subr
The rc.subr
script provides a number of function definitions to simplify the development of startup scripts. At the end of the script, the $_rc_subr_loaded
variable is set:
_rc_subr_loaded=:
If the support file has not already been loaded, the line expands to . /etc/rc.subr
and loads the support file. If it has been loaded, the line expands to : . /etc/rc.subr
, which does nothing.
Supporting files like this are useful for a number of reasons. Shell functions are generally faster than external commands. Furthermore, they can modify the environment of the script using them. This makes them essentially new built-in commands that can be written on the fly, allowing a great deal of convenience and flexibility in scripting.
Subshells are often used because the parentheses offer a visually intuitive way to group commands. However, if you do not need any of the additional features of the subshell, using a command list (enclosed in braces and terminated by a semicolon or new line) is typically more efficient.
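The difference is small but worth keeping straight. Braces group commands in the current shell, while parentheses fork a subshell whose changes are discarded:

```shell
x=outer
{ x=brace; }     # brace-grouped list runs in the current shell
echo "$x"        # prints "brace"

( x=subshell )   # parenthesized list runs in a subshell
echo "$x"        # still prints "brace"; the subshell's change was discarded
```

Note the syntax: braces require a space after { and a terminator (semicolon or newline) before }, because { is a keyword rather than an operator.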
Any time you find yourself about to save, modify, and then restore part of your shell context or environment, a subshell is probably better. One of the most widely used examples of a subshell is this idiom for copying files:
tar cf - . | (cd target; tar xpf -)
Unlike the cp
command, this preserves a broad variety of nonstandard files, such as device nodes. If run with root privileges, it also preserves ownership. Users on systems that provide it may prefer the pax
utility, which can perform this operation with a single command. However, the pair of tar
commands lends itself to another common idiom, which cannot be done using only a single command, doing the same thing to or from a remote machine:
ssh user@remote 'cd source; tar cf - .' | (cd target; tar xpf -)
Whether local or remote, the unpacking operation could be done instead using plain shell compound commands, but then the current directory of the shell would be changed. Using a subshell is more idiomatic. If you are using a remote shell, remember that it cannot expand local shell variables; make sure any variable arguments sent to the remote shell have already been expanded on the local end.
Command substitution is one of the central strengths of the shell, allowing arbitrary new functionality to be added to the shell on the fly. Common uses of command substitution include generation of data and performing operations that the shell does not provide natively. For instance, although some modern shells have built-in arithmetic evaluation, historically shell scripts have used the expr
utility to perform arithmetic, and you should stick with it in code that you expect to ever need to port. For instance, when using getopts
(see Chapter 6; of course, this isn't all that portable either) to parse options, the shell sets a variable $OPTIND
to the index of the first nonoption parameter. To remove the options from the parameter list using shift
, you need to shift one less than that many values off the parameter list:
shift `expr $OPTIND - 1`
There may still be shells that only let you shift one argument at a time (because their shift
command takes no arguments), in which case you must use a loop to accomplish this, but I haven't been able to find one. Similarly, you may want to count files that match a given test:
total=0
for file in *
do
test -d "$file" && total=$(expr $total + 1)
done
echo "$total file(s) are actually directories."
Another very common use of command substitution is modifying strings using the sed
or tr
utilities (with which there are many portability issues; see Chapter 8). For instance, a script that wishes to shout at the user might use tr
to uppercase a message:
func_toupper () {
func_toupper_result=`echo "$@" | tr a-z A-Z`
}
func_toupper "I can't hear you."
printf "%s" "$func_toupper_result"
I CAN'T HEAR YOU.
In this case, of course, it might make sense to remove the command substitution and simply display the output immediately. A common pitfall when using command substitution is to carefully store the output of a command, only to immediately display it. This habit probably reflects the idiom in many languages, where you use a special command to display things; thus if you want to display the result of an operation, you obtain the result and then display it. Be careful about falling into this habit. The example displays the output immediately because its only purpose is to display the output. In a real program, if you were always going to display the output immediately, it might make more sense to write the function to display output rather than returning a result:
func_toupper () {
  echo "$@" | tr a-z A-Z
}
Command substitution like this is often used when a given data manipulation exceeds the native capability of the shell to perform pattern operations. Some shells offer substantially more flexible variable manipulations, but the basic pattern remains, and there are always things that external utilities are better at.
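For instance, suffix removal with % is within the native capability of any POSIX shell (though not the most ancient Bourne shells), while a substitution in the middle of a string falls to an external tool. The file name here is made up:

```shell
name=archive.tar.gz
# Native: strip one suffix with parameter expansion.
base=${name%.gz}
# External: an in-the-middle substitution needs sed.
renamed=`echo "$name" | sed 's/archive/backup/'`
echo "$base $renamed"
```

This prints `archive.tar backup.tar.gz`.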
It is important to note that the commands in a command substitution do not need to be external programs and do not need to be simple shell commands. You can expand the output of functions, lists, or shell control structures, such as while
loops.
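For instance, the output of a shell function containing a loop can be captured like any other command; the function here is made up:

```shell
# A function whose output is a loop's worth of lines.
squares () {
  for n in 1 2 3; do
    expr "$n" \* "$n"
  done
}
# Command substitution collects all three lines of output.
result=`squares`
set -- $result
echo "$# values: $*"
```

This prints `3 values: 1 4 9`.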
One of the most common idioms with all of the previous is combining them. As you have probably noticed, many of the examples of how to use eval
use it on the results of command substitution. The shell is fundamentally a glue language, and each of these mechanisms is used for a different kind of glue. The following example lumps everything together:
: ${MD5SUM="md5sum"}
find . | while read file; do
  test -f "$file" || continue
  md5=`"$MD5SUM" < "$file"`
  eval assoc=\$md5_$md5
  if test -z "$assoc"; then
    eval md5_$md5=\$file
  else
    printf 'duplicate: "%s" and "%s"\n' "$file" "$assoc"
  fi
done
A few words of explanation may be in order. This script attempts to identify duplicate files in the current directory, using the MD5 checksum algorithm. (This may not be available on all systems; on some systems, it may be named md5
, or not be installed at all.) The essential loop, on the outside, looks like this:
find . | while read file; do
  # DO SOMETHING WITH $file
done
This loop uses the output of the find
command (a list of file names, one to a line) as a list of file names to process. Now, what exactly is happening inside the loop?
test -f "$file" || continue
The first operation is a check that the file is a plain file, as opposed to a directory or a UNIX special file (such as the file system representation of a physical device). The short-circuit operator is a little terse, but expressive. This kind of usage is idiomatic in shell; while it might be dismissed as unwarranted "clever" programming if it were not a common idiom, familiarity makes up the difference. This idiom is very similar to the Perl idiom condition || next
.
md5=`"$MD5SUM" < "$file"`
This line sets a variable, $md5
, to the output of the selected checksum command. The md5
and md5sum
programs I used are verbose when invoked on a named file, but nicely terse when invoked with only an input stream, producing nothing but a 32-character string. This string is a 128-bit number derived from the file contents, which is typically different for any two files that are different. Of course, there are many possible clashes, but in practice the chances of a clash are low (extremely close to one in 2^128, if you can believe that).
eval assoc=\$md5_$md5
if test -z "$assoc"; then
  eval md5_$md5=\$file
else
  printf 'duplicate: "%s" and "%s"\n' "$file" "$assoc"
fi
This is the actual guts of the script. If you store a list of files and their MD5 checksums, you must search the whole list for each potential clash. This is annoying. In a language that supports associative arrays (also often called hashes), you would probably store each file name as a value with its checksum as a key. In fact, you can do nearly the same thing in the shell using computed variable names. Computed variable names, of course, mean using eval
. Let's have a closer look at this code fragment:
eval assoc=\$md5_$md5
This is a useful idiom for obtaining the value of a variable whose name you must compute at runtime. If the MD5 checksum of a file were 12345678
(it wouldn't be; it'd be four times that long, but a short name is more readable), this would expand to the following:
assoc=$md5_12345678
This stores the value of the dynamically selected variable in a variable with a predictable and constant name.
If the computed MD5 variable already had a value, it must have been stored by an earlier file with the same checksum, and you have found a duplicate: a match between the new file $file
and the file name now stored in $assoc
. If it is empty, you want to stash the name of the current file in that variable:
eval md5_$md5=\$file
Because the calling shell does not substitute $file
, you do not need to worry about special characters; the eval
command does the substitution, and the results are guaranteed to be treated as a plain word, not as shell metacharacters, even if the file name contains quotes, new lines, spaces, or other unusual characters.
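Stripped of the checksum machinery, the write and read halves of this idiom look like the following sketch; the key and value are made up:

```shell
key=12345678
value='some data'

# Write: the backslash keeps the calling shell from expanding $value,
# so eval performs the expansion in assignment context, and the space
# in the value cannot cause trouble.
eval md5_$key=\$value

# Read: again, the backslash defers expansion of the computed
# name until eval runs.
eval assoc=\$md5_$key
echo "$assoc"
```

This prints `some data`: the first eval executes `md5_12345678=$value`, and the second executes `assoc=$md5_12345678`.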
The use of assignment reduces the number of subshells and command substitutions you might otherwise need. A very common idiom is to use eval
in a command substitution to extract the value of a variable:
assoc=$(eval printf %s "\$md5_$md5")
Direct assignment is quite a bit simpler to use, but be aware of this idiom, as you may see it frequently. The combination of eval
and command substitution merits attention because this combination is fairly common. In general, using eval
and echo
(or printf
) to obtain the output of dynamically generated code is a useful idiom. The eval
command lets you generate a variable name dynamically, printf
lets you display its contents, and command substitution lets you embed those contents into another command or variable assignment. The weakness of this idiom is that command substitution implies field splitting and removal of trailing new lines, so it does not preserve all contents precisely. This may be unavoidable, when the dynamically generated code includes references to external commands.
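A quick demonstration of the trailing-newline caveat; the variable names are made up, and the inner escaped quotes sidestep the field-splitting issue so that only the newline loss shows:

```shell
value='two lines
with a trailing newline
'
name=value
# Look up $value through a computed name and capture the output;
# eval executes: printf %s "$value"
copy=$(eval printf %s "\"\$$name\"")
# The lookup succeeds, but command substitution has removed the
# trailing newline that the original value contained.
```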
Chapter 6 goes from fiddly little details of shell syntax into gory details of shell invocation and execution. I explain more about the positional parameters, the meaning and nature of shell options, and the grand unified theory of why the shell is not doing what you expected it to do, as well as some of the debugging tools and techniques that may become necessary if a script is misbehaving.