Chapter 4. Glob Patterns And Other Basics

In the last chapter, I showed some simple patterns that allow you to avoid having to specify exactly what you want to wait for. In this chapter, I will describe how to use patterns that you are probably already familiar with from the shell: glob patterns. I will also describe what happens when patterns do not match. I will go over some other basic situations such as how to handle timeouts. Finally, I will describe what to do at the ends of scripts and processes.

The * Wildcard

Suppose you want to match all of the input and the only thing you know about it is that hi occurs within it. You are not sure if there is more to it, or even if another hi might appear. You just want to get it all. To do this, use the asterisk (*). The asterisk is a wildcard that matches any number of characters. You can write:

expect "hi*"
send "$expect_out(0,string) $expect_out(buffer)"

If the input buffer contained "philosophic\n", expect would match the entire buffer. Here is the output from the previous commands:

hilosophic
 philosophic

The pattern hi matched the literal hi while the * matched the string "losophic\n". The first p was not matched by anything in the pattern so it shows up in expect_out(buffer) but not in expect_out(0,string).

Earlier I said that * matches any number of characters. More precisely, it tries to match the longest string possible while still allowing the pattern itself to match. With the input buffer of "philosophic\n", compare the effects of the following two commands:

expect "hi*"
expect "hi*hi"

In the first one, the * matches losophic\n. This is the longest possible string that the * can match while still allowing the hi to match hi. In the second expect, the * only matches losop, thereby allowing the second hi to match. If the * matched anything else, the entire pattern would fail to match.

What happens with the following command in which there are two asterisks?

expect "*hi*"

This could conceivably match in two ways corresponding to the two occurrences of “hi” in the string.

 

                   * matches    hi matches    * matches
possibility (1)    philosop     hi            c\n
possibility (2)    p            hi            losophic\n

What actually happens is possibility (1). The first * matches philosop. As before, each * tries to match the longest string possible allowing the total pattern to match, but the *’s are matched from left to right. The leftmost *’s match strings before the rightmost *’s have a chance. While the outcome is the same in this case (that is, the whole pattern matches), I will show cases later where it is necessary to realize that pattern matching proceeds from left to right.
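
The greedy, left-to-right behavior can be explored outside Expect. The following is only a sketch under a stated assumption: it uses Tcl's regexp command as a stand-in, writing each glob * as the regular expression (.*) so that the text each wildcard consumed can be captured. Expect's glob matcher is not implemented this way; the regular expressions are used here only because they make the same choices on this particular input.

```tcl
# Stand-in for the input buffer used in the examples above.
set buffer "philosophic\n"

# "hi*": the match starts at the first hi and the * takes everything after it.
# (In Tcl regular expressions, . also matches a linefeed by default.)
regexp {hi(.*)} $buffer whole star

# "hi*hi": the * can only take "losop" if the second hi is to match.
regexp {hi(.*)hi} $buffer whole2 middle

# "*hi*": the leftmost * is satisfied first, so it takes "philosop".
regexp {(.*)hi(.*)} $buffer whole3 left right

puts [list $whole $star $middle $left $right]
```

Here whole corresponds to expect_out(0,string) for the first pattern, while left and right show how the two asterisks of the last pattern divide the buffer between them.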

* At The Beginning Of A Pattern Is Rarely Useful

Patterns match at the earliest possible character in a string. In Chapter 3 (p. 74), I showed how the pattern hi matched the first hi in philosophic. However, in the example above, the subpattern hi matched the second hi. Why the difference?

The difference is that hi was preceded by "*". Since the * is capable of matching anything, the leading * causes the match to start at the beginning of the string. In contrast, the earliest point that the bare hi can match is the first hi. Once that hi has matched, it cannot match anything else, including the second hi.

In practice, a leading * is usually redundant. Most patterns have enough literal letters that there is no choice in how the match occurs. The only remaining difference is that the leading * forces the otherwise unmatched leading characters to be stored in expect_out(0,string). However, the characters will already be stored in expect_out(buffer) so there is little merit on this point alone.[18]

* At The End Of A Pattern Can Be Tricky

When a * appears at the right end of a pattern, it matches everything left in the input buffer (assuming the rest of the pattern matches). This is a useful way of clearing out the entire buffer so that the next expect does not return a mishmash of things that were received previously and things that are brand new.

Sometimes it is even useful to say:

expect *

Here the * matches anything. This is like saying, “I don’t care what’s in the input. Throw it away.” This pattern always matches, even if nothing is there. Remember that * matches anything, and the empty string is anything! As a corollary of this behavior, this command always returns immediately. It never waits for new data to arrive. It does not have to since it matches everything.

In the examples demonstrating * so far, each string was entered by a person who pressed return afterwards. This is typical of most programs, because they run in what is called cooked mode. Cooked mode includes the usual line-editing features such as backspace and delete-previous-word. This is provided by the terminal driver, not the program. This simplifies most programs. They see the line only after you have edited it and pressed return.

Unfortunately, output from processes is not nearly so well behaved. When you watch the output of a program such as ftp or telnet (or cat for that matter), it may seem as if lines appear on your screen as atomic units. But this is not guaranteed. For example, in the previous chapter, I showed that when ftp starts up it looks like this:

% ftp ftp.uu.net
Connected to ftp.uu.net.
220 ftp.UU.NET FTP server (Version 6.34 Thu Oct 22 14:32:01 EDT 1992) ready.
Name (ftp.uu.net:don):

Even though the program may have printed "Connected to ftp.uu.net.\n" all at once (perhaps by a single printf in a C program), the UNIX kernel can break this into small chunks, spitting out a few characters each time to the terminal. For example, it might print out "Conn" and then "ecte" and then "d to" and so on. Fortunately, computers are so fast that humans do not notice the brief pauses in the middle of output. The reason the system breaks up output like this is that programs usually produce characters faster than the terminal driver can display them. The operating system will obligingly wait for the terminal driver to effectively say, “Okay, I’ve displayed that last bunch of characters. Send me a couple more.” In reality, the system does not just sit there and wait. Since it is running many other programs at the same time, the system switches its attention frequently to other programs. Expect itself is one such “other program” in this sense.

When Expect runs, it will immediately ask for all the characters that a program produced only to find something like "Conn". If told to wait for a string that matches "Name*:", Expect will keep asking the computer if there is any more output, and it will eventually find the output it is looking for.

As I said, humans are slow and do not notice this chunking effect. In contrast, Expect is so fast that it is almost always waiting. Thus, it sees most output come as chunks rather than whole lines. With this in mind, suppose you wanted to find out the version of ftp that a host is using. By looking back at the output, you can see that it is contained in the greeting line that begins "220" and ends with "ready.". Naively, you could wait for that line as:

expect "220*"                     ;# dangerous

If you are lucky, you might get the entire line stored in $expect_out(0,string). You might even get the next line in there as well. But more likely, you will only get a fragment, such as "220 f" or "220 ftp.UU.NE". Since the pattern 220* matches either of these, expect has no reason to wait further and will return. As I stated earlier, expect returns with whatever is the longest string that matches the pattern. The problem here is that the remainder of the line may not have shown up yet!

If you want to get the entire line, you must be more specific. The following pattern works:

"220*ready."

By specifying the text that ends the line, you force expect to wait for the entire line to arrive. The "." is not actually needed just to find the version identifier. You could just make the pattern:

"220*re"

Leaving off the e would be too short. This would allow the pattern to match the r in server rather than ready. It is possible to make the overall pattern even shorter by looking for more unusual patterns. But quite often you trade off readability. There is an art to choosing patterns that are correct, yet not too long but still readable. A good guideline is to give more priority to readability. The pattern matching performed by Expect is very inexpensive.
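
The effect of chunking on these two patterns can be imitated without a real ftp server. The sketch below is an assumption-laden simulation, not how Expect works internally: the chunk boundaries are invented, and Tcl's "string match" (which must match the whole string) is made to behave like an unanchored Expect pattern by wrapping each pattern in asterisks.

```tcl
# Hypothetical chunks in which the greeting line might arrive.
set chunks [list "220 f" "tp.UU.NET FTP server (Version 6.34 " \
                 "Thu Oct 22 14:32:01 EDT 1992) rea" "dy.\r\n"]

set buffer ""
set loose  {}
set strict {}
foreach chunk $chunks {
    append buffer $chunk
    # Surrounding * characters let the pattern match anywhere in the buffer.
    lappend loose  [string match "*220*" $buffer]
    lappend strict [string match "*220*ready.*" $buffer]
}
puts "220*       matched after each chunk: $loose"
puts "220*ready. matched after each chunk: $strict"
```

The first pattern succeeds as soon as the first fragment arrives, which is exactly why it is dangerous. The second pattern succeeds only once the final chunk completes the line.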

More Glob Patterns

In all the examples so far using the * wildcard, it has matched an arbitrarily long string of characters. This kind of pattern specification is called shell-style since it is similar to the way filename matching works in the shell. The program that did this matching for the Bourne shell was called glob. Hence such patterns are often called glob-style also. From now on, I will just call them glob patterns.

Tcl’s "string match" command also uses glob patterns. Glob patterns support two other wildcards. They are "?" and "[]".

? matches any single character. For example, the pattern a?d would match abd but not abcd.

Ranges match any character specified between square brackets. For example, [abcdef0123456789] matches any hexadecimal digit. This pattern can also be expressed as [a-f0-9]. If you want to match a literal hyphen, make it the first or last character. For example, [-a-c] matches "-", "a", "b", or "c".

Unfortunately, brackets are also special to Tcl. Anything in brackets is evaluated immediately (unless it is deferred). That means that an expect command using a pattern with a range must be written in one of two ways:

expect "\[a-f0-9]"    ;# strongly preferred
expect {[a-f0-9]}

In the first case, the backslash (\) allows the bracket to be passed literally to the expect command, where it is then interpreted as the start of a range. In the second case, the braces force everything inside to be read as their literal equivalents. I prefer the first style because in the second case, sequences such as \n and $pat embedded in braces are not processed but are taken as literal character sequences of \ and n and $ and p and a and t. This is usually not what is intended.

You can prefix the right bracket with a backslash if it makes you feel good, but it is not necessary. Since there is no matching left-hand bracket to be matched within the double-quoted string, nothing special happens with the right-hand bracket. It stands for itself and is passed on to the expect command, where it is then interpreted as the end of the range.
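
Because "string match" uses the same glob wildcards, it offers a quick way to try ? and ranges without spawning anything. This is a minimal sketch; note that "string match" must match the whole string, whereas an expect pattern may match anywhere in the input.

```tcl
# ? matches exactly one character.
set q1 [string match "a?d" "abd"]     ;# 1
set q2 [string match "a?d" "abcd"]    ;# 0

# The same range written with a backslash and with braces.
set backslashed "\[a-f0-9]"
set braced      {[a-f0-9]}
set same [string equal $backslashed $braced]   ;# 1: identical pattern text

set hex1 [string match $backslashed "c"]   ;# 1
set hex2 [string match $backslashed "g"]   ;# 0

puts [list $q1 $q2 $same $hex1 $hex2]
```

Both quoting styles hand the matcher the same seven characters. They differ only in what else Tcl would substitute inside them.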

Backslashes

Tcl makes various substitutions when you have backslashes, dollar signs, and brackets in command arguments. You should be familiar with these from Chapter 2 (p. 23). In this section, I am going to focus on backslashes.

Backslash translations are done by Tcl only when processing command arguments. For example, \n is translated to a linefeed, \[ is translated to a "[", and \\ is translated to a "\". Sequences that have no special translation are replaced by the character without the backslash. For example, \z is translated to a "z".

While pattern matching, Expect uses these translated values. For example:

expect "\n" ;# matches \n (linefeed character)
expect "\r" ;# matches \r (return character)
expect "\z" ;# matches z  (literal z)
expect "\{" ;# matches {  (literal left brace)

If any backslashes remain after Tcl’s translation, the pattern matcher (i.e., pattern matching algorithm) then uses these remaining backslashes to force the following character into its literal equivalent. For example, the string "\\*" is translated by Tcl to "\*". The pattern matcher then interprets the "\*" as a request to match a literal "*".

expect "*"    ;# matches * and ? and X and abc
expect "\\*"  ;# matches * but not ? or X or abc

Similarly, backslashes prevent a ? from acting like a wildcard.

expect "?"    ;# matches * and ? and X but not abc
expect "\\?"  ;# matches ? but not * or X or abc

So that you can see the consistency here, I have written out some more examples. Do not try to memorize these. Just remember two rules:

  1. Tcl translates backslash sequences.

  2. The pattern matcher treats backslashed characters as literals.

These rules are executed in order and only once per command.

For example, in the second command below, Tcl translates the "\n" to a linefeed. The pattern matcher gets the linefeed and therefore tries to match a linefeed. In the third command, Tcl translates the "\\" to "\" so that the pattern matcher sees the two characters "\n". By the second rule above, the pattern matcher interprets this as a literal n. In the fourth command, Tcl translates "\\" to "\" and "\n" to a linefeed. By the second rule, the pattern matcher strips off the backslash and matches a literal linefeed.

In summary, \n is replaced with a linefeed by Tcl but a literal n by the pattern matcher. Any character special to Tcl but not to the pattern matcher behaves similarly.

expect "n"          ;# matches n
expect "\n"         ;# matches \n (linefeed character)
expect "\\n"        ;# matches n
expect "\\\n"       ;# matches \n
expect "\\\\n"      ;# matches sequence of \ and n
expect "\\\\\n"     ;# matches sequence of \ and \n
expect "\\\\\\n"    ;# matches sequence of \ and n
expect "\\\\\\\n"   ;# matches sequence of \ and \n
expect "\\\\\\\\n"  ;# matches sequence of \, \, and n

In the next set of examples, \* is replaced with a literal * by Tcl and by the pattern matcher. Any character special to the pattern matcher but not Tcl behaves similarly.

expect "*"        ;# matches anything
expect "\*"       ;# matches anything
expect "\\*"      ;# matches *
expect "\\\*"     ;# matches *
expect "\\\\*"    ;# matches \ followed by anything
expect "\\\\\*"   ;# matches \ followed by anything
expect "\\\\\\*"  ;# matches \ followed by *
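
The two rules can be checked one step at a time in plain Tcl. In this sketch, "string length" shows what is left after Tcl's translation, and "string match" stands in for Expect's pattern matcher. That substitution is an assumption made for illustration: "string match" handles backslashes in glob patterns the same way, but it must match the whole string rather than any part of the input.

```tcl
# Rule 1: Tcl translates backslash sequences.
set len1 [string length "\n"]     ;# 1: a single linefeed
set len2 [string length "\\n"]    ;# 2: a backslash and an n
set len3 [string length "\\*"]    ;# 2: a backslash and a *

# Rule 2: the matcher treats a backslashed character as a literal.
set m1 [string match "\\n" "n"]      ;# 1: matches a literal n
set m2 [string match "\\n" "\n"]     ;# 0: does not match a linefeed
set m3 [string match "\\\n" "\n"]    ;# 1: backslash plus linefeed matches a linefeed
set m4 [string match "*" "abc"]      ;# 1: * is still a wildcard
set m5 [string match "\\*" "*"]      ;# 1: matches a literal *
set m6 [string match "\\*" "abc"]    ;# 0: no longer a wildcard

puts [list $len1 $len2 $len3 $m1 $m2 $m3 $m4 $m5 $m6]
```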

The "[" is special to both Tcl and the pattern matcher so it is particularly messy. To match a literal "[", you have to backslash once from Tcl and then again so that it is not treated as a range during pattern matching. The first backslash, of course, has to be backslashed to prevent it from turning the next backslash into a literal backslash!

expect "\\\["   ;# matches literal [

This is quite a headache. In fact, if the rest of the pattern is sufficiently specific, you may prefer to improve readability by just using a ? and accepting any character rather than explicitly forcing a check for the "[".

The next set of examples shows the behavior of "[" as a pattern preceded by differing numbers of backslashes. If the "[" is not prefixed by a backslash, Tcl interprets whatever follows as a command. For these examples, imagine that there is a procedure named XY that returns the string "n*w".

expect "[XY]"      ;# matches n followed by anything followed by w
expect "\[XY]"     ;# matches X or Y
expect "\\[XY]"    ;# matches n followed by anything followed by w
expect "\\\[XY]"   ;# matches [XY]
expect "\\\\[XY]"  ;# matches \ followed by n followed ...
expect "\\\\\[XY]" ;# matches sequence of \ and X or Y

The \\[XY] case deserves close scrutiny. Tcl interprets the first backslash to mean that the second is a literal character. Tcl then replaces [XY] with the result of the XY command. The pattern matcher ultimately sees the four character string "\n*w". The pattern matcher interprets this in the usual way. The backslash indicates that the n is to be matched literally (which it would even without the backslash since the n is not special to the pattern matcher). Then as many characters as possible are matched so that a w can also be matched.
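
The bracket cases can be tried the same way by actually defining the imaginary procedure. This sketch assumes XY returns "n*w" and again uses "string match" as a stand-in for Expect's matcher, so each test string is chosen to match its pattern in full. The test strings themselves are invented for illustration.

```tcl
# XY is the imaginary procedure from the text.
proc XY {} { return "n*w" }

set r1 [string match "[XY]"       "now"]      ;# matcher sees n*w
set r2 [string match "\[XY]"      "X"]        ;# matcher sees a range of X or Y
set r3 [string match "\\[XY]"     "narrow"]   ;# matcher sees backslash, n, *, w
set r4 [string match "\\\[XY]"    {[XY]}]     ;# matcher sees a literal left bracket, then XY]
set r5 [string match "\\\\[XY]"   "\\now"]    ;# literal backslash, then n*w
set r6 [string match "\\\\\[XY]"  "\\Y"]      ;# literal backslash, then X or Y

# With three backslashes the brackets no longer form a range.
set r7 [string match "\\\[XY]"    "X"]

puts [list $r1 $r2 $r3 $r4 $r5 $r6 $r7]
```

Each added backslash alternates between protecting the bracket from Tcl and protecting it from the matcher, which is why the results flip between the procedure's result, a range, and a literal bracket.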

By now, you may be wondering why I write all patterns in double quotes in preference to using braces. It is true that braces shorten some of the patterns I have shown here. However, braces do not allow patterns to be specified from variables, nor do they allow backslashed characters such as newlines. But such patterns occur so frequently that you have to be familiar with using double quotes anyway. Constantly thinking about whether to use braces or double quotes is unproductive. Learn how to use double quotes and do not think further about using braces for patterns. If you know Tcl very well and skipped Chapter 2 (p. 23), it may be helpful for you to now go back and read the beginning of it as well as the discussion of eval on page 55.

Handling Timeout

Much of the time, expect commands have only one argument—a pattern with no action—similar to the very first one in this chapter:

expect "hi"

All this does is wait for hi before continuing. You could also write this as:

expect "hi" {}

to show the empty action, but expect does not require it. Only the last action in an expect command can be omitted:

expect  {
    "hi"       {send "You said hi\n"}
    "hello"    {send "Hello yourself\n"}
    "bye"
}

As a natural consequence of this, it is typical to write expect commands with the exception strings at the top and the likely string at the bottom. For example, you could add some error checking to the beginning of the anonymous ftp script from the previous chapter:

spawn ftp $argv
set timeout 10
expect {
    "connection refused" exit
    "unknown host" exit
    "Name"
}
send "anonymous\r"

If the script sees Name it will go on and send anonymous\r. But if it sees "unknown host" or "connection refused", the script will exit. Scripts written this way flow gracefully from top to bottom.

If, after 10 seconds, none of these patterns have been seen, expect will time out and the next command in the script will be executed. I used this behavior in constructing the timed_read script in the previous chapter. Here, however, I only want to go to the next command if Name is successfully matched.

You can distinguish the successful case from the timeout by associating an action with the timeout. This is done by using the special pattern timeout. It looks like this:

expect {
    timeout {puts "timed out"; exit}
    "connection refused" exit
    "unknown host" exit
    "Name"
}

If none of the patterns match after ten seconds, the script will print "timed out" and exit. The result is that the script is more robust. It will only go on if it has been prompted to. And it cannot hang forever. You control how long it waits.

Although the timeout pattern is invaluable, it is not a replacement for all error handling. It is tempting to remove the patterns "connection refused" and "unknown host":

expect {
    timeout exit
    "Name"
}

Now suppose "unknown host" is seen. It does not match Name and nothing else arrives within the ten seconds. At the end of ten seconds, the command times out. While the script still works, it fails very slowly.

This is a common dilemma. By explicitly specifying all the possible errors, a script can handle them more quickly. But that takes work on your part while writing the script. And sometimes it is impossible to find out all the error messages that a program could produce.

In practice, it suffices to catch the common errors, and let timeout handle the obscure conditions. It is often possible to find a pattern with appropriate wildcards that matches many errors. For example, once ftp is connected, it is always possible to distinguish errors. ftp prefaces all output with a three-digit number. If it begins with a 4 or 5, it is an error. Assuming ftp’s line is the only thing in expect’s input buffer, you can match errors using the range construct described on page 91:

expect {
    timeout {unexpected ...}
    "^\[45]" {error ...}
    "ftp>"
}

As I described in Chapter 3 (p. 73), the ^ serves to anchor the 4 or 5 to the beginning of the buffer. If there are previous lines in the buffer, as is more likely, you can use the pattern "\n\[45]". The linefeed (\n) matches the end of the carriage-return linefeed combination that appears at the end of any line intended to be output on a terminal.
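
The range test on the leading digit can be tried on its own. In this sketch the reply lines are hypothetical examples, and "string match" stands in for the expect pattern. Since "string match" always starts at the beginning of the string, the ^ anchor is not needed here (and "string match" would treat it as an ordinary character).

```tcl
# Hypothetical reply lines; only the leading digit matters here.
set replies [list \
    "226 Transfer complete." \
    "550 No such file or directory." \
    "421 Service not available."]

set errors {}
foreach line $replies {
    # A leading 4 or 5 marks the reply as an error.
    if {[string match "\[45]*" $line]} {
        lappend errors $line
    }
}
puts "error replies: $errors"
```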

When the timeout pattern is matched, the data that has arrived is not moved to expect_out(buffer). (In Chapter 11 (p. 248), I will describe the rationale for this behavior.) If you need the data, you must match it with a pattern. You can use the * wildcard to do so:

expect *

As I noted earlier, this command is guaranteed to return immediately, and expect_out(buffer) will contain what had arrived when the previous timeout occurred.
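
Put together, recovering partial output after a timeout might look like the following sketch. It needs Expect itself rather than plain Tcl. The spawned shell command is a hypothetical stand-in for a program that prints part of a line and then stalls, and the variable name partial is likewise only for illustration.

```tcl
#!/usr/local/bin/expect --
# Hypothetical stalled program: print a fragment, then go quiet.
spawn sh -c {printf "220 ftp.UU.NE"; sleep 10}
set timeout 2
set partial ""
expect {
    timeout {
        # The timeout did not save the data, so match it explicitly.
        expect *
        set partial $expect_out(buffer)
        puts "\ntimed out after seeing: $partial"
    }
    "ready."
}
```

The inner expect returns immediately, so the timeout action costs no extra waiting.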

By convention, the timeout pattern itself is not quoted. This serves as a reminder to the reader that expect is not waiting for the literal string "timeout". Putting quotes around it does not change expect’s treatment of it. It is still interpreted as a special pattern. Quotes only protect strings from being broken up, such as by spaces. For that reason, you can actually write a subset of expect patterns without any quotes. Look at the following intentionally obfuscated examples:

expect "hi" there
expect  hi  there
expect "hi  there"

In the first and second commands, hi is the pattern, while "hi there" is the pattern in the third command. For consistency, use quotes around all textual patterns, and leave them off the special pattern timeout. In Chapter 5 (p. 109), I will show how to wait for the literal string timeout.

Here is another example of the timeout pattern. You can use the ping command to test whether a host is up or not. Assume that host elvis is up and houdini is down. Not all versions of ping produce the same output, but here is how it looks when I run it:

% ping elvis
elvis is alive
% ping houdini
no answer from houdini

What ping actually does is to send a message to the host which the host should acknowledge. ping usually reports very quickly that the host is up, but it only says "no answer" after waiting quite a while—20 seconds is common.

If the host is on your local network, chances are that if the host does not respond within a second or two, it is not going to respond. If you are only looking for a host to farm out some background task, this heuristic works well. Realistically, it is exactly the same heuristic that ping uses—just a little less tolerant. Here is an Expect script that forces ping to respond after 2 seconds.

spawn ping $host
set timeout 2
expect "alive" {exit 0} timeout {exit 1}

If the expect sees alive within two seconds, it returns 0 to the caller; otherwise it returns 1. When called from a /bin/sh script, you find the result by inspecting the status. This is stored in the shell variable $? (or $status in csh).

$ echo $?
0

Strictly speaking, the status must be an integer. This is good in many cases—integers are easier than strings to check anyway. However, it is possible to get the effect of returning a string simply by printing it out. Consider the following commands which print out the same messages as ping:

spawn ping $host
set timeout 2
expect "alive" {exit 0} timeout {
    puts "no answer from $host"
    exit 1
}

The timeout action prints the string "no answer from ..." because the script will abort ping before it gets a chance to print its own error message. The alive action does not have to do anything extra because ping already prints the string. Both strings are sent to the standard output. In Chapter 7 (p. 171), you will see how to prevent printing strings from the underlying process, and even substitute your own if desired.

Some versions of ping have a user-settable timeout. But the technique I have shown is still useful. Many other programs are completely inflexible, having long fixed timeouts or none at all.

rsh is a program for executing shell commands remotely.[19] rsh is an example of a program that is very inflexible when it comes to timeouts. rsh waits for 75 seconds before deciding that a machine is down. And there is no way to change this time period. If rsh finds the machine is up, it will then execute the command but without any ability to timeout at all. It would be nice if rsh and other commands all had the ability to timeout, but it is not necessary since you can achieve the same result with an Expect script.

Rather than writing separate scripts to control rsh and every other problem utility, you can write a parameterized script to timeout any program. The two parameters of interest are the timeout period and the program name. These are passed as the first and second arguments, respectively. Assuming the script is called maxtime, it could be used from the shell to run a program prog for at most 20 seconds with the following:

% maxtime 20 prog

Here is the script:

#!/usr/local/bin/expect --
set timeout [lindex $argv 0]
spawn [lindex $argv 1]
expect

The script starts by setting the timeout from the first argument. Then the program named by the second argument is spawned. Finally, expect waits for output. Since there are no patterns specified, expect never matches any of the output. With nothing to match, expect eventually times out. Because there is no timeout action, expect simply returns, and the script ends. Alternatively, if the program ends before the timeout, expect notices this and returns immediately. Again, the script ends.

Handling End Of File (eof)

In the previous example, the expect command waited for output for a specific period of time. If the program terminates, there can be no more output forthcoming. expect recognizes this. Specifically, expect recognizes the closing of the connection to the spawned process. This closing is referred to as end of file or more succinctly, eof.[20]

While it is not a rule, usually a process closes the connection just prior to exiting. By default, the expect command simply returns when it sees an eof (i.e., closing). In light of this, it is worth reviewing the maxtime script.

After the maxtime script spawned a process, expect waited. Since there were no patterns, the output could not match. If the process continued running up to the timeout period, expect would return and the script would return. If the process stopped running before the timeout period, the process would first close the connection. expect would see this as an eof. Again, expect would return and then the script would return.

Similarly to the way an action is associated with a timeout, it is possible to associate an action with an eof. The special pattern eof is used. For example, the maxtime script could use this to report whether the spawned program completed within the allotted time or ran over.

#!/usr/local/bin/expect --
set timeout [lindex $argv 0]
eval spawn [lrange $argv 1 end]
expect {
    timeout {puts "took too much time"}
    eof     {puts "finished in time"}
}

Here are some test cases called from the shell using the UNIX sleep command. The sleep command is the perfect program to test with since it waits for exactly the amount of time you request.

% maxtime 2 sleep 5
spawn sleep 5
took too much time
% maxtime 5 sleep 2
spawn sleep 2
finished in time

In the first case, sleeping for five seconds took longer than two, so the script reported that it "took too much time". In the second case, sleeping for two seconds is easily accomplished in five seconds, so the script said "finished in time".

Hints On The spawn Command

I made one other change to the script that is worth noting. The first script only accepted a single argument as a program name. But this new version of maxtime understands that additional arguments after the program name are arguments to the program. This is accomplished with the command:

eval spawn [lrange $argv 1 end]

The lrange extracts all but the first argument (the timeout) and returns a list where the first element is the program name and the remaining elements are the arguments to the program. Assuming lrange produces "sleep 5", eval joins that to spawn ending up with:

spawn sleep 5

eval executes this as if it were the original command line. Compare the eval command with the following:

spawn [lrange $argv 1 end]     ;# WRONG!

In this case, spawn takes the result of lrange as a program name and tries to run that program. Again, the result of lrange is "sleep 5", and this entire string is then used as the program name. Needless to say, there is no program by the name "sleep 5".

It is worth remembering the command "eval spawn [lrange $argv ... end]". It is handy for writing scripts that allow optional command-line arguments to be passed in to become the arguments to the spawned process. This command or a variation of it appears in many of the later examples in this book.
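
The difference eval makes can be seen without spawning anything. In the sketch below, run is a hypothetical stand-in for spawn that merely reports the arguments it received, and argv is set by hand to imitate "maxtime 20 sleep 5".

```tcl
# "run" is a hypothetical stand-in for spawn; it only reports what it was given.
proc run {args} {
    return "[llength $args] argument(s): $args"
}

# Imitate the arguments of: maxtime 20 sleep 5
set argv {20 sleep 5}

# Without eval, the whole list arrives as one argument.
set wrong [run [lrange $argv 1 end]]

# With eval, the list is broken into separate arguments.
set right [eval run [lrange $argv 1 end]]

puts $wrong
puts $right
```

Without eval, the stand-in receives the single argument "sleep 5". With eval, it receives sleep and 5 separately, which is what spawn needs.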

Here is the production version of the maxtime script:

#!/usr/local/bin/expect --
set timeout [lindex $argv 0]
eval spawn [lrange $argv 1 end]
expect

One other precautionary note about spawn should be observed for now. Do not use spawn from within a procedure. Just call spawn from outside procedures. In scripts that only run a single process, this is an easy guideline to follow. In Chapter 10 (p. 236), you will learn more about spawn and at that point, the restriction will be lifted.

Back To Eof

In the ping script on page 96, there was no specific handling of the eof. Here is that script again:

spawn ping $host
set timeout 2
expect "alive" {exit 0} timeout {exit 1}

If expect sees an eof, then ping terminated within the timeout but without producing output containing "alive". How is this possible? After all, a host is either up or incommunicado. In fact, there is a third case. ping also reports if the host does not exist, that is, if there is no computer with such a name. In this case, ping says "unknown host", closes the connection, and exits. expect sees an eof, but since there is no eof pattern and corresponding action, the expect command returns. There are no more commands so the script ends.

When the script ends by running out of commands, an implied "exit 0" is executed. This is typical for interpreters, and UNIX commands conventionally return 0 to indicate that a command is successful. But in this case, the script returns 0 when given a non-existent host. This is clearly the wrong behavior. Unfortunately, the right behavior is not as clear. You could return 1 and revise the definition of what that means from “failure due to timeout” to simply “failure”. Or you could choose a different number, say, 2. Either can be justified depending on the use to which you want to put the script. ping returns 1 when the host is unknown so I will follow suit. Here is the revised script to handle the eof:

spawn ping $host
set timeout 2
expect "alive" {exit 0} timeout {exit 1} eof {exit 1}

In some ways this still does not handle the problem perfectly. For example, without looking directly at the source to ping, I do not know if there are other ways it could behave. For now, I am just lumping everything I do not know into an error.

But this may be sufficient. Indeed, one of the reasons for using Expect is that you may not be able to see the source in the first place. So taking the conservative approach of calling everything that is not expected an error is a practical and common solution.

Timeout and eof are the only types of exception conditions possible. As in the ping example, both exceptions often deserve the same type of handling. For this reason, there is a special pattern called default that represents both conditions. The last line of the ping script could be rewritten to use default as:

expect "alive" {exit 0} default {exit 1}

Using default (or both timeout and eof) covers all possible conditions that an expect command can match. It is a good idea to account for all conditions in every expect command. This may seem like a lot of work, but it can pay off handsomely during debugging. In Chapter 11 (p. 255), I will describe how to use the expect_before and expect_after commands to catch all timeouts and eofs without specifying them on each expect. Those commands can greatly simplify your scripts.

The close Command

When a spawned process closes its connection to Expect, the expect command sees an eof.

Figure 4-1. 

This scenario can also occur in the reverse direction. Expect can close the connection and the spawned process will see an eof.

Figure 4-2. 

By closing the connection, Expect is telling the spawned process that it has nothing more to say to the process. Usually the process takes this as an indication to exit. This is similar to what occurs when you press ^D while manually interacting with a process. The process does not see the ^D. Rather, the system turns this into an eof. The process reads the eof and then responds by closing the connection and exiting.

There is one difference between how Expect and the spawned process treat a closed connection. When Expect closes the connection, the spawned process sees an additional indication in the form of a hangup signal. Most processes take this as an instruction to immediately exit. The net result is very similar to reading an eof. In either case, the process exits. Later in the book, I will go into more detail about what signals are and how you can ignore them or take advantage of them.

From Expect, the command to close the connection to a process is close. It is called as:

close

No matter which side—the Expect process or the spawned process—closes the connection first, the other side must also close the connection. That is, if the spawned process first closes the connection, then the Expect process must call close. And if the Expect process first calls close, the spawned process must then call close.

Fortunately, in many scripts it is not necessary to explicitly close the connection because it can occur implicitly. There are two situations when you do not have to use close:

  • when the Expect process ends, or

  • when the expect command reads an eof from the spawned process.

In both of these cases, Expect closes the connection for you. This effectively means that the only time you need to explicitly write close is when you want to close the connection before the spawned process is ready to and you are not ready to end the entire Expect script.

In all the examples so far it has not been necessary to explicitly close the connection. Either expect read an eof or the script exited, thereby sending an eof to the spawned process, which in turn closed its end of the connection. It is not necessary to wait for an eof after you have already closed the connection. Indeed, it is not even possible. When the connection is closed, you cannot read anything—data or eof. The connection no longer exists.
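The last point can be checked with a short sketch. It assumes the standard cat program as a stand-in for any spawned process; the variable names failed and msg are only for illustration, and the exact wording of the error message may differ between Expect versions.

```tcl
spawn cat
close                                  ;# Expect closes its side of the connection

# The connection no longer exists, so expect cannot read from it.
# The command raises an error instead of matching eof.
set failed [catch {expect eof} msg]
puts "expect after close failed: $failed"
puts "message: $msg"
```

Here catch turns the error into a return value so that the script can continue. In a real script, you would simply not call expect after close.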

Here is an example of why you might want to call close explicitly. Imagine you are interacting with ftp. If you have an "ftp>" prompt, you can send the command quit and ftp will immediately exit, closing the connection from its end. But suppose ftp is in the middle of transferring a file and you need to close the connection immediately. You could interrupt ftp, wait for it to prompt, and then send the quit command. But it is simpler to just close the connection. ftp will abort the transfer and quit.

This may seem like a fairly rude way of doing things. After all, you do not have to abruptly close connections like this. You can always work through whatever scenario a program wants for it to initiate the close on its side. But it is important to understand this technique in order to handle things such as when you kill an Expect script, for example, by pressing ^C.

By default, ^C causes Expect to exit (i.e., "exit 0"). This in turn will close the connection to the spawned process, and the spawned process will die. If you want the spawned process to continue on after the Expect script exits, you have to make special arrangements. I will describe more about this later.

Programs That Ignore Eof

There is an exception to the scenario that I just described. Some interactive programs are rather cavalier when they encounter an eof and do not handle it correctly. However, if you are prepared for this situation, you can work around it easily enough. There are two kinds of common misbehavior:

  • Some programs ignore eof.

  • Some programs ignore data just before eof.

I will discuss the two cases separately.

Some programs ignore eof. Even if you close the connection (by calling close, exiting the script, or pressing ^C), they ignore the eof and continue waiting for more characters to arrive. This is characteristic of the ubiquitous telnet implementation and many other programs that run in raw mode. Raw mode means that no special interpretations are applied to input characters. For instance, ^C no longer serves as an interrupt, and ^D no longer acts as an eof. Since users cannot send an eof, these programs have no reason to expect it and thus do not look for it. The problem is, an eof is exactly what they get when the connection is closed.

Avoid explicitly closing programs like these before they are ready. Instead, force them to close the connection in the way they would when using them manually. For instance, telnet will close the connection on its own once you log out of the remote host. If you do not gracefully log out, thereby letting telnet shut down the connection, you will be left with a telnet process on your system talking to no one. Such a process must then be killed by hand using the UNIX kill command. (It is possible to do this from Expect, but I will not go into it until Chapter 13 (p. 292).)

Some programs detect eof but ignore any other data that comes along with it. An example is the following Expect script which runs ftp. Three files are requested but after the script has finished, only two of the files are found.

spawn ftp . . .
# assume username and password are accepted here
expect "ftp> " {send "get file1\r"}
expect "ftp> " {send "get file2\r"}
expect "ftp> " {send "get file3\r"}

After sending "get file3\r", Expect immediately closes the connection to ftp and exits. ftp then reads the command but also finds the eof. Unlike telnet in the previous example, ftp checks for the eof, but it mistakenly assumes that the eof means there is no data left to process. It simply does not check, and therefore the "get file3\r" is never executed.

In this example, the solution is to add a final expect command to wait for another prompt. An even simpler example is the following script which starts the vi editor and sends a command. The command inserts "foo" into a file which is then saved. The escape character (\033) ends the insertion, the "w" saves the file, and the "q" tells vi to quit.

spawn vi file
send "ifoo\033:wq\r"

Because of the final quit command, there is no prompt for which to wait. Instead, it suffices to wait for an eof from vi itself. And since the eof has no action, the eof keyword can be omitted as well. Here is the corrected script:

spawn vi file
send "ifoo\033:wq\r"
expect

Spawned processes that exit on the hangup signal behave similarly to programs that ignore data just before an eof. The solution is the same. Wait for the spawned process itself to close the connection first.

The wait Command

After closing the connection, a spawned process can finish up and exit. Processes exit similarly to the way Expect scripts do, with a number (for example, "exit 0"). The operating system conveniently saves this number and some other information about how the process died. This information is very useful for non-interactive commands but useless for interactive commands. Consequently, it is of little value to Expect. Nonetheless, Expect must deal with it.

Expect must retrieve this information—even if only to discard it. The act of retrieving the information frees various valuable resources (process slots) within the computer. Until the information is retrieved, the operating system maintains the information indefinitely. This can be seen from the output of ps. Assuming a spawned process has died and the connection has been closed, ps shows something like this:

PID   TT  STAT  TIME  COMMAND
4425  ?   Z     0:00  <defunct>

The Z stands for zombie—someone’s attempt to humorously describe a process that is dead but still haunts the system in an almost useless way. Even the process name and arguments have been discarded—no matter what they were originally, they show up here as <defunct>.

To get rid of this zombie, use the wait command. It is called simply as:

wait

The wait command returns a list of elements including the spawn id and process id. These elements are further described in Chapter 14 (p. 309). For now, ignore the return value of wait.

Because a process will not disappear from the system until you give the wait command, it is common to speak of waiting for or waiting on a process. Some people also like to use the term reap as in “reaping a process”.

Because wait follows close, it is very common to see people write "close;wait" on a single line. But if the connection is closed implicitly, the wait must appear by itself. Like close, the wait command can also occur implicitly. Unlike close, however, wait implicitly happens in only one case—when an Expect process (i.e., script) exits. On exit, all the spawned processes are waited for.

This means that Expect scripts that only spawn a single process and then exit need not call wait, since it will be done automatically. The example scripts so far have all taken advantage of this. Later on, I will show a script in which it is important to explicitly wait.
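The two ways of pairing close and wait can be sketched as follows. This is only an illustration, assuming the standard echo and cat programs; the counter reaped is a made-up name, and the return value of wait is ignored.

```tcl
set reaped 0

# Case 1: expect reads the eof, which closes the connection implicitly,
# so wait appears by itself.
foreach word {alpha beta gamma} {
    spawn echo $word
    expect eof
    wait
    incr reaped
}

# Case 2: the script closes the connection first, so close and wait
# appear together.
spawn cat
close; wait
incr reaped

puts "reaped $reaped processes"
```

A script that spawns processes in a loop like this is one where waiting explicitly matters, since otherwise each finished process would linger as a zombie until the script exits.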

One last thing about wait: If you call it before a process has died, your Expect script will wait for the process to die—hence the name. It is possible to avoid the delay by using the -nowait flag.

wait -nowait
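A rough way to see the difference is to time the call. The sketch below assumes the standard sleep program and a Tcl version that provides "clock milliseconds"; the variable names start and elapsed are only for illustration.

```tcl
spawn sleep 2                    ;# still running when wait is called

set start [clock milliseconds]
wait -nowait                     ;# does not wait for sleep to die
set elapsed [expr {[clock milliseconds] - $start}]

puts "wait -nowait returned after $elapsed ms"
```

Without the flag, the same wait would not return until sleep had exited.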

Exercises

  1. Write a pattern to match hexadecimal numbers. Write a pattern to match Roman numerals.

  2. Write a pattern to match the literal string timeout. Write a pattern to match the literal string "timeout" (with the double quotes).

  3. Write a script that takes a string and produces a pattern which will match the string. Make the script prompt for the string to avoid any interpretation of it by the shell.

  4. On page 101, I described what happens if the spawned process closes the connection first and what happens if the script closes the connection first. What happens if both the script and the spawned process close the connection simultaneously?

  5. Write a script that automatically retrieves the latest release of Expect and installs it. In what ways can you generalize the script so that it can retrieve and install other software?



[18] The more likely reason to see scripts that begin many patterns with "*" is that prior to Expect version 4, all patterns were anchored, with the consequence that most patterns required a leading "*".

[19] Some systems call it remsh.

[20] The terminology comes straight from UNIX, where all output sources can be viewed as files, including devices and processes.
