Chapter 6. Patterns, Actions, And Limits

In this chapter, I will describe the limits of patterns, including what to do when you hit them. I will also cover the darker side of range patterns—matching anything but certain characters. In the second half of the chapter, I will go into more detail on pattern actions including flow control. Finally, I will cover some miscellaneous pattern matching issues that do not fit anywhere else.

Matching Anything But

As I said in Chapter 5 (p. 113), pattern pieces match as many characters as possible. This makes it a little tricky to match a single line, single word, or single anything. For example, the regular expression ".*\n" matches a single line, but it also matches two lines because two lines end with a "\n". Similarly, it matches three lines, four lines, and so on. If you want to read lines one at a time from another program, then you cannot use this kind of pattern. The solution is to use the "^".

In Chapter 3 (p. 73), I showed that the "^" matches the beginning of the input buffer. When ^ is the first character of a regular-expression range, it means match anything but the given characters. For example, the regular expression [^ab] matches any character except a or b. The pattern [^a-zA-Z] matches any character but a letter.[26]

A range can be used to build larger patterns. The pattern "[^ ]*" matches the longest string not including a blank. For example, if the input buffer contained a sequence of words separated by single spaces, beginning "the cow jumped ...", the following expect command could be called repeatedly to match each word in the input.

expect -re "(\[^ ]*) "

The range matches each word and the result is stored in $expect_out(1,string). The space at the end of the word is matched explicitly. Without the explicit space, the input buffer is left beginning with a space (" cow jumped ...") and subsequent matches return the null string before the first space.

Remember that the length of the match is important, but only after the starting position is taken into account. Patterns match the longest string at the first possible position in the input. In this example, 0 characters at column 0 are successfully matched even though the pattern can also match 3 characters at column 1. Because column 0 is before column 1, the earlier match is used.

There is no explicit way to make expect prefer a later match over an earlier one, but often it is possible to simply pick a more descriptive pattern. In this example, the space can be skipped over. Alternatively, the * can be replaced by a + to force the pattern to match at least one character. This effectively skips over the space between the words without the need to explicitly match it.

expect -re "\[^ ]+"

Now the word is stored in expect_out(0,string). Because the pattern does not match whitespace, there is no need to select pieces of it, and the parentheses are no longer needed, simplifying the pattern further.
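The effect of * versus + is easy to try without a spawned process, using Tcl's regexp command (covered later in this chapter) on an ordinary string. The sketch below is only an imitation: it assumes the buffer holds "the cow jumped over the moon" and mimics expect's habit of discarding input up through each match. It is not how expect is implemented internally.

```tcl
# Hypothetical buffer contents, chosen only for illustration.
set buffer "the cow jumped over the moon"

# With "*", the range may match zero characters. Once the buffer starts
# with a space, the earliest match is the empty string at position 0.
regexp {[^ ]*} " cow jumped" empty
puts "star on a leading space matched: '$empty'"

# With "+", at least one non-blank character is required, so each
# match skips past the separating space to the next word.
set words {}
while {[regexp -indices {[^ ]+} $buffer span]} {
    lassign $span start end
    lappend words [string range $buffer $start $end]
    # Like expect, discard everything up to and including the match.
    set buffer [string range $buffer [expr {$end + 1}] end]
}
puts $words
```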

Here is the opening dialogue greeting from Uunet's SMTP server. SMTP is the mail protocol used by most Internet computers. The server is normally controlled by a mail program to transfer mail from one host to another, but you can telnet to it directly and type commands interactively. The telnet program adds the first three lines, and Uunet sends back the line that begins "220":

% telnet relay1.uu.net smtp
Trying 192.48.96.5 ...
Connected to relay1.uu.net.
Escape character is `^]'.
220 relay1.UU.NET Sendmail 5.61/UUNET-internet-primary ready at Mon, 22 Feb 93
23:13:56 -0500

In the last line (which wraps over two physical lines on the page), the remote hostname appears immediately after the "220". In order to match and extract the hostname, use the following command:

expect -re "\n220 (\[^ ]+) "

There are several subtle things about this command. First of all, the SMTP protocol dictates that responses are terminated by \r\n and that the initial response to the connection begins with the string 220 followed by the host or domain identification. Thus, you are guaranteed to see the string "220".

Unfortunately, the telnet program prints out the IP address of the remote host name in its "Trying ..." message. Since it is quite possible for part of the IP address to actually be "220", the pattern starts with \n to match the end of the previous line, effectively forcing the 220 to be the first thing on its own line. A space is skipped and then the familiar "[^ ]+" pattern matches the hostname.

Unlike the previous example, yet another space follows the "[^ ]+" pattern. Since the pattern explicitly forces the hostname to be non-null, why is a space needed at the end of the name? As I described in Chapter 4 (p. 89), network or other delays might crop up at any time. For example, if the greeting line had only partially printed by the time the pattern matching had begun, the input buffer might contain just "220 rela". Without the explicit space after the hostname, the pattern would match "rela". With the explicit space, the pattern will match the full "relay1.UU.NET".
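To see why both the leading \n and the trailing space matter, the SMTP pattern can be applied with Tcl's regexp to two strings: a complete greeting and one cut off in the middle of the hostname. Both strings are assumptions modeled on the transcript above, with the \r\n line endings written out explicitly.

```tcl
# Sample buffers modeled on the telnet transcript above.
set full "Trying 192.48.96.5 ...\r\nConnected to relay1.uu.net.\r\n220 relay1.UU.NET Sendmail 5.61/UUNET-internet-primary ready\r\n"
# The same greeting caught part way through, as if it had not fully arrived.
set partial "Trying 192.48.96.5 ...\r\nConnected to relay1.uu.net.\r\n220 rela"

# \n forces 220 to start a line; the trailing space insists that the
# whole hostname has arrived before the match succeeds.
set pattern {\n220 ([^ ]+) }

if {[regexp $pattern $full -> host]} {
    puts "hostname: $host"
}
puts "partial buffer matches: [regexp $pattern $partial]"

# Dropping the trailing space lets the partial buffer match too early.
regexp {\n220 ([^ ]+)} $partial -> early
puts "premature match: $early"
```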

Matching the hostname from the SMTP dialogue is not an artificial example. This technique can be used to convert IP addresses to hostnames when the IP-to-hostname directory entries do not exist, a common fault in many domains. In practice, the likelihood of a host running an SMTP server is much higher than the likelihood that its domain name server is correctly configured with complete reverse mappings. The gethostbyname example script that comes with the Expect distribution resorts to this and a number of other techniques to convert host addresses to names.

The ability to automate telnet opens up worlds of possibilities. All sorts of useful data can be collected and manipulated through interfaces that were originally designed only for humans.

Much of this information is rather open-ended. There may be no standards describing it other than a particular implementation. However, by studying sufficient output, you can usually come up with Expect scripts to read it back in. And if you cannot write an Expect script to understand a program’s output, chances are that humans cannot understand the output to begin with.

Really Complex Patterns

Writing scripts to understand natural language is not particularly difficult, but Expect does not give any particular assistance for the task. Regular expressions by themselves are certainly not sufficient to describe arbitrarily complex patterns. In some situations, it is even reasonable to avoid using complex patterns and instead match input algorithmically, using Tcl commands.

Take the case of automating ftp. In Chapter 3 (p. 83), I showed that it was very easy to retrieve a file if the name was known in advance—either by the script or the user. If the name is not known, it is harder. For example, ftp does not support directory retrieval. This can be simulated by retrieving every file in the directory individually. (You can automate this to some degree using ftp’s built-in wildcards, but that does not handle subdirectories so it is not a complete solution and I will ignore it for now.)

Further, imagine that you want to only retrieve files created after a certain date. This requires looking at a “long” directory listing. As an example, here is a listing of the directory /published/usenix on ftp.uu.net.

ftp> cd published/usenix
250 CWD command successful.
ftp> ls -lt
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
total 41
drwxrwsr-x   3 3      2  512 Sep 26 14:58 conference
drwxr-sr-x 1695 3      21 39936 Jul 31  1992 faces
lrwxrwxrwx   1 3      21   32 Jul 31  1992 bibliography ->
/archive/doc/literary/obi/USENIX
226 Transfer complete.
remote: -lt
245 bytes received in 0.065 seconds (3.7 Kbytes/s)
ftp>

It is easy to pick out the directory listing from this output. As before, you can see the protocol responses that each start with a three-digit number. These can be matched directly, but there is no way of separately matching all of the bits and pieces of information in the directory listing in a single pattern. There is just too much of it. And this is a short directory. Directories can contain arbitrarily many files.

Upon close examination, you can see that the directory lines use different formats. For example, the third file is a symbolic link and shows the link target. The second and third files show modification dates with the year while the first file shows the date with the time. And for a dash of confusion, the formatting is inconsistent—the columns do not line up in the same place from one entry to the next.

One way to deal with all of this is to match the fields in each line, one line at a time in a loop. The command to match a single line might start out like this:

expect -re "d(\[^ ]*) +(\[^ ]*) +(\[^ ]*) +(\[^ ]*) +( ...

The command is incomplete—the pattern does not even fit on the page, and it only describes directories (notice the "d" in the front). You would need similar patterns to match other file types. This complexity might suggest that this is the wrong approach.

An alternative is to use patterns only at a very superficial level. You can match the individual lines initially and then later break up the lines themselves. At this point, matching a single line should be no surprise. It is just a variation on what I have already shown in many different forms:

expect -re "(\[^\r]*)\r\n"

To get the individual pieces out, you can now use any of the plain old Tcl commands. You can treat $expect_out(1,string) as a simple list and index it. For example, to obtain the file name:

lindex $expect_out(1,string) 8

To obtain the month and day:

lrange $expect_out(1,string) 5 6
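Here is a small sketch that applies these list commands to lines taken from the listing above. It assumes each line has already been extracted (for instance into expect_out(1,string)) and contains nothing that would confuse Tcl's list parser.

```tcl
# Lines copied from the listing above; the symbolic-link line is
# rejoined because it was only wrapped for printing.
set listing {
    "drwxrwsr-x   3 3      2  512 Sep 26 14:58 conference"
    "drwxr-sr-x 1695 3      21 39936 Jul 31  1992 faces"
    "lrwxrwxrwx   1 3      21   32 Jul 31  1992 bibliography -> /archive/doc/literary/obi/USENIX"
}

set names {}
foreach line $listing {
    # Treat the line as a Tcl list of whitespace-separated fields:
    # index 8 is the name, indices 5 and 6 are the month and day.
    set name [lindex $line 8]
    set date [lrange $line 5 6]
    lappend names $name
    puts "$name: modified $date"
}
```

If file names may contain braces or double quotes, lindex can fail or misparse the line. One alternative is to split it with regexp -all -inline {\S+} $line, which always yields a proper list of the whitespace-separated fields.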

Using the following commands, you can get the file's type field (the first character on each line from "ls -l") and then process the files, directories, and symbolic links differently:

set type [string index $expect_out(1,string) 0]
switch -- $type \
      "-" {
        # file
    } "d" {
        # directory
    } "l" {
        # symbolic link
    } default {
        # unknown
    }

With no flags other than "--", the patterns in the switch command are a subset of the glob patterns (everything but ^ and $). This fragment of code actually comes out of the rftp (for "recursive ftp") script that comes with Expect as an example.
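The type dispatch can be exercised on its own. This sketch wraps it in a hypothetical classify procedure and uses the braced form of switch, which is equivalent to the backslash-continued form above. The README line is invented for illustration, and the "total 41" line shows a first character that falls through to the default branch.

```tcl
# Hypothetical helper: map the first character of an "ls -l" line to a kind.
proc classify {line} {
    set type [string index $line 0]
    switch -- $type {
        "-"     { return file }
        "d"     { return directory }
        "l"     { return symlink }
        default { return unknown }
    }
}

foreach line {
    "-rw-r--r--   1 3      21  4096 Jul 31  1992 README"
    "drwxrwsr-x   3 3      2  512 Sep 26 14:58 conference"
    "lrwxrwxrwx   1 3      21   32 Jul 31  1992 bibliography -> /archive"
    "total 41"
} {
    puts "[classify $line]: $line"
}
```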

With actions, the whole command to switch on the file type is not much more complicated. There are two of them—one to “get” files and one to “put” files. Below is a procedure to put files. The procedure is called for each file in the directory listing. The first argument is the name of the file, and the second argument is the first character of the type field.

proc putentry {name type} {
    switch -- $type \
    "d" {
        # directory
        if {$name=="." || $name==".."} return
        putdirectory $name
    } "-" {
        # file
        putfile $name
    } "l" {
        # symlink, could be either file or directory
        # first assume it's a directory
        if [putdirectory $name] return
        putfile $name
    } default {
        puts "can't figure out what $name is, skipping"
    }
}

For each directory encountered, putdirectory is called, which changes directories both remotely and locally and then recursively lists the new directory, calling putentry again for each line in the list. The files "." (current directory) and ".." (parent directory) are skipped.

Regular files are transferred directly by sending a put command inside putfile. Symbolic links are trickier since they can point either to directories or plain files. There is no direct way to ask, so the script instead finds out by blindly attempting to transfer the link as a directory. Since the attempt starts by sending a "cd" command, the putdirectory procedure fails if the link is not a directory. Upon failure, the script then goes on to transfer it as a plain file. Upon success, the procedure returns.

Really Simple Patterns

Occasionally it is useful to prevent the pattern matcher from performing any special interpretation of characters. This can be done using the -ex flag, which causes exact matching. For example, the following command matches only an asterisk.

expect -ex "*"

When using -ex, patterns are always unanchored. The ^ and $ match themselves literally even if they appear as the first or last characters in a pattern. So the following command matches the sequence of characters "^", "*", "\n", and "$". The usual Tcl interpretations apply. Hence, the \n is still interpreted as a single character.

expect -ex "^*\n$"         ;# matches ^ * \n $

Consider another example:

expect -ex "\\n"

Tcl interprets the \\n as \n and the exact matching occurs with no further backslash processing. This statement matches a backslash followed by an n.

The results of exact matches are written to expect_out(buffer) and expect_out(0,string) as usual, although expect_out(0,string) is necessarily set to the original pattern.

Using -ex may seem like a way to simplify many patterns, but it is really only useful in special circumstances. Most patterns require either wildcards or anchors. And strings such as "foo" are so simple that they mean the same thing when specified via -gl in the first place. However, -ex is useful when patterns are computer-generated (or user-supplied). For example, suppose you are creating Expect patterns dynamically from a program that is printing out SQL lines such as:

select * from tbl.col where col like 'name?'

To protect this from misinterpretation by -gl or -re, you would have to analyze the string and figure out where to insert backslashes. Instead, it is much simpler to pass the whole thing as a pattern using -ex. The following fragment reads a pattern from a file or process and then waits for it from the spawned process.

set pat [gets $patfile]
expect -ex $pat
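Plain Tcl offers two rough counterparts to -ex that are handy for experimenting: string first, which searches for a literal substring, and the ***= prefix, which tells Tcl's regular-expression engine to treat the rest of the pattern as literal text. The sketch assumes the SQL line arrives after a made-up "sql> " prompt.

```tcl
# Hypothetical buffer: a prompt followed by the SQL line from above.
set buffer "sql> select * from tbl.col where col like 'name?'\r\n"
set pat {select * from tbl.col where col like 'name?'}

# As a regular expression, "*", "." and "?" are operators, so the
# literal text is not found.
puts "as a regexp: [regexp -- $pat $buffer]"

# string first searches for the pattern as a literal substring ...
puts "string first: [string first $pat $buffer]"

# ... and a regexp beginning with ***= treats the remainder literally.
puts "***= regexp: [regexp -- "***=$pat" $buffer]"
```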

If you are hand-entering SQL commands in your Expect scripts, then you have to go a step further and protect the commands from being interpreted by Tcl. You can use braces to do this. Here is an example, combined with the -ex flag.

expect -ex {select * from tbl.col where col like 'name?'}

I show this only to discourage you again from using braces around patterns. While it works in this example, it is not necessary since you can prevent the substitutions at the time you handcode it by adding backslashes appropriately. Chances are that you will want to make variable substitutions in these or else they would have been stored in a file anyway. And if you are using more than a few patterns like these, you probably will not have them embedded in your scripts, so you do not need to worry about the Tcl substitutions in the first place.

Matching One Line And Only One Line

Matching a single line is such a common task that it is worth getting very familiar with it. The one-line script on page 131 matches a single line and this same technique will show up in many more scripts so it is worth examining closely here.

Suppose you want to search for a file in the file system with the string "frob" at the beginning of the name. There may be many files named "frob" (well, maybe not). You are just interested to know if there is at least one. The obvious tool to use is find. Unfortunately, find provides no control over the number of files it finds. You cannot tell it to quit after one. Here is an Expect script to do just that:

spawn find . -name "frob*" -print
set timeout -1
expect -re "\[^\r]*\r\n"

The script starts by spawning the find command. The timeout is disabled since this could be a very long-running command. The expect pattern waits for one complete line to appear, and then the script exits. This works because the range waits for any character that is not a \r and the * waits for any number of them, that is, any number of characters that are not \r's. The second \r (the one outside the range) both allows and forces a single \r. Finally, the \n matches the linefeed in the carriage-return linefeed sequence. The only thing that can be matched is a single line.
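The difference between this pattern and the greedy ".*"-style pattern from the start of the chapter shows up clearly when Tcl's regexp is applied to a string holding several lines. The find output here is invented. Note that in Tcl's regular expressions "." also matches newlines, which is why the greedy version runs on to the last line ending.

```tcl
# Hypothetical find output, already sitting in the buffer.
set buffer "./src/frob.c\r\n./doc/frobnicate.txt\r\n./old/frob\r\n"

# The one-line pattern: a run of non-\r characters, then \r\n.
regexp {([^\r]*)\r\n} $buffer -> first
puts "first line: $first"

# The greedy version swallows everything up to the final \r\n.
regexp {(.*)\r\n} $buffer -> greedy
puts "greedy match spans [llength [split $greedy \n]] lines"
```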

Without Expect, it is possible to get find to kill itself by saving its process id in a file and then forking the kill command from an -exec clause in the find. However, doing this is fairly painful. And find is just a special case. Many other commands do not have the power of find yet share the same problem of lacking any sophisticated control. For example, grep does not have any way to execute arbitrary commands when it matches. There is no way to tell grep to print only the first match.

For this and other commands, here is an Expect script which I call firstline:

#!/usr/local/bin/expect --
eval spawn $argv
set timeout -1
expect -re "\[^\r]*\r\n"

Immediately after matching the first line of output, firstline exits. Note that if the underlying process produces output quickly enough, the script may actually print several lines of output. That does not mean the pattern is matching multiple lines. It is still only matching one. However, by default Expect prints out everything it sees whether or not it matches.

In Chapter 7 (p. 171), I will describe how to change this default so that you can write scripts that only print out what they match.

Tcl’s string match Command

Having matched a single line, it is no longer possible to automatically break it up into pieces stored in the array expect_out. Tcl does, however, offer a standalone version of both the regular expression and glob pattern matchers.

Glob pattern matching is explicitly done using the string match command. The command follows the format:

string match pattern string

The string replaces the implicit reference to the input buffer in an expect command. The command returns 1 if there is a match or 0 if there is no match. For example:

if [string match "f*b*" "foobar"] {
    puts "match"
} else {
    puts "no match"
}

The switch command (demonstrated on page 131) is a little more like the expect command. It supports multiple patterns and actions, but like "string match", switch uses an explicit string. Neither switch nor "string match" supports the ^ and $ anchors.
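The sketch below shows one practical consequence of matching an explicit string: string match must account for the entire string, whereas an expect glob pattern may match anywhere in the buffer. It also runs several glob patterns through switch. The describe procedure is hypothetical, and -glob is given explicitly because current Tcl releases make exact matching the default for switch.

```tcl
# string match compares against the whole string.
puts [string match "f*b*" "foobar"]   ;# starts with f, later a b: 1
puts [string match "b*" "foobar"]     ;# does not start with b: 0
puts [string match "*b*" "foobar"]    ;# a b anywhere: 1

# switch tries each glob pattern in turn and runs the first matching action.
proc describe {s} {
    switch -glob -- $s {
        "f*"    { return "starts with f" }
        "*r"    { return "ends with r" }
        default { return "something else" }
    }
}
puts [describe "foobar"]
puts [describe "bar"]
```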

Tcl’s regexp Command

Tcl’s regexp command matches strings using regular expressions. The regexp command has the same internal pattern matcher that expect uses but the interface is different. The expect command provides the string implicitly while regexp requires that the string be an explicit argument.

The calling syntax is as follows:

regexp pattern string var0 var1 var2 var3 ...

The first argument is a pattern. The second argument is the string from which to match. The remaining arguments are variables, set to pieces of the string that match the pattern. The variable var0 is set to the substring that was matched by the whole pattern (analogous to expect_out(0,string)). The remaining variables are set to the substrings that matched the parenthesized parts of the pattern (analogous to expect_out(1,string) through expect_out(9,string)).

expect1.1> set addr "usenet@uunet.uu.net"
usenet@uunet.uu.net

For example, the following command separates the Internet email address (above) into a user and host:

expect1.2> regexp (.*)@(.*) $addr ignore user host
1
expect1.3> set user
usenet
expect1.4> set host
uunet.uu.net

The first parenthesized pattern matches the user and is stored in the variable user. The @ matches the literal @ in the address, and the remaining parenthesized pattern matches the host. Whatever is matched by the entire pattern is stored in ignore, called this because it is not of interest here. This is analogous to the expect command where expect_out(0,string) is often ignored. The command returns 1 if the pattern matches or 0 if it does not.

The regexp command accepts the optional flag "-indices". When used, regexp stores a list of the starting and ending character positions in each output variable rather than the strings themselves. Here is the previous command with the -indices flag:

expect1.5> regexp -indices (.*)@(.*) $addr ignore user host
1
expect1.6> set user
0 5
expect1.7> set host
7 18

The expect command also supports an "-indices" flag (shown in Chapter 5 (p. 123)) but there are differences between the way expect and regexp support it. The expect command writes the indices into the expect_out array alongside the strings themselves so you do not have to repeat the expect command to get both strings and indices. Also, the elements are written separately so that it is possible to extract the start or ending index without having to break them apart.
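With regexp, one way to get both is to request indices and then recover the text with string range, which takes the same inclusive start and end positions. A sketch using the address from above:

```tcl
set addr "usenet@uunet.uu.net"

# With -indices, each variable receives a {start end} pair instead of text.
if {[regexp -indices {(.*)@(.*)} $addr whole userIdx hostIdx]} {
    # string range accepts the same inclusive positions, so the matched
    # text can be recovered from the indices.
    set user [string range $addr {*}$userIdx]
    set host [string range $addr {*}$hostIdx]
    puts "user '$user' at $userIdx, host '$host' at $hostIdx"
}
```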

Tcl’s regsub Command

The regsub command makes substitutions in a string that matches a regular expression. For example, the following command substitutes like with love in the value of olddiet. The result is stored in the variable newdiet.

expect1.1> set olddiet "I like cheesecake!"
I like cheesecake!
expect1.2> regsub "like" $olddiet "love" newdiet
1
expect1.3> set newdiet
I love cheesecake!

If the expression does not match, no substitution is made and regsub returns 0 instead of 1. However, the string is still copied to the variable named by the last parameter.

Strings that match parenthesized expressions can be referred to inside the substituted string (the third parameter, love in this example). The string that matched the first parenthesized expression is referred to as "\1", the second as "\2", and so on up to "\9". The entire string that matched is referred to as "\0".

In the following example, cheesecake matches the parenthesized expression. It is first substituted for \1 in the third argument, and then that string replaces "cheesecake!" in the original value of olddiet. Notice that the backslash must be preceded by a second backslash to prevent Tcl itself from rewriting the string.

expect1.4> set substitute "the feel of \\1 in my nose."
the feel of \1 in my nose.
expect1.5> regsub "(c.*e)!" $olddiet $substitute odddiet
1
expect1.6> set odddiet
I like the feel of cheesecake in my nose.

If you find this a little confusing, do not worry. You can usually accomplish the same thing as regsub with a couple of other commands. The situations in which regsub can be used do not arise that often; indeed, regsub is used in only one other place in this book (page 212). However, when the need arises regsub is a real timesaver. To make it even more useful, the regsub command can be applied to every matching pattern in the string by using the -all flag.
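The sketch below repeats the cheesecake substitution with the replacement string in braces, so that \1 reaches regsub without a doubled backslash, and then shows & standing for the whole match and -all counting its substitutions. The plural and shouty examples are invented for illustration.

```tcl
set olddiet "I like cheesecake!"

# Braces keep Tcl from touching the backslash, so \1 arrives intact.
regsub {(c.*e)!} $olddiet {the feel of \1 in my nose.} newdiet
puts $newdiet

# & (like \0) stands for the entire matched text.
regsub {cheesecake} $olddiet {&s and more &s} plural
puts $plural

# -all substitutes every match; the return value counts substitutions.
set n [regsub -all {e} "cheesecake" {E} shouty]
puts "$n substitutions: $shouty"
```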

Ignoring Case

The -nocase flag indicates that a match should occur as if any uppercase characters in the string were lowercase. The -nocase flag works for both regexp and expect. Like other expect flags, -nocase is applied separately to each pattern.

The -nocase flag can dramatically simplify patterns. Compare the following commands. All of them match the strings "hi there!", "Hi there!", "Hi There!", and "HI THERE!", but the last command is the shortest and most readable.

expect "\[Hh]\[Ii] \[Tt]\[Hh]\[Ee]\[Rr]\[Ee]!"
expect -re "(hi there|Hi there|Hi There|HI THERE)!"
expect -re "(hi|Hi|HI) (there|There|THERE)!"
expect -nocase "hi there!"

From the expect command, the -nocase flag can be used with both glob patterns and regular expressions. Non-alphabetic characters are not affected by the -nocase flag.

Do not use -nocase with uppercase characters in the pattern. Uppercase characters in the pattern can never match.

expect -nocase "HI THERE!"     ;# WRONG, CAN NEVER MATCH!
expect -nocase "hi there"      ;# RIGHT!
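Tcl's own matchers accept a -nocase option too. This sketch checks each greeting from the list above against a lowercase pattern, once with regexp and once with string match, following the advice to keep the pattern lowercase.

```tcl
set greetings {"hi there!" "Hi there!" "Hi There!" "HI THERE!"}
set reMatches 0
set globMatches 0
foreach g $greetings {
    # Lowercase regular expression, as recommended above.
    if {[regexp -nocase {hi there!} $g]} {
        incr reMatches
    }
    # The glob counterpart, also with a lowercase pattern.
    if {[string match -nocase "hi there!" $g]} {
        incr globMatches
    }
}
puts "regexp: $reMatches of [llength $greetings], glob: $globMatches of [llength $greetings]"
```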

All Those Other String Functions Are Handy, Too

There are numerous other string manipulation functions that can be used when working with patterns. For example, in Chapter 3 (p. 77), I used "string trimright" to remove all the newline characters from the end of a string.

Another function that is very handy is scan. The scan command interprets strings according to a format. scan is analogous to the C language scanf function. For the most part, scan is less powerful than regexp, but occasionally the built-in capabilities of scan provide exactly the right tool. For example, a regular expression to match a C-style real number is:

([0-9]+\.?[0-9]*|[0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?

And that is before adding the further backslashes Tcl requires in front of each "[" and "\" when the pattern appears inside double quotes! A much better alternative is to use Tcl's scan command. This can match real numbers, plus you can constrain it for precision. All you have to do is feed it a string containing a number. You can have expect look for the end of the number (such as by seeing whitespace) and then call:

scan $expect_out(0,string) "%f" num

In this example, the number is stored in the variable num. The %f tells scan to extract a real number. Chapter 2 (p. 46) has more information on scan and other string manipulation commands.
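Here is a sketch comparing the two approaches on an invented piece of text. Writing the regular expression in braces sidesteps Tcl's substitutions entirely (the dots still need their regular-expression backslashes), and scan returns the number of conversions it performed.

```tcl
# Hypothetical text in which a number is followed by whitespace.
set text "6.02e23 molecules"

# scan converts the leading real number and returns the conversion count.
set converted [scan $text "%f" num]
puts "converted=$converted num=$num"

# The regular expression from the text, braced so Tcl leaves it alone.
set realRE {([0-9]+\.?[0-9]*|[0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?}
regexp $realRE $text number
puts "regexp matched '$number'"
```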

Actions That Affect Control Flow

So far, all I have used in the way of expect actions are commands such as set or if/then or lists of such commands. The following expect command illustrates both of these:

expect {
    a {set foo bar}
    b {
        if {$a == 1} {set c 4}
        set b 2
    }
}

It is possible to use commands that affect control flow. For example, the following while command executes someproc again and again until the variable a has the value 2. When a equals 2, the action break is executed. This stops the while loop and control passes to the next command after the while.

while 1 {
    if {$a == 2} break
    someproc
}

You can do similar things with expect commands. The following command reads from the output of the spawned process until either a 1 or 2 is found. Upon finding a 1, someproc is executed and the loop is repeated. If 2 is found, break is executed. This stops the while loop, and control passes to the next command after the while. This is analogous to the way break behaved in the if command earlier.

while 1 {
    expect {
        "2" break
        "1"
    }
    someproc
}

Example—rogue

This previous example is a very typical Expect fragment. It does not take much more to produce a useful script. As an example, the following script provides a small assist in playing the game of rogue. rogue is an adventure game which presents you with a player that has various physical attributes, such as strength. Most of the time, the strength rating is 16, but every so often (maybe one out of 20 games) you get an unusually good strength rating of 18. A lot of rogue players know this, but no one in their right mind restarts the game 20 times to find those really good configurations; it is too much typing. The following script does it automatically:

while 1 {
    spawn rogue
    expect {
        "Str: 18" break
        "Str: 16"
    }
    send "Q"
    expect "quit?"
    send "y"
    close
    wait
}
interact

Inside a loop, rogue is started and then the strength is checked to see if it is 18 or 16. If it is 16, the dialogue is terminated. Like telnet (see Chapter 4 (p. 103)), rogue does not watch for an eof either, so a simple close is not sufficient to end the dialogue. "Q" requests that rogue quit. The game asks "are you sure" to which the script replies "y". At this point, both the script and rogue close the connection. Then the script executes wait. As I described in Chapter 4 (p. 105), wait tells the system that it can discard the final exit status of the rogue process.

When rogue exits, the loop is restarted and a new game of rogue is created to test. When a strength of 18 is found, the break action is executed. This breaks control out of the while loop and control drops down to the last line of the script. The interact command passes control to the user so that they can play this particular game.

If you run this script, you will see dozens of initial configurations fly across your screen in a few seconds, finally stopping with a great game for you to play. The only way to play rogue more easily is under the debugger!

Character Graphics

The output produced by rogue in the previous section contains explicit cursor positioning character sequences. This can potentially cause the screen to be drawn in such a way that patterns fail to match the visible output. For example, imagine a score of 1000 being updated to 2000. To make the screen reflect this change, the program need only position the cursor appropriately and then overwrite the 1 with a 2. Needless to say, this will not match the string 2000 because the 2 arrived after the 000.

This particular problem does not arise in the rogue example because the screen is being drawn from scratch. This idea can be used to provide a general solution. To read the screen as if it were printed from top to bottom, force the spawned program to redraw the screen from scratch. Typically, sending a ^L suffices.

Alas, redrawing the screen does not solve other problems. For instance, there is still no way to tell where the cursor is. This may be critical if you are testing, for example, a menu application to make sure that the cursor moves correctly from one entry to the next.

In Chapter 19 (p. 453), I will describe a way to handle this and other related problems more directly by maintaining an explicit representation of the terminal screen.

More Actions That Affect Control Flow

Just as break was used in the rogue script, so can all of the other flow-control commands be used inside of expect commands. For example, a return command inside of an expect causes the current procedure to return:

proc foo {} {
    expect {
        "1" return
        "2"
    }
    someproc
}

The continue command causes control to resume at the beginning of the nearest enclosing while or for loop. continue, break, and return can be mixed in intuitive ways. In the following example, the patterns 1, 2, and 3 do not mean anything in particular. They are just placeholders. The actions are what is interesting.

proc foo {} {
    while 1 {
        expect {
            "1" {
                return         ;# return from foo
            }
            "2" {
                break          ;# break out of while
            }
            "3" {
                if {0==[func]} {
                    exit       ;# exit program
                } else {
                    continue   ;# restart while
                }
            }
        }
        someproc
    }
    some-other-proc
}
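Because Tcl's switch command, like expect, runs its actions in the caller's context, the same mix of return, break, and continue can be tried without a spawned process. In this sketch a hypothetical list of inputs stands in for successive expect matches, foreach replaces while 1, and exit is left out so the example runs to completion.

```tcl
# Hypothetical stand-in: each element of "inputs" plays the part of the
# pattern that expect matched on one pass through the loop.
proc foo {inputs} {
    set log {}
    foreach in $inputs {
        switch -- $in {
            "1" {
                lappend log returned
                return $log        ;# return from foo
            }
            "2" {
                lappend log broke
                break              ;# break out of foreach
            }
            "3" {
                lappend log continued
                continue           ;# skip to the next input
            }
        }
        lappend log "processed $in"
    }
    lappend log "after loop"
    return $log
}

puts [foo {4 3 2 5}]
puts [foo {4 1 5}]
```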

In Chapter 3 (p. 83), I showed a script that started an anonymous ftp session and let you interact after performing the login automatically. Using some of the things you have seen since, it is possible to write a more capable version of aftp. The one below retries the connection if the remote host refuses because it is down or there are too many users. A procedure called connect is defined and called repeatedly in a loop. Anonymous ftp administrators may not appreciate this approach, but it is sometimes the only way to get through to a site that is very popular. Once connected, the script sends the binary command to disable any data conversions. As with the earlier version, this script ends by dropping into an interact, after which you can use the session just as before.

#!/usr/local/bin/expect --

proc connect {host} {
    expect "ftp>"
    send "open $host\r"
    expect {
        "Name*:" {
            send "anonymous\r"
            expect {
                "Password:" {
                    send "[email protected]\r"
                    expect "login ok*ftp>"
                    return 0
                }
                "denied*ftp>" {
                    # too many users, probably
                    send "close\r"
                    return 1
                }
                "failed*ftp>" {
                    # some other reason?
                    send "close\r"
                    return 1
                }
            }
        }
        "timed out" {
            return 1
        }
    }
}

set timeout -1
spawn ftp -i
while {[connect $argv]} {}
send "binary\r"
interact

Matching Multiple Times

Many tasks require an expect to be repeated some number of times. Reading files from a list is an example of this. In the example on page 133, I matched a single line with the command:

expect -re "(\[^\r]*)\r\n"

This can be wrapped in a loop to read multiple lines and break when a prompt appears:

while 1 {
    expect {
        -re "(\[^\n]*)\n"   process_line
        $prompt   break
    }
}

This version has additional patterns upon which to break out of the loop:

while 1 {
    expect {
        -re "(\[^\n]*)\n" process_line
        eof {
            handle_eof
            break
        }
        timeout {
            handle_timeout
            break
        }
        $prompt break
    }
}

Here, handle_eof and handle_timeout are imaginary procedures that perform some processing appropriate to the condition. More importantly, notice that all of the patterns but one terminate by breaking out of the loop. It is possible to simplify this by using the exp_continue command.

When executed as an expect action, the command exp_continue causes control to be continued inside the current expect command. expect continues trying to match the pattern, but from where it left off after the previous match. expect effectively repeats its search as if it had been invoked again.

Since expect does not have to be explicitly reinvoked, the while command is not necessary. The previous example can thus be rewritten as:

expect {
    -re "(\[^\n]*)\n" {
        process_line
        exp_continue
    }
    eof handle_eof
    timeout handle_timeout
    $prompt
}

In this example, each line is matched and then processed via process_line. expect then continues to search for new lines, processing them in turn.
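To see what exp_continue does to the buffer, it may help to model the loop in plain Tcl. The sketch below is a simplification built on assumptions, not Expect's implementation: the hypothetical procedure drain_lines holds all the input in one string, tries the line pattern and then the prompt in order, and after each line match discards the matched text and anything before it, just as expect does before it resumes searching.

```tcl
# A plain-Tcl model of the exp_continue loop above (no Expect required).
# drain_lines is a hypothetical helper; real expect would wait for more
# input (or time out) instead of raising an error when nothing matches.
proc drain_lines {buffer prompt} {
    set lines {}
    while 1 {
        # First pattern: one line, unanchored, like -re "(\[^\n]*)\n".
        if {[regexp -indices {([^\n]*)\n} $buffer all line]} {
            lappend lines [string range $buffer {*}$line]
            # Discard the match and anything before it, then search again.
            # This is the role exp_continue plays.
            set buffer [string range $buffer [expr {[lindex $all 1] + 1}] end]
            continue
        }
        # Last pattern: the prompt ends the command.
        if {[string first $prompt $buffer] >= 0} {
            return $lines
        }
        error "no pattern matched; expect would wait for more input"
    }
}

puts [drain_lines "first\nsecond\nprompt> " "prompt> "]   ;# prints: first second
```

Each pass sees only the input that has not yet been consumed, which is why no line is ever processed twice.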

Compare this version with the previous one, which was written with an explicit loop. The rewrite is a lot shorter because it does not need all the explicit break commands. There is no hard and fast rule for when to use an explicit loop instead of exp_continue, but a simple guideline is to use exp_continue when there are fewer actions that repeat the loop than those that break out of the loop. In other words, explicit loops make actions that repeat the expect shorter. exp_continue makes actions that break out of the loop shorter.

When the exp_continue action is executed, the timeout variable is reread and expect’s internal timer is reset. This is usually what is desired since it is exactly what would happen with an expect in an explicit while or for loop. For example, if timeout is set to ten seconds and input lines arrive every second, the expect command will continue to run even after ten lines have arrived. Each time exp_continue is executed, expect then waits up to ten more seconds.

To avoid resetting the timer, call exp_continue with the -continue_timer flag.

exp_continue -continue_timer

With a very small timeout, exp_continue offers a convenient way to discard additional characters that arrive soon after a match.

set timeout 1
expect -re ".+" exp_continue

In the command above, characters are ignored as long as they keep arriving within $timeout seconds of one another. When the output finally settles down, the expect command completes and control passes to the next command in the script.

Here is a variation on the same idea. The following fragment recognizes the string "ok" if it appears in output whose characters each arrive within $timeout seconds of the previous one.

set buf ""
expect -re ".+" {
    append buf $expect_out(buffer)
    exp_continue
}
if {[regexp "ok" $buf]} {
    . . .
}

In Chapter 15 (p. 339), I will show how to do the same thing but without the explicit buffering in the action.

Recognizing Prompts (Yet Again)

In Chapter 5 (p. 120), I described how to match a variety of different prompts and potentially any prompt that a user might choose. A problem I did not address is that programs can require interaction even before the first prompt. One such program is tset, which is used to set the terminal type.

tset is fairly clever, but if it cannot figure out the terminal type, it prompts the user. The tset prompt is well defined. The prompt starts with a fixed string and then has a default terminal type in parentheses, such as:

TERM = (xterm)

At this point, the user can either enter the terminal type or simply press return, in which case the type is set to the default. In most scripts, the default is fine.

The following fragment handles this interaction:

expect {
    "TERM = *) " {
        send "\r"
        exp_continue
    } -re $prompt
}

Both the prompt from tset and the shell are expected. If the shell prompt shows up first, the expect is satisfied and the script continues. If the tset prompt appears, the script acknowledges it and uses exp_continue to repeat and look for the shell prompt.

The fragment does a little more work than it needs. If it finds the tset prompt once, it looks for it again even though it will not appear. To avoid this, the loop would have to be unrolled—but it would have no substantive benefit. It is easier to write and more readable as it is.

Fortunately, tset is the only interactive program that is commonly encountered while logging in. If you need to handle anything else, it is likely unique to a user or situation. If need be, a hook can be provided for users who invoke other interactive programs while logging in.

Similarly to the way users can define their own EXPECT_PROMPT environment variable, users can also write their own Expect fragments to automate a login interaction. For example, suppose a user’s .login always prompts "read news (y|n):" upon logging in. To handle this, have the user create a file called ".login.exp". Inside it would be just the fragment to automate their personal interaction:

expect "read news (y|n):"
send "n\r"

Application scripts can then handle the interaction by detecting the presence of the file and using it just before looking for the shell prompt.

if {[file readable ~/.login.exp]} {
    source ~/.login.exp
}
expect -re $prompt

Speed Is On Your Side

Another use of exp_continue appears in the robohunt script that comes with Expect as an example. robohunt automates the game of hunt. Unlike the rogue script mentioned earlier, robohunt plays the whole game for you. hunt is a character graphics game that lets you navigate through a maze. You attack other players or crash through walls simply by moving into them. Certain walls cannot be broken through. If you try to do so, the game responds by ringing the bell, done by sending a ^G.

The other details of the game or script are not important except for one aspect. The script works by precalculating a number of moves and sending each batch of moves out at once. The script uses a crude heuristic for deciding which way to move, so it occasionally runs into a wall and keeps running into a wall for the rest of the batch of moves. This causes the game to send back a whole slew of ^G’s. The script handles it with the following command:

set bell "\007"
expect {
    -re "^$bell+" exp_continue
    -re "again\\? " {send y}
    -re ".+"
}

The first pattern checks if the output starts with a sequence of ^G’s (here denoted by "\007"). If the ^G’s are found, they are matched and effectively discarded, as the action simply restarts the expect command.
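Because the bell is an invisible control character, this pattern is easy to get wrong when typing it in. The following plain-Tcl check is a sketch with an invented sample buffer: it builds the pattern exactly as the fragment does and confirms that it consumes a leading run of ^G’s and nothing else.

```tcl
# Build the pattern the same way the robohunt fragment does.
set bell "\007"            ;# octal escape for ^G (BEL)
set pattern "^$bell+"      ;# one or more bells at the start of the buffer

# Invented buffer: three bells, then the start of a screen update.
set buffer "\007\007\007 maze..."

if {[regexp $pattern $buffer run]} {
    puts "discarding [string length $run] bell(s)"   ;# prints: discarding 3 bell(s)
}

# A bell that does not begin the buffer is not matched by this pattern.
puts [regexp $pattern "maze\007"]                   ;# prints: 0
```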

If the script’s player is killed, the game stops and asks "Do you want to play again?". It suffices to check for the final question mark and space, but this would leave the script with a fairly cryptic pattern. Adding "again" to the pattern makes it more readable with no significant impact on performance.

The third pattern checks for anything else. The only reason why anything else might appear is that the game is printing out its usual display of the screen which in turn means that it is waiting for new moves. So the expect command completes and control passes to another part of the script that computes and sends a new move.

The robohunt script may seem rather lacking in sophisticated logic and in many ways just plain stupid. It is. But it can overwhelm a human opponent by sheer speed despite constant blunders and an obvious lack of any deep understanding of its task. Nonetheless, it is not to be scoffed at. This is precisely the idea used by many computer algorithms that accomplish seemingly difficult tasks.

The robohunt script is virtually impossible for a human player to play against simply because the script is so fast. While this technique is not the usual reason Expect scripts are useful, it is certainly a technique worth remembering.

Controlling The Limits Of Pattern Matching Input

Expect is usually much faster than any human. However, certain behavior can force Expect to be slower than it could be or, even worse, to fail altogether.

Some programs produce an astounding amount of output. Graphics programs are one example, but even programs that simply list files can produce a flood of output. The rate is not a problem. Expect can consume it and make way for more very quickly. But Expect has a finite amount of memory for remembering program output. By default, the limit is enough to guarantee that patterns can match up to the last 2000 bytes of output.[27]

This is just the number of characters that can fit on a 25-row, 80-column screen. When a human is viewing a program producing a lot of output, everything but the last 2000 or so characters scrolls off the screen. If a decision has to be made, the human must do it based only on those last 2000 characters. Following the philosophy that Expect does what a human does, Expect effectively defaults to doing the same thing: throwing away everything but the last 2000 characters.

This may sound like a lot of information can be missed, but there are some ameliorating factors. In particular, if an interactive program produces a lot of output (more than a screenful) and wants to make sure that everything is seen, it will present the user with a prompt (e.g., "more?"). Expect can recognize this too.

The behavior of Expect to forget (i.e., throw things away) does not mean that Expect will not attempt to match output against the current patterns. Output actually arrives in small groups of characters—typically no more than 80 characters (i.e., a line) maximum. Faster programs produce these chunks faster rather than producing larger chunks. As I described in Chapter 4 (p. 89), as each chunk arrives, Expect attempts to match against it with whatever is remembered from the previous output. No matter how big the chunks used, Expect attempts to match every character in the output at least once after 1999 additional characters have arrived.
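The chunk-at-a-time behavior can be imitated in a few lines of plain Tcl. This is only a model built on stated assumptions: the hypothetical procedure feed appends each chunk, keeps just the last $limit characters, and retries the pattern. Expect's real buffer management differs in detail, but the model shows why a pattern split across two chunks still matches.

```tcl
# Model (not Expect's real code): remember at most $limit characters and
# retry the pattern each time a chunk arrives. Returns 1 on a match.
proc feed {bufVar chunk limit pattern} {
    upvar 1 $bufVar buf
    append buf $chunk
    # Forget everything but the most recent $limit characters.
    if {[string length $buf] > $limit} {
        set buf [string range $buf end-[expr {$limit - 1}] end]
    }
    return [regexp $pattern $buf]
}

set buf ""
set limit 2000   ;# 25 rows x 80 columns, the default described above
# The prompt "login: " is split across two chunks.
foreach chunk [list "Last login: Tue\nlo" "gin: "] {
    if {[feed buf $chunk $limit {login: $}]} {
        puts "prompt recognized"
    }
}
```

Note that the first chunk does not match even though it contains "login: ", because the $ requires the colon and space to be at the end of what has arrived so far.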

Much of the time, this works quite well. However, it does not always make sense to force Expect into literally following human behavior. A human, for example, might want to see a large directory listing. Since it will immediately scroll off the screen, the choice is to pipe it through a program like more, redirect it into a file, or perhaps run the session inside of a scrollable shell provided by an emacs or xterm window. This is not necessary with Expect. It is a computer program after all, and can remember as much information as it is told.

The maximum size of matches that Expect guarantees it can make is controlled with the command match_max. As an example, the following command ensures that Expect can match program output of up to 10000 characters.

match_max 10000

The figure given to match_max is not the maximum number of characters that can match. Rather, it is a minimum of the maximum numbers of characters that can be matched. Or put another way, it is possible to match more than the current value but larger matches are not guaranteed.[28]

The limit to how high you can set match_max is governed by your particular operating system. Some systems add additional limits (such as by your system administrator or the shell’s limit command), but these are usually arbitrary and can be increased. In any implementation, you can count on being able to set the limit to a megabyte or more, so you probably do not have to worry about this limit when designing Expect algorithms.

To change the default buffer size of all future programs that will be spawned in the current script, use the -d flag. The “d” stands for “default”. This does not change the size for the currently spawned process.

match_max -d 10000

With no arguments, match_max returns the value for the currently spawned process. With a -d flag and no numeric argument, match_max returns the default value.

Setting the buffer size sufficiently large can slow down your script, but only if you let the input go unmatched. As characters arrive, the pattern matcher has to retry the patterns over successively longer and longer amounts of input. So it is a good idea to keep the buffer size no larger than you really need.

As soon as a pattern matches, the input that matches and anything before it in the buffer are removed. You can use this to speed up pattern matching. Just remove any unnecessary input by matching it. For example, imagine you want to collect the body of a mail message. Unfortunately, the mail program starts off by displaying several thousand bytes worth of headers before it gets to the body of the message. You are not interested in the headers—they only slow down the pattern matching.

Rather than just matching everything (or the prompt at the end), it is quicker to match the headers, throw them away, and then return to searching for the prompt. This could be done conveniently using exp_continue. If you need the headers, too, consider matching for them separately. While you have to write two expect commands, the result also speeds up the overall matching process. You can speed up the matching even further by matching each line individually. If the line containing the prompt arrives, you are done. If any other line arrives, append the line to a buffer and repeat the expect command as before. In this way, the pattern matcher never has to rematch more than a line’s worth of data. This technique can produce a significantly faster response if you are waiting for the prompt at the end of a 100Kb mail message!

For most tasks, the speed of pattern matching is not a concern. It usually happens so quickly that you never notice a delay. But enormous amounts of unmatched input combined with sufficiently complex patterns can take several seconds or more, causing noticeable delays in processing. In such cases, if you cannot simplify your patterns, it may pay to change your strategy from trying to match a large amount of data with a single pattern to iteratively matching characters or lines or whatever chunks are convenient as they arrive and storing them for later processing.

The full_buffer Keyword

On page 146, I described how expect discards input when its internal buffer is exceeded. The special pattern full_buffer matches when no other patterns match and expect would otherwise throw away part of the input to make room for more.[29] When full_buffer matches, all of the unmatched input is moved to expect_out(buffer).

As with other special patterns, such as eof and timeout, full_buffer is only recognized when none of the -gl, -re, or -ex flags has been used.

The following fragment was written for someone who needed a program that would “spool up” a relatively slow stream from the standard input and send it to a telnet process every 3 seconds. They wanted to feed telnet with a few big chunks of data rather than lots of tiny ones because they were running on a slow network that could not afford the overhead.

set timeout 3
while 1 {
    expect_user {
        eof exit
        timeout {
            expect_user "*"
            send $expect_out(buffer)
        }
        full_buffer {send $expect_out(buffer)}
    }
}

The program works by sitting in a loop which waits for three seconds or a full buffer, whichever comes first. If the buffer fills, it is sent out immediately. If three seconds pass, another expect command is executed to retrieve whatever data has arrived, and that data is sent to the remote side.

The expect_user command is a special version of the expect command that reads from the standard input. I will describe this command in detail in Chapter 8 (p. 188).

Double Buffering

When a spawned process produces a line of output, it does not immediately go into Expect’s buffer. In Chapter 4 (p. 89), I described how the UNIX kernel processes characters in chunks. The kernel, in a sense, contains its own buffer from which it doles out these chunks when expect asks for more.

This kernel buffer is separate from expect’s buffer. The kernel’s buffer is only checked when expect cannot find a match using the data already in its own buffer. This double buffering rarely has any impact on the way scripts behave. However, there are some cases in which the buffering does make a difference. For instance, imagine that you have a shell script named greet that prints hello, sleeps five seconds, and then prints goodbye.

echo hello
sleep 5
echo goodbye

Now, consider the following Expect script:

spawn /bin/sh greet
expect "h"
exec sleep 10
expect -re ".*o"

This script finds the h from hello and then sleeps for 10 seconds. During that time, the shell script prints goodbye. This string is handed to the kernel which buffers it until Expect asks for it.

When Expect awakens, expect searches its input buffer for anything with an o at the end. This is satisfied by ello and expect returns. The string goodbye is not tested because it is never read by expect.

A more realistic situation arises when using the simple "*" pattern. This always matches everything in expect’s internal buffer and returns immediately. It never causes expect to ask the kernel for more input, even if there is no data waiting.

So expect "*" clears expect’s internal buffer but not the kernel’s buffer. How can the kernel’s buffer be cleared? Intuitively, you need to read everything that is waiting. But how do you know what “everything” is? expect can only ask for the amount of data described by the match_max command. If you can guarantee how much the spawned process has written, you can do this:

expect "*"       ;# clear Expect's internal buffer
match_max $big     ;# get ready for everything waiting
expect -re ".+"    ;# read it, match it, discard it

If you are not prepared to declare how much could have been written, you cannot simply have expect read in a loop either. The spawned process may be writing at the same time that you are reading, in which case you can end up throwing away more than what was “old”.

Realistically, system indigestion can throw off any timing that you are hoping to rely on to decide when it is time to flush buffers. The best solution is still to explicitly provide patterns to match old output and then have the script throw the buffer away.

In general, asking for expect to flush kernel buffers usually indicates that something is poorly designed—either the Expect script or the application. In Chapter 8 (p. 188), I will describe an application where these kinds of problems have to be dealt with.

Perpetual Buffering

The -notransfer flag prevents expect from removing matching characters from the internal buffer. The characters can be matched repeatedly as long as the -notransfer flag is associated with the pattern.

expect -notransfer pat

The -notransfer flag is particularly useful for experimenting with patterns. You can drive a program up to a point where it loads up the internal buffer and then try various patterns against it again and again. For convenience, the -notransfer flag can be abbreviated "-n" when Expect is running interactively.

In the next chapter, I will show some additional debugging aids that can be usefully combined with the -notransfer flag.

The Politics Of Patterns

Creating a pattern that matches a string is not always an easy task. A common dilemma is whether to use a very conservative pattern or a more liberal pattern.

Conservative patterns typically have few or no wildcards and only match a limited number of strings. While easy to read, they carry the potential risk of not being able to match a string that deviates from the expected.

Liberal patterns are more forgiving, with the ability to match any string that could conceivably appear. These patterns underspecify the requirements of a string and therefore risk matching strings that were not intended to be matched.

For example, automating a login requires that the initial prompt be accepted. There is surprising nonconformity even at this level. For instance, UNIX systems commonly prompt with "login:" while VMS systems prompt with "Username:". One way to automate this might be:

expect -re "(login|Username): "

But if you run into a system someday that just prompts "User", it will not be accepted. This string and others can be added to the command, but eventually you may end up just accepting anything that ends with a colon and a space:

expect -re ".*: $"

The $ lowers the risk of the string appearing in the middle of some other output.
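The tradeoff can be checked without a remote system by running the patterns through Tcl's regexp command. In this sketch, the first two sample prompts come from the text; "User: " and the warning line are invented to show what each pattern accepts, including what happens when the $ anchor is dropped.

```tcl
set conservative {(login|Username): }
set liberal      {.*: $}
set unanchored   {.*: }     ;# the liberal pattern without its $

# The last sample is an invented message that merely contains ": ".
foreach sample [list "login: " "Username: " "User: " "Note: disk quota exceeded"] {
    puts [format "%-28s conservative=%d liberal=%d unanchored=%d" \
        [list $sample] \
        [regexp $conservative $sample] \
        [regexp $liberal $sample] \
        [regexp $unanchored $sample]]
}
```

The anchor lowers the risk but does not remove it: if such a message happened to arrive in a chunk ending right after its ": ", the liberal pattern could match before the rest of the line arrived.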

Incidentally, handling VMS and UNIX systems in a single script may seem hard to believe. However, the passmass script that comes with Expect as an example does just this. passmass sets your password on any number of hosts. The idea is that you want to keep your password the same on all the computers that you use, but when it comes time to change it, you only want to do it once. passmass does this—it logs into each machine and changes your password for you.

The actual password-changing dialogue is fairly similar from one operating system to another. Of course, the prompts are wildly different. So are the diagnostics reporting, for example, that your new password is not acceptable.

Here is an excerpt from passmass, where it sends the new password and then resends it as a verification. The badhost function records the hosts that fail so that it is easy to see afterwards which ones require manual assistance.

send "$newpassword\r"
expect -re "not changed|unchanged" {
    badhost $host "new password is bad?"
    continue
} -re "(password|Verification|Verify):.*"
send "$newpassword\r"
expect -re "(not changed|incorrect|choose new).*" {
    badhost $host "password is bad?"
    continue
} $prompt

Expecting A Null Character

The null character is another name for a zero-valued byte.[30] Tcl provides no way to represent nulls in strings. Indeed, internally Tcl reserves null to delimit strings—so even if you could get a null in a string, you cannot do anything useful with the result. Fortunately, this is not a problem.

Nulls are almost never generated by interactive processes. Since they have no printing representation, users cannot see them and so there is little point in sending nulls to users. Nonetheless, nulls have valid uses. The most common use for nulls is to control screen graphics. The nulls are used either to delay character arrival on slow screens or as parameters to screen formatting operations. Both of these operations work correctly in Expect. Expect passes nulls to the standard output just like any other character.

By default, Expect removes any nulls before doing pattern matching. This is done for efficiency—it allows the pattern matcher to use the same internal representation of strings that Tcl uses.

Removal of nulls can be disabled with the remove_nulls command. The nulls can then be matched explicitly using the null keyword. To prevent nulls being removed from the output of the currently spawned process, use the command remove_nulls with an argument of 0. The following fragment calls remove_nulls and then looks for a null in the output.

remove_nulls 0
expect null

An argument of 1 causes nulls to be removed. The remove_nulls command handles its arguments similarly to the match_max command. With no arguments, the value for the currently spawned process is returned. With a -d, the default value is returned. A new default is set by using -d followed by 0 or 1.

You cannot directly embed the null keyword inside of another pattern. Nulls can only be matched by themselves. Null matching is unanchored. Hence, when expect looks for a null it skips over any other characters to find it. Any characters that are skipped can be found, as usual, in expect_out(buffer). Since nulls are internally used to terminate strings, unanchored patterns cannot be matched into the buffer past a null. Fortunately, this is not a problem since the null pattern can always be listed (and searched for) last. I will show an example of this shortly.

If nulls are being used to pad data, it is just a matter of waiting for the correct number of nulls. For example, to wait for two nulls:

expect null
expect null

The more typical use is when receiving binary data. For example, suppose you expect an equal sign followed by an integer represented as four bytes, most significant byte first. This task is best separated into two parts, illustrated by the two commands in the following fragment:

expect "="
set result [expect_four_byte_int]

The first command looks for the equals sign. The second is a procedure to collect a four byte integer that may contain binary zeros. This procedure is not predefined by Expect but here is an example of how you might write it:

proc expect_four_byte_int {} {
    set x 0
    for {set i 0} {$i<4} {incr i} {
        set x [expr {$x*256}]
        expect "?" {
            scan $expect_out(0,string) %c d
            incr x $d
        } null
    }
    return $x
}

The procedure works by expecting a single byte at a time. Null bytes are matched with the null keyword. Non-null bytes are matched with the "?". Each time through the loop, the previous subtotal is shifted up to make room for the new byte. The new byte value is added to the current subtotal. Since a null has a 0 byte value, no addition and hence no action is even necessary in that case. It just has to be matched.
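The arithmetic in expect_four_byte_int can be checked separately from any spawned process. The sketch below (bytes_to_int is a hypothetical name) applies the same shift-and-add step to a Tcl list of byte values, most significant byte first. A zero byte, which expect would match with the null keyword, adds nothing but still shifts the subtotal.

```tcl
# Same shift-and-add as expect_four_byte_int, on a list of byte values.
proc bytes_to_int {bytes} {
    set x 0
    foreach d $bytes {
        # Shift the subtotal up one byte, then add the new byte.
        set x [expr {$x * 256 + $d}]
    }
    return $x
}

puts [bytes_to_int {0 0 1 2}]   ;# prints 258 (1*256 + 2)
puts [bytes_to_int {1 0 0 0}]   ;# prints 16777216 (256 cubed)
```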

This approach to handling null bytes may seem slow and awkward (and it is), but the reality is that Tcl is optimized as a user interface, and handling binary data in a user interface almost never happens. The tradeoff of allowing null to be handled differently is that it allows the rest of Tcl to be much simpler than it otherwise would be.

Parity

Parity refers to the process of error detection by modification and inspection of a single bit in each byte. There are two basic types of parity. Odd parity means that the number of 1 bits in the byte is odd. If a letter is not naturally represented by an odd number of 1 bits, the high-order bit is forced to be 1. Even parity is just the opposite.
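The rule can be written out directly. In this plain-Tcl sketch, with_parity and count_ones are hypothetical helpers: count the 1 bits in a 7-bit character code, then set the high-order bit only when that is needed to make the total odd (odd parity) or even (even parity).

```tcl
# Count the 1 bits in a character code.
proc count_ones {value} {
    set n 0
    while {$value} {
        incr n [expr {$value & 1}]
        set value [expr {$value >> 1}]
    }
    return $n
}

# Return the 8-bit code for $char under "odd" or "even" parity: the
# high-order bit is set only if the 7-bit code needs one more 1 bit.
proc with_parity {char type} {
    set v [scan $char %c]
    set isOdd [expr {[count_ones $v] % 2 == 1}]
    if {($type eq "odd" && !$isOdd) || ($type eq "even" && $isOdd)} {
        set v [expr {$v | 0x80}]
    }
    return $v
}

puts [format %02X [with_parity l odd]]    ;# l has four 1 bits: prints EC
puts [format %02X [with_parity g even]]   ;# g has five 1 bits: prints E7
```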

Parity was never much good, being very susceptible to transmission noise. In this day and age, parity is totally useless. Nonetheless, some old computer peripherals generate it anyway. And worse, they provide no way of disabling it. Locally spawned processes do not add parity in the first place. You only have to worry about parity when communicating with other peripherals, such as modems or telecommunication switches.

By default, expect respects parity. expect passes characters with their parity on to the standard output (or log file) and also does pattern matching with the original parity. The reason this is useful, of course, is that many programs use all the bits in a byte to represent data. Eight-bit character sets (prevalent in Europe) do not work if one of the bits is used for parity.

Usually, parity is not a consideration. Indeed, if your Expect dialogues are working just fine, then you can skip this section. However, you may occasionally find that some of your characters are unreadable or just plain wrong. For example, suppose that you use tip to dial up another computer and the following gibberish appears instead of a prompt to login:

lo¿i¿:¿

In this case, each ¿ represents a character that naturally has an odd number of 1 bits and so had its high-order bit set to force even parity. You may not see this particular symbol, but similar garbage will definitely clue you in that there is a problem.

In many cases, you can just tell the remote side not to generate parity. If the equipment does not support any way of changing parity, you can use the parity command.

The parity command handles its arguments similarly to the match_max and remove_nulls commands. When called with an argument of 0, parity is stripped from the current process. If called with a nonzero argument, parity is not stripped. With no argument, the current value is returned. With the -d flag, the parity is set or examined for future processes.

parity 0   ;# strip parity
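Stripping parity amounts to clearing the high-order bit of each byte. The plain-Tcl sketch below only illustrates that operation (strip_parity is a hypothetical name, not an Expect command). Its sample input is login: with the high bit set on g, n, and the trailing space; each of those has an odd number of 1 bits, so this is what even parity does to them, and it is one plausible source of gibberish like lo¿i¿:¿.

```tcl
# Clear the high-order (parity) bit of every character in $text.
proc strip_parity {text} {
    set out ""
    foreach c [split $text ""] {
        append out [format %c [expr {[scan $c %c] & 0x7F}]]
    }
    return $out
}

# "login: " as it might arrive with even parity: g, n, and the space
# (each with an odd number of 1 bits) have their high bit set.
set garbled "lo[format %c 0xE7]i[format %c 0xEE]:[format %c 0xA0]"
puts [strip_parity $garbled]   ;# prints "login: "
```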

The parity command only affects how Expect treats parity. Your terminal parameters can affect it as well. For example, if your terminal is set to strip parity on input, any eight-bit characters you enter arrive without their high-order bits. Output from spawned processes can also be affected because they have their own terminal settings. If your system does not have a “sane” idea of initial terminal parameters, you will have to correct or override it. I will describe how to do this in Chapter 13 (p. 296).

Length Limits

I have already mentioned that the number of parenthesized expressions in regular expressions is limited to 9. There are two other limits worth mentioning. While it is highly unlikely that you will run into them, describing them may help your peace of mind.

There is a limit on the length of regular expressions. The precise figure depends on the details of a particular regular expression, but 30,000 characters is a safe bet. The length of glob patterns and the strings against which either glob patterns or regular expressions match are limited only by the amount of available memory.

Comments In expect Commands

It is tempting to add comments after patterns and actions. However, it cannot be done arbitrarily. For example, the following example does not do what was intended.

expect {
    "orange" squeeze_proc   ;# happens every morning
    "banana" peel_proc
}

The problem in this code fragment is that the comment occurs in a place where the expect command is expecting patterns and actions. The expect command does not have any special way of handling comments. In this example, the expect command simply assumes that ";#" is a pattern and happens is the associated action. every and morning are also interpreted as a pattern and action.

This particular comment is rather lucky. The pattern banana is still used as a pattern. However, if the comment had one more word in it, banana would be used as an action and peel_proc as a pattern!
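One way to see the problem is to let Tcl split the braced argument into a list and pair off the elements as patterns and actions. Treating this as essentially how expect reads the argument is an assumption, but it is consistent with the behavior just described: the comment's words simply become list elements.

```tcl
# The braced argument from the first example, unchanged.
set arg {
    "orange" squeeze_proc   ;# happens every morning
    "banana" peel_proc
}

# Read it as alternating patterns and actions.
foreach {pattern action} $arg {
    puts "pattern: $pattern   action: $action"
}

# With one more word in the comment, the pairs shift: banana lands at an
# odd index, i.e., in an action slot.
set arg2 {
    "orange" squeeze_proc   ;# happens every morning today
    "banana" peel_proc
}
puts [lsearch -exact $arg2 banana]   ;# prints 7
```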

Remember that comments behave a lot like commands.[31] They can only be used where commands can be used. If you want to associate a comment with an action, then use braces to create a list of commands and embed both the comment and the action within the list. In the following fragment, all of the comments are safe.

expect {
    "orange" {
        # comments can appear here safely, too
        squeeze_proc   ;# happens every morning
        # comments can appear here safely, too
    }

    "banana" peel_proc
}

Restrictions On expect Arguments

The expect command allows its arguments to be surrounded by a pair of braces. This behavior was described in Chapter 3 (p. 76) and is used heavily throughout the book. Bracing the argument list is a convenient feature. Without it, you would have to put backslashes at the ends of many lines to keep long expect commands together.

Consider these two forms:

expect \
    pat1 act1 \
    pat2 act2 \
    pat3 act3

expect {
    pat1 act1
    pat2 act2
    pat3 act3
}

Unfortunately, there is one pitfall with the second form. When the second form is used, there is a question whether the list is a list of patterns or just a single pattern. Although unlikely, it is possible that a pattern could really be " pat1 act1 pat2 act2 pat3 act3 ". And while this pattern does not visually look like the multiline expect command above, internally the same representation is used for both.

The expect command uses a heuristic to decide whether the argument is a pattern or a list of patterns.[32] The heuristic is almost always correct, but can be fooled by very unusual patterns or indentation. For instance, the pattern in the previous paragraph is misinterpreted as a list of patterns. It simply looks too much like a list of patterns. Fortunately, situations like this almost never arise. Nonetheless, you may need to worry about it, particularly if you are machine-generating your Expect scripts.

To force a single argument to be treated as a pattern, use the -gl flag. (The pattern must be a glob pattern: a regular-expression or exact pattern would already be preceded by a -re or -ex flag, so there would already be two arguments and no ambiguity.) For example:

expect -gl $pattern

The opposite problem can also occur. That is, Expect may mistake a list of patterns for a single pattern. This is most likely to happen if you put the entire list on one line. Consider the following command:

expect {pat1 act1 pat2 act2}

The expect command will assume you are looking for the pattern "pat1 act1 pat2 act2". Here, expect is thrown off by the lack of newlines. After all, there is no point in using braces if you are just going to put all the patterns on the same physical line as the expect command. Leaving them off would be simpler (and take less space).

A newline after the opening brace is sufficient to clue expect in to the fact that the argument is intended as a list of patterns and actions. Alternatively, you can force a single argument to be treated as a list of patterns by using the -brace flag before the list. This allows an expect command with multiple patterns to fit on a single line.

expect -brace $arglist

In the next section, I will demonstrate another use for the -brace flag.

eval—Good, Bad, And Ugly

It is occasionally useful to dynamically generate expect commands. By that I mean that the commands themselves are not prewritten in a script but rather are generated while the script is running.

As an example, suppose you want to wait for a particular pattern ("pat1") and optionally look for another pattern ("pat2") depending on whether a variable ("v2") is 1 or 0. An obvious rendering of this logic is as follows:

if {$v2} {
    expect pat1 act1 pat2 act2
} else {
    expect pat1 act1
}

This works. However, lengthy lists of patterns and actions can make this code difficult to maintain. If you want to make a change to an expect command, you will have to make it twice. The odds of a programming error are going to go up.

Even worse, this solution does not extend well if you add another pattern that is dependent on another variable. You will need to have four expect commands to cover all the possibilities. Additional variables quickly cause this technique to become totally unmanageable.

It is tempting to store the patterns and actions into variables which are later appended to the remaining patterns; however, this must be done with care. Examine this incorrect attempt:

if {$v2} {
    set v2pats "pat2 act2"
} else {
    set v2pats ""
}
if {$v3} {
    set v3pats "pat3 act3"
} else {
    set v3pats ""
}

expect pat1 act1 $v2pats $v3pats  ;# WRONG

In the expect command, v2pats and v3pats are each passed as a single argument. expect therefore interprets $v2pats (the whole string "pat2 act2") as a pattern and $v3pats as the associated action. Obviously this is not what is intended.
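One way to see the mistake is to look at the arguments the command actually receives. In this sketch, showargs is a hypothetical stand-in for expect that just returns its arguments, and the variable values correspond to the case where both v2 and v3 are 1.

```tcl
# Hypothetical stand-in for expect: returns its argument list unchanged.
proc showargs {args} {
    return $args
}

set v2pats "pat2 act2"
set v3pats "pat3 act3"

# Each variable reference becomes exactly ONE argument, spaces and all.
set received [showargs pat1 act1 $v2pats $v3pats]
puts [llength $received]    ;# 4

# Paired up, "pat2 act2" is a pattern and "pat3 act3" is its action.
foreach {pat act} $received {
    puts "pattern=<$pat> action=<$act>"
}
```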

A better and very efficient solution is to iteratively build up a list of the patterns and actions, adding to it as appropriate. When ready, the list is passed to expect.

set patlist ""
if {$v2} {lappend patlist "pat2" act2}
if {$v3} {lappend patlist "pat3" act3}
expect -brace "pat1 act1 $patlist"

Each additional variable and pattern adds only one line. At the end is a single expect command. Notice the -brace argument which forces expect to interpret the argument as a list instead of a single pattern. This is one of the few situations where it is absolutely necessary to give expect a hint about its argument actually being a set of patterns.

One remaining drawback is that patterns and actions on the last line cannot use double quotes in the usual way (to surround the patterns) since the entire list is already double quoted. If pat1 is a variable reference that expands to a string with embedded whitespace, expect sees this as two separate arguments. Using braces instead of either set of quotes (inner or outer) does not help because then the variable substitutions cannot occur.
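Both halves of this can be reproduced in plain Tcl. lappend quotes each element as needed, so an action containing spaces stays a single list element when $patlist is substituted into the string. A substituted $pat1, however, gets no such protection. The action send "yes\r" and the pattern "hello world" below are hypothetical values chosen only for illustration.

```tcl
# Build the optional pairs the same way as the fragment above.
set v2 1
set patlist ""
if {$v2} {lappend patlist "pat2" {send "yes\r"}}   ;# hypothetical action with spaces

# lappend quoted the action, so it survives substitution as one element.
set pat1 "pat1"
set arg "$pat1 act1 $patlist"
puts [llength $arg]        ;# 4

# A pattern containing a space (hypothetical) is split in two,
# shifting every later pattern/action pairing.
set pat1 "hello world"
set arg "$pat1 act1 $patlist"
puts [llength $arg]        ;# 5
foreach {pat act} $arg {
    puts "pattern=<$pat> action=<$act>"
}
```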

The only way out of this dilemma is to use eval. The eval command appends all of its arguments together and then evaluates the resulting string as a new command. Here is the idea:

eval expect "$pat1" act1 $patlist   ;# almost right!

The eval command dynamically generates a new expect command with the remaining arguments. The -brace flag is no longer necessary since the arguments are now passed separately.

The eval command breaks apart any arguments that are also lists. This is just what you need to handle patlist, but it is not right for $pat1 and act1. They must be protected if they include whitespace. The most general solution is to put $pat1 and act1 inside of a list command. This also protects patterns with special symbols like braces. Consider either of these:

eval expect [list "$pat1"] [list act1] $patlist
eval expect [list "$pat1" act1] $patlist
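To compare these forms without running expect, the sketch below again substitutes a hypothetical procedure, showargs, that simply returns the arguments it was called with. eval concatenates its arguments with spaces (as concat does) and then reparses the result, which is why only words wrapped in list survive intact.

```tcl
# Hypothetical stand-in for expect: returns the arguments it received.
proc showargs {args} {
    return $args
}

set pat1 "hello world"                  ;# hypothetical pattern with a space
set patlist [list pat2 act2 pat3 act3]

# Unprotected: eval reparses "hello world" as two words.
set bad [eval showargs "$pat1" act1 $patlist]

# Protected: list quotes $pat1, so it remains one word after reparsing.
set good [eval showargs [list "$pat1" act1] $patlist]

puts [llength $bad]      ;# 7
puts [llength $good]     ;# 6
puts [lindex $good 0]    ;# hello world
```

In Tcl 8.5 and later, the {*} argument-expansion syntax reaches the same result without eval, for example showargs $pat1 act1 {*}$patlist. That is a newer Tcl feature and not something the text above relies on.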

If act1 is just a list of commands already in braces, a second set of braces suffices.

eval expect [list "$pat1"] {{
    cmd1
    cmd2
}} $patlist

This may look peculiar, but then most eval commands do. Fortunately, this kind of situation rarely arises, but if it does, you have a general way to solve it.
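The same hypothetical showargs stand-in shows why the doubled braces work: the Tcl parser strips the outer pair when it reads the eval command, and eval's reparse strips the inner pair, so the whole command body arrives as one argument. Here cmd1 and cmd2 are never executed; they are just text.

```tcl
# Hypothetical stand-in for expect: returns the arguments it received.
proc showargs {args} {
    return $args
}

set pat1 "hello world"           ;# hypothetical pattern with a space
set patlist [list pat2 act2]

set received [eval showargs [list "$pat1"] {{
    cmd1
    cmd2
}} $patlist]

# pattern, action body, pat2, act2
puts [llength $received]                   ;# 4
# The two commands arrive together as a single action argument.
puts [string trim [lindex $received 1]]
```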

Exercises

  1. Experiment with telnet by writing a script to try all of the different port numbers. Record what comes back.

  2. In the aftp script on page 141, I hardcoded my own name and address. Modify the script so that it uses the name and address of whoever runs it.

  3. Write the putfile and putdirectory procedures that are used by the excerpt from rftp on page 131.

  4. Write a procedure to count the number of lines in a string. Do it without looping. Modify the procedure so that it counts the number of digits in a string. Where might this be useful?

  5. Enhance the maxtime script from Chapter 4 (p. 98) so that it can exit after there is no output for the given amount of time. Provide a command-line option to select this behavior.



[26] To match any character but a "^", use the pattern "[^^]". To match a "^" outside a range, quote it with a backslash. To match a "^" inside a range, put it in any position of the range but the first. To match any character but a "]", use the pattern "[^]]".

[27] This is not exactly the same thing as simply saying “the limit is 2000 bytes” for reasons I will get to shortly.

[28] The peculiar definition of match_max is a concession to performance. In order to efficiently carry out the process of reading and matching new characters along with old ones, during the matching process Expect uses up to double the space declared by match_max. Thus, it is possible to match up to twice as much as match_max guarantees.

[29] There is a way to obtain the discarded input without explicitly matching full_buffer or any other action. However, I will not introduce the tools to accomplish this until Chapter 18 (p. 403).

[30] Pedants insist that the correct term is NUL and that “null character” is meaningless. However, both Standard C and POSIX define “null character” so I believe this term to be accepted and understood by most people.

[31] The only significant difference between comments and commands is that arguments of a comment are not evaluated.

[32] Both the expect and interact commands use this same heuristic.
