Chapter 4. Strings, Strings, Everywhere Strings!

Now that you know how to do basic math using Tcl, you’re ready to learn how to perform a wide variety of string operations. Tcl has a rich set of commands and functionality for manipulating strings, an unsurprising fact when you consider that Tcl is a string-based programming language. Everything in Tcl is a string, even numbers. This characteristic of the language sometimes takes beginners by surprise because certain operators behave differently, depending on the context in which they are used, which can lead to unexpected results. If I’ve done my job properly, though, you’ll be able to recognize and avoid these gotchas. In this chapter, you will spend some quality time with the string command, which is the primary Tcl command for working with strings. The final section continues the discussion of Tcl control structures I started in the previous chapter by introducing two looping commands, while and for.

Mad Libs

To play this chapter’s game, you provide a word that meets specific criteria, such as an adjective, a verb ending in -ing, or a noun, to create what we called Mad Libs when I was growing up. The script takes the words and parts of speech that you provide and plugs them into a story. The result is a silly or nonsense story that is also (hopefully) amusing or at least mildly entertaining. To start the game, execute the script mad_lib.tcl in this chapter’s code directory. Here are the results of one execution:

$ ./mad_lib.tcl
Enter a verb ending in -ing: swimming
Enter a adjective: enormous
Enter a mythical creature: unicorn
Enter a piece of furniture: coffee table
Enter a noun: sink
Enter a past tense verb: yanked
Enter a noun: shovel
Enter a number: 10
One day while I was swimming in my living room, a enormous unicorn fell through the
roof. It jumped on the coffee table and knocked over the sink. Then it ran into the
dining room and yanked a shovel. After 10 minutes of chasing it through the house I
finally caught it and put it outside. It quickly flew away.

Okay, nothing is blowing up, and you’re probably not rolling on the floor laughing. Nonetheless, mad_lib.tcl shows you how to do the following programming tasks:

Repeat a body of Tcl code multiple times.
Find characters in strings.
Find substrings in strings.
Replace one substring with another.
Incorporate user input into your script.

A significant portion of Tcl programming, indeed, of almost any programming, is reading, writing, and manipulating string-based data. This chapter introduces you to a substantial portion of Tcl’s string-handling capabilities. There is too much to stuff into one chapter, though, so I’ve saved the more advanced string-handling functionality for later chapters.

The `string` Command

The command you will use most often to work with strings is the aptly named string command. As of Tcl 8.4, the string command has 21 options that define all of the operations you can perform with it. The general form of the string command is:

string option arg ?arg? ...

Each option accepts at least one argument, arg, but most take more. For convenience and completeness, Table 4.1 lists each of string’s options and gives a short description of the option’s purpose.

Table 4.1. string Options

Option	Description
`bytelength`	Returns the number of bytes required to store a string in memory.
`compare`	Compares two strings lexicographically, returning -1, 0, or 1.
`equal`	Tests two strings for equality, returning 1 if the strings are identical and 0 if they are not.
`first`	Returns the index of the first occurrence of a substring.
`index`	Returns the character that appears at a specified location in a string.
`is`	Tests whether a string is a member of a given character class.
`last`	Returns the index of the last occurrence of a substring.
`length`	Returns the length of a string.
`map`	Replaces substrings with new values based on key-value pairs.
`match`	Tests a string for matches against a pattern using shell-style globbing.
`range`	Returns a substring specified by start and end values.
`repeat`	Returns a string repeated a specified number of times.
`replace`	Removes a specified substring or replaces a specified substring with another.
`tolower`	Converts a string to all lowercase characters.
`toupper`	Converts a string to all uppercase characters.
`totitle`	Converts the first character of a string to uppercase and the remaining characters to lowercase.
`trim`	Removes leading and trailing characters that belong to a specified set (white space by default).
`trimleft`	Removes leading characters that belong to a specified set (white space by default).
`trimright`	Removes trailing characters that belong to a specified set (white space by default).
`wordend`	Returns the index of the end of the word containing a specified character.
`wordstart`	Returns the index of the beginning of the word containing a specified character.

Table 4.1 should give you a good sense of the breadth of Tcl’s string-handling capabilities. I’ll show each option’s syntax diagram and describe each of the options in the following sections. To structure the discussion, I’ve arranged the options into three broad groups based on their function: options for comparing strings, options for getting information about strings, and options for modifying strings.

Comparing Strings

Comparing one string to another is a common programming task. Typically, you want to see if one string is the same as another (or not), such as when validating a user name or password. Another frequent need is testing a string to see if it contains a given character or sequence of characters. For example, you might want to make sure that user input, say, the number of players in a game, contains only digits and no letters. The string command has three options for comparing strings: compare, equal, and match. In addition, you can use the operators eq, ne, ==, !=, <, <=, >, and >=.

Kurt’s First Rule for Comparing Strings: Use compare, equal, eq, and ne to compare strings. String comparisons almost always occur in an if, while, or expr command. However, using the comparison operators (==, !=, <, and >) on strings is inefficient, and sometimes surprising, because of the way that Tcl evaluates expressions. As you learned in Chapter 3, the expr command has its own expression evaluator, which performs its own substitutions on the expression it receives. Recall also that the if command (and the while command that you’ll see at the end of this chapter) uses the same engine as expr. When the expression evaluator encounters one of these operators, it first tries to interpret both operands as numbers and falls back to a string comparison only if at least one operand is not a valid number. The compare and equal options (and the eq and ne operators) skip the numeric conversion entirely because they are designed for use with strings.

The following example, rule.tcl in this chapter’s code directory, illustrates the point:

set hexVal "0xF"
set intVal "15"

# == tries a numeric comparison first, and both values are the number 15
if {$hexVal == $intVal} {
    puts "$hexVal equals $intVal"
} else {
    puts "$hexVal does not equal $intVal"
}

# eq always compares the two values as strings
if {$hexVal eq $intVal} {
    puts "$hexVal equals $intVal"
} else {
    puts "$hexVal does not equal $intVal"
}

If you execute this program, you’ll see this odd result:

$ ./rule.tcl
0xF equals 15
0xF does not equal 15

Since when is “0xF” the same as “15”? The first if command compares the variables hexVal and intVal using the == operator. Both values are converted to integers, 15 in both cases, and found to be equal. If you intended to compare two strings (as the “” around the values in the set commands suggest), you would expect this comparison to evaluate to false. The second if command uses eq, a synonym for the string equal command you’ll see later in this chapter, which prevents the expression evaluator from performing the numeric conversion and, even in the absence of quotes in the if command, compares the two variables’ values as strings.
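Hexadecimal values are not the only strings that fall into this trap. Any two strings that look like the same number will do it. The following is a minimal sketch of my own (it is not one of the scripts in the code directory, and the variable names are just for illustration). It borrows the foreach command, which this chapter has not introduced, only to walk through a few pairs of values:

```tcl
# Each pair holds two strings that differ as text.
# The first three pairs represent the same number; the last pair is not numeric.
set results {}
foreach {a b} {1.0 1  1e3 1000  0x10 16  abc ABC} {
    set numeric [expr {$a == $b}]   ;# numeric comparison when both look like numbers
    set textual [expr {$a eq $b}]   ;# always a string comparison
    lappend results $numeric $textual
    puts "$a vs $b: == gives $numeric, eq gives $textual"
}
```

In the first three pairs, == reports 1 while eq reports 0. Only in the last pair, where neither value is a number, do the two operators agree.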

The `compare` Option

The compare option compares two strings lexicographically, working character by character until it finds a difference or reaches the end of one of the strings. Its syntax is:

string compare ?-nocase? ?-length N? string1 string2

string1 and string2 are the strings to compare. By default, the comparison is case-sensitive, so if you want a case-insensitive comparison, specify the -nocase option. To limit the comparison to the first N characters, where N is an integer, specify -length N. compare works the same way as C’s strcmp() and strncmp() functions, so it returns -1 if string1 is lexicographically less than string2, 1 if string1 is lexicographically greater than string2, and 0 if the two strings are equal. The following script (compare.tcl in this chapter’s code directory) illustrates how compare works:

puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName

# Test for strict equality (case-sensitive)
if {![string compare $playerName "Bubba"]} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."

Notice in the last line how I use \" to cause the name entered to appear in quotes in the output. It’s a little ugly to write and to look at, but that’s how you have to do it. Executing the script, you might see the following results:

$ ./compare.tcl
Enter player name: Bubba
"Bubba" is in use.
Please select another name: Kurt
"Kurt" successfully registered.
$ ./compare.tcl
Enter player name: BUBBA
"BUBBA" successfully registered.

Entering the name BUBBA foils the point of the code, which is to make sure that the player name Bubba doesn’t get used twice in the same game. This is when the -nocase argument comes in handy, because it disables case-sensitivity when comparing two strings (see compare_nocase.tcl in this chapter’s code directory):

puts -nonewline "Play again (Y/N): "
flush stdout
gets stdin choice

# Case-insensitive comparison
if {![string compare -nocase $choice "y"]} {
    puts "Excellent! Starting next level."
} else {
    puts "Quitters never win. Exiting."
}

compare_nocase.tcl’s output should resemble the following:

$ ./compare_nocase.tcl
Play again (Y/N): y
Excellent! Starting next level.
$ ./compare_nocase.tcl
Play again (Y/N): Y
Excellent! Starting next level.

This script shows how you can make a script slightly more tolerant of sloppy typing using string compare’s -nocase argument. Whether the user types “y” or “Y,” the game will continue (or it will insult the user if anything else is entered). Modifying compare.tcl to ignore case is left as an exercise for the reader.

The -length N argument enables you to limit the comparison to the first N characters of the strings being compared. If N is negative, the -length argument will be ignored, although I have a hard time imagining a situation in which N would be negative, except when it is passed a variable whose range might include a negative value.
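To make the three return values and the -length argument concrete, here is a short sketch. The strings and variable names are my own inventions, not taken from the chapter's scripts:

```tcl
# Ordering: -1 means less than, 0 means equal, 1 means greater than
set less    [string compare "apple" "banana"]
set same    [string compare "apple" "apple"]
set greater [string compare "pear" "apple"]
puts "less=$less same=$same greater=$greater"

# The full strings differ, but their first three characters ("alp") do not
set whole  [string compare "alphabet" "alpine"]
set prefix [string compare -length 3 "alphabet" "alpine"]
puts "whole=$whole prefix=$prefix"

# -nocase and -length can be combined
set both [string compare -nocase -length 4 "BUBBA" "bubblegum"]
puts "both=$both"
```

The last command compares only “BUBB” with “bubb” and ignores case, so it reports the strings as equal.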

The `equal` Option

The equal option is almost identical to the compare option (the syntax is identical). The difference between the two is that equal compares strings for strict equality, returning 1 (true) if the strings are exactly identical or 0 (false) if the strings are not identical. compare, you will recall, evaluates whether two strings are lexicographically less than, equal to, or greater than one another. The following example, equal.tcl in this chapter’s code directory, rewrites compare.tcl to use equal:

puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName

# Test for strict equality (case-sensitive)
if {[string equal $playerName "Bubba"]} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."

Like I said, compare and equal have the same syntax; the only difference is the nature of the comparison. As a result, you will most often use the equal option because it is rare that you need to determine if one string is less than or greater than another.

The eq operator is a synonym for string equal and exists to make tests for string equality easier to read and write and to make such statements look more like other logical operations. For example, string equal requires the awkward-looking expressions in the previous examples, such as string equal $playerName "Bubba". The eq operator lets you write the more natural expression $playerName eq "Bubba". Thus, equal.tcl becomes eq.tcl:

puts -nonewline "Enter player name: "
flush stdout
gets stdin playerName

# Test for strict equality (case-sensitive)
if {$playerName eq "Bubba"} {
    puts "\"$playerName\" is in use."
    puts -nonewline "Please select another name: "
    flush stdout
    gets stdin playerName
}
puts "\"$playerName\" successfully registered."

Using eq instead of string equal makes the if command much easier to scan and understand, in my opinion. Notice that brackets weren’t necessary in this case; all I needed was for the variable $playerName to be substituted so the comparison would work. In fact, wrapping the conditional expression in brackets would cause an error because the interpreter would try to execute the value of $playerName as a command.

The `match` Option

The match option compares a string to a pattern and returns 1 if the string matches the pattern and 0 otherwise. The complete syntax is:

string match ?-nocase? pattern string

Where equal tests for simple equivalence between two strings, match introduces the ability to test for equivalence between pattern and string. As usual, string can be either a literal string or a string variable. Likewise, pattern can be a literal string or a variable. The difference is that pattern can contain the wildcard characters * and ?. * represents a sequence of zero or more characters and ? represents any one character. The UNIX geeks among you will recognize pattern as a glob.

Consider the pattern alpha*, which is the literal string alpha followed by any sequence of zero or more characters. The following list shows a few matching and nonmatching strings:

alphabet: matches
Alphanumeric: doesn’t match (uppercase A)
alpha male: matches
alpha: matches (* matches a sequence of zero or more characters)
alpaca: doesn’t match
lambda nalpha: doesn’t match (the string must begin with alpha; * matches only what follows it)

Similarly, given the pattern ga?e, the strings game, gate, and gale match the pattern while the strings gayle, glare, and regale do not. match.tcl demonstrates matches using * and ?.
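If you don’t have the code directory handy, the following sketch reproduces the same idea. It is not the match.tcl from the book’s code; ShowMatch is a hypothetical helper I made up for the illustration, and it uses foreach (not yet covered in this chapter) simply to run through the sample strings:

```tcl
# Hypothetical helper: report whether str matches the glob pattern
proc ShowMatch {pattern str} {
    if {[string match $pattern $str]} {
        set verdict "matches"
    } else {
        set verdict "doesn't match"
    }
    puts "\"$str\" $verdict $pattern"
    return $verdict
}

# * stands for zero or more characters
foreach word {alphabet Alphanumeric {alpha male} alpha alpaca} {
    ShowMatch "alpha*" $word
}

# ? stands for exactly one character
foreach word {game gate gale gayle glare regale} {
    ShowMatch "ga?e" $word
}

# -nocase makes the uppercase A acceptable
set relaxed [string match -nocase "alpha*" "Alphanumeric"]
puts "With -nocase, Alphanumeric gives $relaxed"
```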

In addition to * and ?, you can specify a pattern that consists of a set of characters using the form [chars], where chars is a list of characters. chars can be specified using the format x-y to indicate a range of consecutive Unicode characters. For example, to see if a one-character string variable input is an uppercase character, one (inefficient) way to write the test is:

if {[string match {[A-Z]} $input]} {
    # do something
} else {
    # do something else
}

Notice that the expression [A-Z] is enclosed in braces. If you don’t use the braces, the interpreter will attempt to execute a command named A-Z and substitute the results into the string match expression. You probably don’t have a command named A-Z (Tcl certainly doesn’t). The braces prevent this substitution.

Tip: Matching the `match` Characters

If you need to match one of the wildcard characters or the right or left bracket, escape it with a backslash (thus, \*, \?, \[, \]).
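Here is a small sketch of escaped patterns in action, using made-up strings. I put each pattern in braces so that the interpreter passes the backslashes through to string match untouched:

```tcl
set q1 [string match {*\?} "Ready?"]     ;# 1: the string ends with a literal ?
set q2 [string match {*\?} "Ready!"]     ;# 0: the escaped ? is no longer a wildcard
set b1 [string match {\[*\]} {[draft]}]  ;# 1: literal brackets around any text
set s1 [string match {a\*b} "a*b"]       ;# 1: literal asterisk
set s2 [string match {a\*b} "axxb"]      ;# 0: the escaped * matches only itself
puts "$q1 $q2 $b1 $s1 $s2"
```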

Pattern matching using string match is useful when you need to compare a string to a value that can vary in a regular or systematic way. For example, if you store player scores in files named name.scr, where name is each player’s name, you could use the expression string match "*.scr" $fileName. Another way to use string match is to test whether or not a given string contains characters that might be forbidden. For example, to make sure that player names do not contain uppercase letters, you might write the following bit of code (see no_caps.tcl in this chapter’s code directory):

if {[string match {*[A-Z]*} $playerName]} {
        puts "Your player name cannot contain uppercase letters"
}

The pattern *[A-Z]* matches zero or more characters followed by any single uppercase character followed by zero or more characters. This pattern will match any string that contains a capital letter, regardless of where in the string it occurs.

string’s match option gives you a powerful and easy-to-use tool to identify matches that aren’t exact. As you gain experience with Tcl, the situations in which pattern matching is an appropriate solution will be clear.

Inspecting Strings

Although comparing strings to one another is a useful thing to be able to do, it is also one of the least interesting things to do. The string options you learn in this section let you find out more about a string, such as how long it is, what character is present at a particular location in the string, what is the first or last character in the string, and what kind of characters the string contains.

The `length` and `bytelength` Options

string bytelength string
string length string

The bytelength option returns the length of string in bytes, whereas the length option returns the length of the string in characters. A string’s bytelength might not be the same as the number of characters because, as you might remember, Tcl uses Unicode, which can take up to three bytes to represent a character. In this book and in most of your work with Tcl, you will almost always want to use string length string, because the situations in which you need to know a string’s length in actual bytes are uncommon. For completeness’ sake, however, length.tcl shows the use of both:

set phrase "®"
puts "Length in bytes of phrase: [string bytelength $phrase]"
puts "Length in characters of phrase: [string length $phrase]"

The output shows you the difference between the length and bytelength options:

$ ./length.tcl
Length in bytes of phrase: 2
Length in characters of phrase: 1

As you can see, the phrase, a single ® symbol, is only 1 character long (count ’em yourself if you wish), but it requires 2 bytes to store.
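The gap between characters and bytes grows with the characters involved. The sketch below is my own illustration, not one of the chapter’s scripts. It makes two assumptions worth flagging: I write the non-ASCII characters as \u escapes so the script file itself stays plain ASCII, and I count bytes by explicitly converting each string to UTF-8 with encoding convertto rather than by calling string bytelength:

```tcl
set sizes {}
foreach {label text} [list plain "dice" accented "na\u00efve" symbol "\u00ae" euro "\u20ac"] {
    set chars [string length $text]
    # Assumption: measure the UTF-8 form of the string explicitly
    set bytes [string length [encoding convertto utf-8 $text]]
    lappend sizes $chars $bytes
    puts "$label: $chars characters, $bytes bytes in UTF-8"
}
```

Plain ASCII text needs one byte per character, the accented letter and the ® symbol need two bytes each, and the euro sign needs three.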

The `index` Option

If you want to find out what character is at a given position in a string, use the string index command. Its complete syntax is:

string index string n

This command returns the character located at position, or index, n of string. Index values are 0-based (counted from 0). For example, given the string “dice,” the command string index "dice" 0 returns d and string index "dice" 3 returns e (see index.tcl):

set str "dice"
puts "The character at index 0 of dice is '[string index $str 0]'"
puts "The character at index 1 of dice is '[string index $str 1]'"
puts "The character at index 2 of dice is '[string index $str 2]'"
puts "The character at index 3 of dice is '[string index $str 3]'"

The output of this script should look just like the following:

$ ./index.tcl
The character at index 0 of dice is 'd'
The character at index 1 of dice is 'i'
The character at index 2 of dice is 'c'
The character at index 3 of dice is 'e'

You can specify the index value n using an integer, the word end, or the expression end-int, where int is an integer. If n is less than 0 or greater than the index of the last character in the string, string index returns the empty string. That’s right. Unlike many programming languages, referring to an invalid string index in Tcl does not generate an error. The end-int syntax for specifying an index makes it trivial to iterate over a string in reverse (that is, to perform an operation on a string starting from its last character and ending at its first). You don’t know how to loop over a string in this way yet (see “Iterative Loops: The for Command” later in this chapter), but trust me, it’s a common operation, so you’ll appreciate having a brain-dead easy syntax for doing it.
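A quick sketch shows the end-relative forms and what happens with an out-of-range index. The variable names are mine, chosen only for this illustration:

```tcl
set str "dice"

set last   [string index $str end]     ;# e
set nextTo [string index $str end-1]   ;# c
set first  [string index $str end-3]   ;# d

# Valid indices for "dice" are 0 through 3; anything else yields ""
set beyond [string index $str 4]
set before [string index $str -1]

puts "last='$last' next-to-last='$nextTo' first='$first'"
puts "beyond='$beyond' before='$before'"
```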

The `first` and `last` Options

The first and last options make it possible to find the index value of the first and last occurrences, respectively, of a substring in a string. Their complete syntax is:

string first substr str ?start?
string last substr str ?end?

string first searches for the first occurrence of the substring substr in the string str and returns the index of the first letter of substr. string last, similarly, returns the index of the first letter of the last occurrence of substr in str. If the specified substr is not found, both options return -1. By default, string first’s search starts at index 0 of str; if you specify start, the search will start at that index rather than at index 0. string last’s optional argument, end, lets you specify the ending index of the search, meaning that it will only look for substr between index 0 and the index specified by end.

substr.tcl in this chapter’s code directory illustrates how to use string first and string last. The example is short because it is incomplete. I’m going to build on it in the next two sections.

# Original sentence
set old "He was ?verbing? his wife's hair."

set start [string first "?" $old]
set end [string last "?" $old]
puts "start = $start"
puts "end = $end"

This script might serve as the start of a routine for performing a search-and-replace operation. The first step is to search for some text. The assumption in this example is that the text you want to replace is surrounded by ? characters. I use string first and string last to find the index position of the ? characters and then display those indices:

$ ./substr.tcl
start = 7
end = 15

Remember that index values are zero-based, so ? appears at positions 7 and 15, not 8 and 16 as you might expect. If you were writing a search-and-replace procedure, your next step would be to replace the “found” text with something new, which is precisely what the string replace command does.
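The optional start and end arguments, and the -1 return value, are easier to see with a string short enough to count by hand. This sketch uses a throwaway string of my own:

```tcl
set str "a?b?c?d"   ;# the ? characters sit at indices 1, 3, and 5

set f1   [string first "?" $str]     ;# 1
set f2   [string first "?" $str 2]   ;# 3: the search starts at index 2
set l1   [string last "?" $str]      ;# 5
set l2   [string last "?" $str 4]    ;# 3: only indices 0 through 4 are searched
set none [string first "!" $str]     ;# -1: there is no ! in the string

puts "$f1 $f2 $l1 $l2 $none"
```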

The `range` Option

The range option returns a range of characters, that is, a substring, specified by start and end index values:

string range str start end

string range returns the substring that begins at position start and ends at position end from the string str.

If you’re thinking that the start and end arguments look an awful lot like the return values from string first and string last, you’d be spot on. In fact, this is a good example of how you’d use Tcl’s command nesting. range.tcl builds on substr.tcl from the previous section to extract a ?-delimited substring from another string:

# Original sentence
set old "He was ?verbing? his wife's hair."

# Get the starting and end points
set start [string first "?" $old]
set end [string last "?" $old]

# Extract the substring
set substr [string range $old $start $end]
puts "substring is $substr"

The output is what you’d expect, the ?-delimited substring:

$ ./range.tcl
substring is ?verbing?

If you want to use Tcl’s ability to nest commands, you could rewrite this script as shown in the following example (range_nested.tcl in this chapter’s code directory):

# Original sentence
set old "He was ?verbing? his wife's hair."

# Extract the substring
set substr [string range $old [string first "?" $old] [string last "?" $old]]
puts "substring is $substr"

The output is identical to the previous example. You can decide for yourself which model you prefer, the sequential method that limits nested commands (illustrated in range.tcl) or the more, um, “Tcl-ish” method that relies upon and takes advantage of command nesting (illustrated in range_nested.tcl). Tcl beginners find code written in the sequential style easier to read, but using nested commands results in more idiomatic Tcl. Indeed, the more experienced you become with Tcl, the more likely you are to find that nested commands are a natural way to write Tcl code.
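If you want the text between the delimiters rather than the delimiters themselves, one approach (a sketch of mine, not a script from the code directory) is to step one index inward on each side with expr. The same sketch also shows that string range accepts the end and end-int index forms:

```tcl
set old "He was ?verbing? his wife's hair."

set start [string first "?" $old]
set end   [string last "?" $old]

# Move one character inward on both sides to drop the ? delimiters
set inner [string range $old [expr {$start + 1}] [expr {$end - 1}]]
puts "inner text is $inner"

# end-relative indices work here, too: grab the last five characters
set tail [string range $old end-4 end]
puts "tail is $tail"
```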

The `replace` Option

The string replace command completes the search-and-replace set of commands you’ve been exploring in the last few sections. Its complete syntax is:

string replace str start end ?newstr?

This command removes the substring between and including the indices start and end from the string specified by str. If you include the optional argument newstr, the removed text will be replaced with the string specified by newstr. replace.tcl in this chapter’s code directory illustrates replacing text using string replace.

# Source sentence
set old "He was ?verbing? his wife's hair with a ?noun?."
puts "Old sentence:	$old"

# Find this
set verb "?verbing?"

# Replace with this
set newVerb "washing"

# Get the verb's starting and ending positions
set start [string first "?" $old]
set end [string first "?" $old [expr {$start + 1}]]

# Replace and display
puts "New sentence:	[string replace $old $start $end $newVerb]"

This script replaces the string ?verbing? with the string washing. Notice in the fourth block of code that I use string first twice. Why? Because string last returns the index of the last occurrence of the search string, which in this sentence is the ? that closes ?noun?. Using string first with the optional start argument lets me reset the starting point of the search. The command set start [string first "?" $old] finds the index of the first ?. The nested expr command, [expr {$start + 1}], sets the starting point of the next search to the character that follows the first ?. This adjustment is necessary because the optional start argument for string first (remember, the syntax is string first substr str ?start?) begins the search at start. If I hadn’t incremented the starting index, the second string first command would have returned the position of the first ? instead of the second one.

The last command actually performs the replacement and displays the result. Here’s the output of this script:


$ ./replace.tcl
Old sentence:   He was ?verbing? his wife's hair with a ?noun?.
New sentence:   He was washing his wife's hair with a ?noun?.

I’ll leave replacing ?noun? with something else as an exercise for you. As a hint, you can simplify the code if you save the modified sentence produced in replace.tcl.
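If you’d like something to check your answer against, here is one possible shape for a solution. Treat it as a sketch: ReplacePlaceholder is a hypothetical procedure name of my own, and wrapping the steps in a proc is just one way to reuse them. The key idea from the hint is that each call’s result is saved and fed to the next call:

```tcl
# Hypothetical helper: replace the first ?-delimited placeholder in sentence
proc ReplacePlaceholder {sentence newText} {
    set start [string first "?" $sentence]
    if {$start < 0} {
        return $sentence        ;# no placeholder at all
    }
    set end [string first "?" $sentence [expr {$start + 1}]]
    if {$end < 0} {
        return $sentence        ;# an opening ? without a closing one
    }
    return [string replace $sentence $start $end $newText]
}

set sentence "He was ?verbing? his wife's hair with a ?noun?."
set sentence [ReplacePlaceholder $sentence "washing"]
set sentence [ReplacePlaceholder $sentence "shovel"]
puts $sentence
```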

The `is` Option

The is option, that is, the string is command, enables you to test whether or not a given string belongs to a character class. A character class is a named group of characters that serves as a shorthand notation for the range operator, [charlist], introduced earlier in the chapter. For example, the character range for all lowercase characters is specified [a-z] using the range operator. The corresponding character class is lower.

In addition to serving as a shorthand notation, character classes are more general than sets specified using the range operator because character classes are defined over the Unicode character set. At this book’s beginning level, the fact that character classes are Unicode-aware won’t make a lot of difference. However, if you write a runaway hit game using Tcl and Tk and it gets translated to, say, Tamil, you’ll be happy to know that at least the code that uses character classes rather than hand-coded character ranges will work as intended and with no modifications.

The syntax for string is is:

string is class ?-strict? ?-failindex varname? str

class can be any of the classes listed in Table 4.2 and str is the string to test. If str is a member of class, string is returns 1; otherwise, it returns 0. The empty string, “”, is regarded as a member of all character classes unless you specify the -strict option, in which case the empty string is a member of no character class. If a string isn’t a member of a given character class, you can specify -failindex varname to have Tcl save the index at which str fails the comparison to the desired character class. Before you see an example, review the list of possible character classes, shown in Table 4.2.

Table 4.2. Tcl Character Classes

Class	Description
`alnum`	Any Unicode alphabetic character or digit.
`alpha`	Any Unicode alphabetic character.
`ascii`	Any character in the ASCII character set (7-bit characters).
`boolean`	Any of the forms used for Boolean values.
`control`	Any Unicode control character.
`digit`	Any Unicode digit.
`double`	Any of the forms used to represent double values.
`false`	Any of the forms used for Boolean values that evaluate to false.
`graph`	Any Unicode printing character, except a space.
`integer`	Any of the forms used to represent integer values.
`lower`	Any lowercase Unicode alphabetic character.
`print`	Any Unicode printing character, including space.
`space`	Any Unicode space character.
`true`	Any of the forms used for Boolean values that evaluate to true.
`upper`	Any uppercase Unicode alphabetic character.
`wordchar`	Any Unicode word character.
`xdigit`	Any hexadecimal digit character.

As you can see from this table, there’s a character class for almost every need you might have. A notable exception is octal digits (that is, digits in the base-8 number system). You can see the string is command at work in the following example, is.tcl, which tests the ® character for membership in each of the classes listed in Table 4.2:

proc TestClass {str class} {
        if {[string is $class $str]} {
                set msg "$str is in class '$class'"
        } else {
                set msg "$str is not in class '$class'"
        }
        puts $msg
}

set symbol "®"

TestClass $symbol alnum
TestClass $symbol alpha
TestClass $symbol ascii
TestClass $symbol boolean
TestClass $symbol control
TestClass $symbol digit
TestClass $symbol double
TestClass $symbol false
TestClass $symbol graph
TestClass $symbol integer
TestClass $symbol lower
TestClass $symbol print
TestClass $symbol space
TestClass $symbol true
TestClass $symbol upper
TestClass $symbol wordchar
TestClass $symbol xdigit

In is.tcl, I use a procedure named TestClass to perform the actual test, passing the procedure the string I want to test and the name of the character class against which I want to test it. Using the TestClass procedure makes writing the rest of the script a lot easier, because the balance of the script is a bunch of calls to TestClass, one for each class that interests me. The output of this script should resemble the following:

$ ./is.tcl
® is not in class 'alnum'
® is not in class 'alpha'
® is not in class 'ascii'
® is not in class 'boolean'
® is not in class 'control'
® is not in class 'digit'
® is not in class 'double'
® is not in class 'false'
® is in class 'graph'
® is not in class 'integer'
® is not in class 'lower'
® is in class 'print'
® is not in class 'space'
® is not in class 'true'
® is not in class 'upper'
® is not in class 'wordchar'
® is not in class 'xdigit'

As you can see, the character ® is a member of the graph and print classes and not a member of the others.
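The -strict and -failindex options didn’t appear in is.tcl, so here is a brief sketch of both. The strings and the variable name where are my own choices for the illustration:

```tcl
# Without -strict, the empty string passes; with -strict, it fails
set emptyLoose  [string is digit ""]
set emptyStrict [string is digit -strict ""]
puts "empty string: loose=$emptyLoose strict=$emptyStrict"

# -failindex stores the index of the first offending character
set allDigits [string is digit -failindex where "12a4"]
puts "12a4 is all digits? $allDigits (first bad character at index $where)"

# The integer class accepts any form Tcl can read as an integer, including hex
set hexOk [string is integer -strict "0xF"]
puts "0xF is an integer? $hexOk"
```

The -strict option is the one to reach for when validating user input, because an empty line is rarely a valid answer.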

Modifying Strings

While it’s very interesting and even useful to know if a character is a member of a given character class or where in a string a substring appears, it’s even more useful to know how to slice, dice, and julienne strings.

Repeating Strings

The simplest string modification is likely repeating a string. Thus, we have the aptly named string repeat command:

string repeat str count

string repeat repeats the string str count times. It is much easier to write, for example:

puts [string repeat "*" 50]

than it is to write:

puts "**************************************************"

Both commands print 50 asterisks, but guess which one is easier to type?
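string repeat really earns its keep when the count isn’t known until the script runs. As a small sketch (the title text and variable names are made up), you can nest string length to underline a heading with a rule of exactly the right width:

```tcl
set title "Mad Libs"

# Build a rule exactly as wide as the title
set rule [string repeat "=" [string length $title]]
puts $title
puts $rule

# The repeated string can be longer than one character
set border [string repeat "-=" 5]
puts $border
```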

Switching Case

Another frequently used operation is modifying the case of a string. Tcl’s string command supports three options for doing so: changing a string to all uppercase (using string toupper), changing a string to all lowercase (using string tolower), and changing a string to sentence case (using the inaccurately named string totitle). Each of the three options shares a common syntax:

string toupper str ?start? ?end?
string tolower str ?start? ?end?
string totitle str ?start? ?end?

In each case, the string specified by str will be returned with all characters modified appropriate to the option requested. By default, the entire string is modified; start and end (which are both integral values) specify alternative starting and stopping index values. If you specify start, the modification begins at that index; if you specify end, the modification stops at that index.
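To see what start and end do, here is a short sketch that always supplies both index values. The sample strings are arbitrary, and end is Tcl’s shorthand for the index of the last character.

```tcl
set str "hello world"

# Uppercase only the characters at indexes 0 through 4
set firstWord [string toupper $str 0 4]
puts $firstWord     ;# HELLO world

# Uppercase from index 6 through the end of the string
set lastWord [string toupper $str 6 end]
puts $lastWord      ;# hello WORLD

# Lowercase everything after the first character
set quiet [string tolower "SHOUTING" 1 end]
puts $quiet         ;# Shouting
```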

For example, given the deliberately perverse string “yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20: ”, case.tcl in this chapter’s code directory shows how toupper, tolower, and totitle modify it:

set str "yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20: "

puts "toupper: [string toupper $str]"
puts "tolower: [string tolower $str]"
puts "totitle: [string totitle $str]"

When you execute the script, the output darn well better look like the following:

$ ./case.tcl
toupper: YOUR GUESS MUST BE BETWEEN 1 AND 20:
tolower: your guess must be between 1 and 20:
totitle: Your guess must be between 1 and 20:

Like I wrote, the totitle option seems misnamed because it doesn’t render what I consider “title case,” capitalizing the first letter of each word. Rather, it capitalizes the first letter of the target string and lowercases the rest. However, it’s named totitle so that’s what we have to use. You’re free to write your own ToTitle command if you want, of course.
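If you want to try writing that command, the following is one possible sketch, assuming that words are separated by single spaces. The name ToTitle comes from the paragraph above; the implementation is only an illustration, and it leans on split, join, lappend, and foreach, which this chapter doesn’t cover.

```tcl
# Hypothetical ToTitle: capitalize the first letter of every
# space-separated word and lowercase the rest of each word.
proc ToTitle {str} {
    set words {}
    foreach word [split $str " "] {
        # string totitle handles one word at a time
        lappend words [string totitle $word]
    }
    return [join $words " "]
}

puts [ToTitle "yOuR gUeSs MuSt Be BeTwEeN 1 aNd 20"]
```

Splitting on a single space and joining with a single space means that runs of spaces in the input survive unchanged, because split produces an empty element for each extra space.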

Trimming Strings

Trimming strings refers to deleting unwanted characters from the beginning or end of strings. Tcl’s string trimming commands, string trimleft, string trimright, and string trim, are usually used to remove unwanted white space from the beginning or end of user input (those darn users will type anything!). The syntax of these commands is:

string trimleft str ?chars?
string trimright str ?chars?
string trim str ?chars?

str is the string to trim. By default, white space (spaces, tabs, newlines, and carriage returns) will be removed. If specified, chars defines a set of one or more characters that should be removed from str. As their names suggest, trimleft returns str with characters removed from the left end; trimright returns str with characters removed from the right end; and trim returns str with characters removed from the left and right ends. Trimming stops at the first character that isn’t listed in chars, so characters in the middle of str are never touched. If the end (or ends) being trimmed doesn’t start with any of the characters listed in chars, str will be returned unmolested.

Tip: String Operations are Nondestructive

String operations are nondestructive in that they do not modify their string arguments. All of the string operations discussed in this chapter return a new string that reflects the changes performed; the original or source string is left alone. This feature is a direct result of Tcl’s programming model (grouping and command substitution) and enables you to use the results of string operations without worrying about your source data being modified in some inscrutable fashion. It also means that you must explicitly use the set command to assign the results of string operations to variables if you want to keep those results for later use.

Trimming strings is uncomplicated, so I won’t discuss it further here. Nevertheless, the script trim.tcl in this chapter’s code directory demonstrates the usage of all three string-trimming options.
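For readers without the code directory at hand, here is a brief sketch of the three commands. It is not the contents of trim.tcl, and the sample strings are invented. It also shows the point made in the preceding tip: the source string is left alone, so the result has to be saved with set.

```tcl
set input "   Kurt Wall \t\n"

# Default: strip white space from both ends
set name [string trim $input]
puts "\[$name\]"

# $input itself is unchanged; only the returned copy was trimmed
puts "Original length: [string length $input]"
puts "Trimmed length:  [string length $name]"

# chars is a set of characters, not a prefix or a suffix
set digits [string trimleft "000123" "0"]
puts $digits

# Every trailing character found in the set {. t x} is removed,
# including the final t of "report"
set base [string trimright "report.txt" ".txt"]
puts $base
```

The last command is a common surprise: because chars is treated as a set, the result is repor, not report.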

Appending Strings

Up to now, if you wanted to add text to a string variable, you would use the set command:

set label "Player Name:"
set label "$label Kurt Wall"
puts $label

This approach is functional, but it is not the most efficient way to build up a long string. The easy, efficient way is to use the append command. For example, assuming the variable label doesn’t already exist, the previous two set commands are equivalent to the following command:

append label "Player Name:" " Kurt Wall"

append’s syntax is:

append var value ?...?

append tacks each value on to the end of the variable specified by var. If var doesn’t exist, append creates it, and its value will be the concatenation of each value specified. Unlike the various string commands discussed in this chapter, append modifies the value of var. It also returns the modified string. The reason that append is more efficient than multiple set commands is that append extends the variable’s existing value in place, whereas rebuilding the value with set makes Tcl construct an entirely new string from the old value plus the new text and then assign it. I’ll prove this to you in the next section when you learn how to use the for command to write an iterative loop.
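The following sketch exercises each of those points. The variable names are arbitrary, and unset -nocomplain is used only to guarantee that greeting doesn’t exist when the example starts.

```tcl
# Start from a clean slate so the first append has to create the variable
unset -nocomplain greeting

# var doesn't exist yet, so append creates it
append greeting "Hello"

# Several values can be appended in one command
append greeting ", " "world" "!"
puts $greeting

# append also returns the modified string
set result [append greeting " Goodbye."]
puts $result
```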

Looping Commands

In the previous chapter, I introduced the notion of control structures, which allow you to write scripts that do more than execute sequentially from the first to the last line of the script. In particular, I showed you how to use the conditional execution command, if. In addition to conditional execution, Tcl also supports a number of commands for looping, or executing the same command or set of commands multiple times. I’ll cover two of them in this chapter, while and for. The while command creates a loop that executes as long as, or while, a Boolean test expression evaluates to true. When the test expression evaluates to false, control exits the loop and continues with the command immediately following the while command. The for command creates an iterative loop, that is, a loop that executes a fixed number of times and then terminates (again, with control passing to the command immediately following the for command).

Looping with the `while` Command

Loops that use while are sometimes referred to as indeterminate loops because you don’t know how many times they will execute, only that they will (hopefully) eventually terminate when their test condition evaluates to false. The syntax of the while command is:

while {test} {body}

test is a Boolean expression (an expression that has a Boolean result). When the loop starts, test is evaluated; if it is true, the command or commands in body execute. Otherwise, body is skipped and execution resumes with the command immediately following the while command. After each pass through body, test is re-evaluated. If test is still true, body will execute; otherwise, the loop terminates and execution resumes with the command immediately following the while command.

Strictly speaking, the braces I used in the syntax diagram aren’t required. However, test will almost always need to be enclosed in braces because you need to protect its condition from premature substitution. If you don’t use braces, the likely result is either an infinite loop (a loop that never terminates) or a loop that never executes at all. The braces are usually necessary because, without them, the Tcl interpreter will substitute the values of any variables in the test condition once, before the while command ever sees it, so the condition can never change. Using braces around the test condition prevents premature substitution. I suggest enclosing the body of the while loop in braces as well. Until you are much more confident of your ability to predict how substitution and grouping will behave, enclosing the body command(s) in braces will result in fewer surprises and unpleasant side effects.
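Here is a small sketch of the problem. It uses the break command, which this chapter doesn’t cover, purely as a safety valve so that the unbraced loop stops after 10 passes instead of running forever. The variable names are invented for the example.

```tcl
# Without braces: "$i < 3" is substituted once, before while runs,
# so while sees the constant test "0 < 3", which never becomes false.
set i 0
set unbracedPasses 0
while "$i < 3" {
    incr i
    incr unbracedPasses
    # Safety valve so this sketch terminates
    if {$unbracedPasses >= 10} {
        break
    }
}
puts "Unbraced test: $unbracedPasses passes"

# With braces: while re-evaluates $i < 3 before every pass
set i 0
set bracedPasses 0
while {$i < 3} {
    incr i
    incr bracedPasses
}
puts "Braced test: $bracedPasses passes"
```

The unbraced loop runs until the safety valve trips at 10 passes, while the braced loop stops after 3, as intended.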

The following script, while.tcl in this chapter’s code directory, offers a useful illustration of how while loops work:

set lineCnt 0
set charCnt 0

while {[gets stdin line] >= 0} {
        incr lineCnt
        incr charCnt [string length $line]
}
puts "Read $lineCnt lines"
puts "Read $charCnt characters"

This simple script reads input typed at the keyboard (or redirected from another file). Each time it encounters a newline, it increments the variable lineCnt by 1 and the variable charCnt by the number of characters in the line. When it encounters EOF (end-of-file), it drops out of the loop and displays the number of lines and number of characters read.

$ ./while.tcl < while.tcl
Read 13 lines
Read 229 characters

Recall that gets returns -1 when it reads EOF. That means that the test condition [gets stdin line] >= 0 will evaluate to true as long as gets receives valid input. When gets sees EOF in the input stream, the test condition evaluates to false and the loop terminates.

Note: Newlines Don’t Count

The Linux- or UNIX-using reader (and the obsessive-compulsive reader who counts everything) will notice that while.tcl actually has 242 characters:

$ wc -c while.tcl
242 while.tcl

So, why does while.tcl say that it only has 229? Because gets discards the newline. Accordingly, if you think that newlines should also be counted, change the last line of while.tcl to the following:

puts "Read [expr $charCnt + $lineCnt] characters"

I cheated by introducing a command you haven’t seen yet, incr. Briefly, incr increments (hence the name) the value of a variable. incr’s virtue is that it is easier to write than set someVar [expr $someVar + $someValue]. incr’s syntax is simple:

incr var ?unit?

By default, incr increments var, which must be an integer variable, by 1. If you specify unit, which must also be an integer value (or an expression that evaluates to an integer value, as in while.tcl), unit will be added to var. Yes, unit can be a negative integer, which would have the effect of decrementing var. No, there isn’t a separate command decr used to decrement a variable, although you could certainly write one if you have a rage for order and symmetry.
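As a quick sketch of those rules, the following increments and decrements a counter and then defines one possible decr. The decr procedure is hypothetical, not a Tcl built-in, and it relies on upvar, a command that hasn’t been introduced yet, to reach the caller’s variable.

```tcl
set n 10
incr n          ;# adds 1: n is now 11
incr n 5        ;# adds 5: n is now 16
incr n -3       ;# a negative unit decrements: n is now 13
puts $n

# Hypothetical decr: subtract amount (default 1) from the named variable
proc decr {varName {amount 1}} {
    # Link the local name var to the caller's variable
    upvar 1 $varName var
    incr var [expr {-$amount}]
}

decr n          ;# n is now 12
decr n 2        ;# n is now 10
puts $n
```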

Iterative Loops: The `for` Command

The for command enables you to execute one or more commands a fixed number of times, or iterations. Hence, for loops are often referred to as iterative loops. Its syntax is:

for {start} {test} {next} {body}

Again, the braces shown in the syntax diagram aren’t required, but I recommend using them to preserve your sanity. start is an expression that initializes a loop counter, the variable that controls how many times the loop executes. test is a Boolean expression that controls whether or not the command(s) in body will be executed by testing the loop counter against the terminating condition, the value at which the loop exits. next is an expression that increments the loop counter.

When a for loop starts, the expression in start is executed, which sets the initial value of the loop counter. Then the expression in test is evaluated. test usually includes the loop counter, but it doesn’t have to. If test evaluates to false, the for loop will be skipped, and execution resumes with the command immediately following the for command. Otherwise, the command(s) in body will be executed. The next expression is evaluated after the last command in body. next increments or decrements or otherwise modifies the loop counter so that the for loop eventually terminates. After the next expression is executed, the test condition is evaluated. If test evaluates to false, the loop terminates and control passes to the command immediately following the for command. If test evaluates to true, body will be executed, followed by the next expression. Wash. Rinse. Repeat.

Confused? The following script (for.tcl in this chapter’s code directory) should help:

for {set i 1} {$i <= 10} {incr i} {
       puts "Loop counter: $i"
}

This script increments the value of a loop counter variable, i, and displays that value. In terms of the syntax diagram I showed at the beginning of this section:

The start condition is set i 1.
The test condition is $i <= 10.
The next expression is incr i, which increments the value of i by 1 on each pass through the loop.
The body command is puts "Loop counter: $i".

The body of the loop executes for each value of i that is less than or equal to 10. The runtime behavior should be unsurprising:

$ ./for.tcl
Loop counter: 1
Loop counter: 2
Loop counter: 3
Loop counter: 4
Loop counter: 5
Loop counter: 6
Loop counter: 7
Loop counter: 8
Loop counter: 9
Loop counter: 10

You’ll use for loops quite a bit in your scripts because for is an easy, natural way to create loops that need to execute a fixed number of times. You’ll learn yet another looping construct, foreach, in the next chapter.
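To connect the two looping commands, here is a sketch that rewrites for.tcl as a while loop and then shows a for loop whose next expression counts down instead of up. The list named seen and the lappend command that fills it are additions for this example only.

```tcl
# The for.tcl loop expressed with while: start, test, and next
# are the same three pieces, just written in different places.
set i 1
while {$i <= 10} {
    puts "Loop counter: $i"
    incr i
}

# next doesn't have to add 1; here it subtracts 2 on each pass
set seen {}
for {set i 10} {$i > 0} {incr i -2} {
    lappend seen $i
}
puts "Counting down: $seen"
```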

Comparing set and append

The following script (test_append.tcl in this chapter’s code directory) compares the relative performance of the set and append commands (and gives you another example of using the for command to create an iterative loop). The testing methodology is primitive but illustrative:

Save a timestamp in milliseconds (thousandths of a second).
Execute thousands of set or append commands in a for loop.
Save a second timestamp.
The difference between the two timestamps represents the time spent executing all of the set or append commands.

# Counter
set cnt 100000

# Doing it the hard, inefficient way
set var1 0
set start [clock clicks -milliseconds]
for {set i 1} {$i <= $cnt} {incr i} {
       set var1 "$var1,$i"
}
set stop [clock clicks -milliseconds]
puts "Elapsed time using set: [expr ($stop - $start) / 1000.0] secs"

# Doing it the easy, efficient way
set var2 0
set start [clock clicks -milliseconds]
for {set i 1} {$i <= $cnt} {incr i} {
       append var2 "," $i
}
set stop [clock clicks -milliseconds]
puts "Elapsed time using append: [expr ($stop - $start) / 1000.0] secs"

As you can see in the following table, the runtime performance of set and append differs dramatically:

Iterations	`set` Runtime	`append` Runtime
100,000	47.884 sec	0.354 sec
200,000	217.359 sec	20.672 sec
500,000	1508.435 sec	1.665 sec
1,000,000	6531.524 sec	3.292 sec

Naturally, the timing results will vary from system to system and my simple test might not reflect real-world usage. Indeed, “real-world” code will probably be more involved than my simple loop bodies. The primary point is the difference in performance, and I think the results speak for themselves. The moral of this story? Don’t use set if you will be doing heavy-duty variable building. append is far more efficient.
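If you want to repeat the experiment without waiting, the following sketch wraps the same timestamp technique in a helper procedure and uses a much smaller count. The Elapsed procedure and the count of 2,000 are assumptions made for this example, and uplevel, which runs the script in the caller’s scope, is a command this chapter doesn’t cover. No timings are shown because they depend entirely on your system and Tcl version.

```tcl
# Hypothetical helper: run script in the caller's scope and
# return the elapsed time in milliseconds.
proc Elapsed {script} {
    set start [clock clicks -milliseconds]
    uplevel 1 $script
    set stop [clock clicks -milliseconds]
    return [expr {$stop - $start}]
}

# Deliberately small so the sketch finishes quickly
set cnt 2000

set var1 0
set msSet [Elapsed {
    for {set i 1} {$i <= $cnt} {incr i} {
        set var1 "$var1,$i"
    }
}]

set var2 0
set msAppend [Elapsed {
    for {set i 1} {$i <= $cnt} {incr i} {
        append var2 "," $i
    }
}]

puts "set:    $msSet ms"
puts "append: $msAppend ms"

# Both loops build exactly the same string
puts "Same result: [expr {$var1 eq $var2}]"
```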

Analyzing Mad Libs

As you’ll see in the “Looking at the Code” section, mad_lib.tcl doesn’t use all of the commands you learned in this chapter. It does illustrate key commands and gives you a fertile base for further experimentation.

Looking at the Code

#!/usr/bin/tclsh
# mad_lib.tcl
# Demonstrate string manipulation

# Block 1
# The source sentence
set line "One day while I was ?verb ending in -ing? in my living room, "
append line "a ?adjective? ?mythical creature? fell through the roof. "
append line "It jumped on the ?piece of furniture? and knocked over the "
append line "?noun?. Then it ran into the dining room and ?past tense verb? "
append line "a ?noun?. After ?number? minutes of chasing it through the "
append line "house I finally caught it and put it outside. It quickly "
append line "flew away."

# Block 2
while {[string first "?" $line] != -1} {
    # Block 2a
    # Find the next ??-enclosed word or phrase
    set start [string first "?" $line]
    set end [string first "?" $line [expr $start + 1]]

    # Block 2b
    # Extract the text between the ??
    set prompt [string range $line [expr $start + 1] [expr $end - 1]]

    # Block 2c
    # Display the prompt and get the user's input
    puts -nonewline "Enter a $prompt: "
    flush stdout
    gets stdin input

    # Block 2d
    # Update the sentence
    set line [string replace $line $start $end $input]
}

# Block 3
# Print the completed mad lib
puts $line

Understanding the Code

The code in Block 1 just sets up the sentence that the rest of the script will be modifying. What I’ve done is delimit text I want to replace with ? characters. This makes it easy to find the text and replace it with the input provided by the user. The other salient point in this block of code is that I’m following my own advice and using append rather than set to build up the string. Block 3 is nothing new; it just displays the completed mad lib.

Block 2, which I’ve subdivided into Blocks 2a through 2d, is where the real work gets done. The test in the while loop provides the terminating condition. Recall that string first returns -1 if it doesn’t find a specified substring. In this case, once I’ve replaced all the ?-delimited text, there will be no more ? characters for string first "?" $line to find, so the command will return -1, the test condition will evaluate to false, and control will drop out of the loop and display the completed silly sentences.

The first step is to find text enclosed in the delimiters, which is handled by Block 2a. I use the same string first technique that I described in replace.tcl earlier in the chapter. Once I’ve found the starting and ending points, which need to include the delimiters, I save them in the aptly named start and end variables because I’m going to need these values several times.

Block 2b extracts the text, without the ? delimiters, which gives me a ready-made prompt to display to the user. To drop the leading ?, I start one character past the starting index. Similarly, to get rid of the trailing ?, I stop one character short of the ending index. As I’ve suggested before, Tcl’s ability to nest commands, even its affinity for doing so, makes this kind of operation easy to express in code, albeit potentially hard to read for Tcl neophytes. However, once you’ve become familiar with this particular idiom, it will become a natural way to write code.

Block 2c uses the prompt extracted in Block 2b to ask the user to enter a particular word or phrase. The technique for reading user input is the same one I introduced in the previous chapter, so it should look familiar. Whatever the user types gets stored in the variable named input.

In Block 2d, finally, I use the string replace command to replace the ?-delimited text with the word the user typed. At this point, control returns to the top of the while loop, the test condition is evaluated again, and, if it’s true, control reenters the loop body. If the test condition is false, control passes to Block 3, and I reveal the completed silly story.
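To make the index arithmetic in Blocks 2a, 2b, and 2d concrete, here is a sketch that traces one pass of the loop on a short string. The sample sentence and the hard-coded answer sink are stand-ins for the real source sentence and the user’s input.

```tcl
set line "a ?noun? fell"

# Block 2a: locate the opening and closing delimiters
set start [string first "?" $line]
set end [string first "?" $line [expr {$start + 1}]]
puts "start=$start end=$end"        ;# start=2 end=7

# Block 2b: the prompt is everything strictly between the delimiters
set prompt [string range $line [expr {$start + 1}] [expr {$end - 1}]]
puts "prompt=$prompt"               ;# prompt=noun

# Block 2d: replace the delimiters and the text between them
set input "sink"
set line [string replace $line $start $end $input]
puts $line                          ;# a sink fell

# No ? characters remain, so the while test would now fail
puts [string first "?" $line]       ;# -1
```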

Modifying the Code

Here are some exercises you can try to practice what you learned in this chapter.

4.1 Modify mad_lib.tcl to use a different delimiter in the source string so that the source string can include ? characters.
4.2 Modify Block 2b of mad_lib.tcl to use another method to extract the prompt. Hint: All you’re really doing is stripping off leading and trailing characters.
4.3 Modify Block 3 of mad_lib.tcl to format the output such that words don’t break across lines. That is, make the printed mad lib fit into lines of approximately 75 characters.

Working with strings is an essential component of most Tcl programs, and Tcl is well-equipped for dealing with strings. In fact, Tcl has such a rich set of commands for dealing with strings that you might not be sure which one to use in a given situation. You can compare strings for equality and for membership in a certain character class. You can also find out how long strings are. Tcl also allows you to find where in a string a certain character or substring of characters is located and, if you need to do so, Tcl even has a command for replacing one substring with another. Miscellaneous functions, such as removing unwanted characters from the ends of a string and changing a string’s case, round out the basic string functionality. I’ll introduce additional string-handling capabilities in later chapters, but first, you’re going to learn about another Tcl strong point, lists.
