Chapter 8. Accessing Files and Directories

Most non-trivial programs involve interacting with the host filesystem. In this chapter, you’ll learn how to open, close, delete, and rename files. The chapter also shows you how to perform file I/O using the puts (output) and gets (input) commands and how to use the format command to “pretty print” output. Finally, you’ll learn how to navigate the filesystem programmatically and work with file and directory names in a platform-neutral manner.

Word Search

This chapter’s game, word_search.tcl in the code directory, is a simplified version of the classic word-search game familiar from long bus and plane rides. It shows you a grid of space-delimited letters. Each row of letters contains an embedded word that you have to find. Each row has one word oriented left to right; there are no words (at least deliberately) oriented on vertical or diagonal axes. As a hint, the words you have to find are commands used or introduced in this chapter. You start the game by executing the script. Review the game grid, and when you find a word in one of the rows, type the row number, press Enter, and then type the word you found and press Enter. After the script evaluates your input, it shows you the result and asks if you want to play again. To keep the screen tidy, I use the hoary UNIX command tput clear; on Windows, you will probably have to use the old DOS command cls unless you are using a UNIX emulator like Cygwin. Here are a few iterations of word_search.tcl:

$ ./word_search.tcl
1   e o p e n u g r i v c
2   n l v n j c l o s e d
3   j b p u t s s z m h i
4   s q n i d g g e t s t
5   h e r r e a d e r s e
6   z o t z g v a n e r s
7   f o r m a t a l b m c
8   d h n p s e e k p g e
9   a m a j y r a t e l l
Select a line (1–9): 2
What word do you see: closed
player: 'closed' puzzle: 'closed'
Correct!
Play again (Y/n)?  y
1   e o p e n u g r i v c
2   n l v n j c l o s e d
3   j b p u t s s z m h i
4   s q n i d g g e t s t
5   h e r r e a d e r s e
6   z o t z g v a n e r s
7   f o r m a t a l b m c
8   d h n p s e e k p g e
9   a m a j y r a t e l l
Select a line (1–9): 7
What word do you see: mat
Sorry.
Try again (Y/n)? y
1   e o p e n u g r i v c
2   n l v n j c l o s e d
3   j b p u t s s z m h i
4   s q n i d g g e t s t
5   h e r r e a d e r s e
6   z o t z g v a n e r s
7   f o r m a t a l b m c
8   d h n p s e e k p g e
9   a m a j y r a t e l l
Select a line (1–9): 1
What word do you see: open
Correct!
Play again (Y/n)? n

You’ve already seen and used many of the commands used in word_search.tcl. The file handling commands are new, though, as are some of the ways I’ve combined the commands. The balance of the chapter will fill in the gaps.

Opening and Closing Files

Before you can do much else with a file, you have to open it. When you’re done with a file, it is good practice, but not necessarily required, to close it. The syntax for opening a file is:

open name ?access? ?perms?

name identifies the name of the file to open. If specified, access defines the type of file access you want (see Table 8.1). Similarly, if perms is specified, it defines the UNIX-style file permissions to set on newly created files.

open returns a channel ID, a unique identifier or handle used to refer to the file in subsequent operations on it. Although you might not have realized it, you’ve already used channel IDs with the puts and gets commands. Recall that puts writes to stdout by default (that is, puts "foo" and puts stdout "foo" are identical commands). stdout is a channel ID. Similarly, when you use gets to read keyboard input, you have to write gets stdin. stdin is another channel ID.

The access argument indicates whether you want to read a file, write to a file, read and write a file, or append to a file. If not specified, Tcl assumes you merely want to read the file. Table 8.1 lists the possible values for access.

Table 8.1. File Access Modes

Argument   Mode         Description
r          Read-only    Open for input; name must exist.
r+         Read/write   Open for input and output; name must exist.
w          Write-only   Open for output; if name exists, truncate it; otherwise, create it.
w+         Read/write   Open for input and output; if name exists, truncate it; otherwise, create it.
a          Append       Open for output, appending data to name; create name if it doesn’t exist.
a+         Read/write   Open for input and output, appending data to name; create name if it doesn’t exist.

File permissions control who can do what to a file. I’m going to skip a tedious, detailed excursus on UNIX file permissions. Any good UNIX or Linux reference (and most bad ones, too) can get you up to speed on UNIX-style file permissions. What you most need to know is that unless otherwise specified, open commands that result in creating files apply a default mode of 0666, which means that they are readable and writable by everyone, subject to the process’s umask, which usually strips some of those permissions (a common umask of 022, for example, turns 0666 into 0644). As a matter of habit and security, I prefer to create files with mode 0644, which means that I can read and write them, but everyone else can only read them. If you are extremely paranoid, you can use a mode of 0600, which means that you can read and write the file but no one else can.
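As a quick illustration, the following sketch passes an explicit perms value to open. The file name private.txt is an arbitrary example. Keep in mind that perms applies only when open actually creates the file, and that the umask can strip bits from the mode you request (it never adds any), which is why the sketch deletes any leftover copy first.

```tcl
# perms only matters when open creates the file, so start fresh.
# (private.txt is an arbitrary example name.)
file delete -force private.txt

# Create the file readable and writable by its owner only.
set fileId [open private.txt w 0600]
puts $fileId "for my eyes only"
close $fileId

# On UNIX-like systems, file attributes reports the resulting mode
# (after the umask has been applied).
if {$tcl_platform(platform) eq "unix"} {
    puts "permissions: [file attributes private.txt -permissions]"
}
```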

To close a file, the syntax is quite simple:

close id

id must be a channel ID returned by a previous open (or socket) command. One would think that closing a file is a simple operation, but the reality is slightly more complicated. When you issue the close command, several tasks occur before the file is really, truly closed:

  • Any buffered output is flushed to disk.

  • Any buffered input is discarded.

  • The associated disk file or I/O device is closed.

  • The channel ID is invalidated and cannot be used for subsequent I/O operations.

Although not strictly necessary, I vigorously encourage you to close files explicitly. When your script exits, any files that you opened will be closed. In the nominal case, this is fine. However, long-running scripts or scripts that open lots of files might use up operating system resources (on UNIX and UNIX-like systems, for example, file descriptors are a finite resource), so get into the habit of closing your files.
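One way to make sure a file gets closed even when something goes wrong is to run the work inside catch, close the channel unconditionally, and then re-raise any error. The sketch below uses a scratch file named notes.txt, which is just an example; on Tcl 8.6 and later, the try command’s finally clause is a tidier way to express the same idea.

```tcl
# notes.txt is a scratch file used only for this illustration.
set fileId [open notes.txt w]

# Do the real work inside catch so an error can't skip the close.
set status [catch {
    puts $fileId "first line"
    puts $fileId "second line"
} err]

# Close the channel whether or not the work succeeded.
close $fileId

# If the work failed, report the original error now that the file is closed.
if {$status} {
    error $err
}
```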

The following short script (open.tcl in this chapter’s code directory) illustrates opening and closing a text file. The text file, sonnet20.txt, is Shakespeare’s Sonnet XX and is also included in this chapter’s code directory:

set fileId [open sonnet20.txt r]
puts "opened 'sonnet20.txt' with channel ID '$fileId'"
close $fileId

if {[catch {set fileId [open sonnet21.txt r+]} err]} {
    puts "open failed: $err"
    return 1
} else {
    puts "opened 'sonnet21.txt' with channel ID '$fileId'"
    close $fileId
}

Here’s what the output should look like when you execute this script:

$ ./open.tcl
opened 'sonnet20.txt' with channel ID 'file5'
open failed: couldn't open "sonnet21.txt": no such file or directory

The first block of code opens the file sonnet20.txt in read-only mode, storing the returned ID in the $fileId variable. After opening the file, I promptly close it.

The second block of code attempts to open sonnet21.txt in read-write mode. However, because this is a cooked-up example and I knew that sonnet21.txt didn’t exist, I embedded the open command in a catch statement to illustrate how to handle file access errors. If open fails for some reason, it raises an error. In the absence of the catch command, you’d see the standard, ugly Tcl stack trace followed by an abrupt, graceless exit. My error handler is only slightly more graceful and attractive, but the point I want to emphasize is that in real-world code, you need to code defensively and try to anticipate possible or common errors (such as files not existing).

Note: I Don’t Follow My Own Advice

The code samples in this book set a bad example. For clarity and brevity, most of the scripts in this book don’t include error handling. Do as I say, not as I do!

If you review Table 8.1, you’ll see that most of open’s access modes will create files if they don’t exist (w, w+, a, and a+). The key difference is that write operations (w and w+) will truncate a file that already exists (provided you have write permissions for the file), whereas append operations (a and a+) don’t truncate an existing file. Rather, when you append to an existing file, it is opened for writing, and the file pointer is positioned at the end of the file so that write operations don’t overwrite existing data. The following scripts, trunc.tcl and append.tcl in this chapter’s code directory, illustrate the difference. trunc.tcl opens an existing file, junk, for writing, and then closes it:

set fileId [open junk w]
puts "opened 'junk' with channel ID '$fileId'"
close $fileId

Before you run this script, make sure a file named junk exists in the directory from which you execute the script. On Linux, UNIX, and Mac OS X, you could execute the command touch junk to create an empty, zero-length file named junk.

Using ls -l before and after running the script, you can see what happens:

$ touch junk
$ ls -l junk
-rw-r--r-- 1 kwall kwall 110622 2007-08-03 01:39 junk
$ ./trunc.tcl
opened 'junk' with channel ID 'file5'
$ ls -l junk
-rw-r--r-- 1 kwall kwall 0 2007-08-03 01:42 junk

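Naturally, the output of the ls commands will be different on your system. Note that in this run, junk already existed and was 110,622 bytes long; touch merely updated its timestamp without changing its contents, and trunc.tcl then truncated it to zero bytes.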

append.tcl, on the other hand, opens junk in append mode, which preserves its contents:

set fileId [open junk a]
puts "opened 'junk' with channel ID '$fileId'"
puts $fileId [info vars]
close $fileId

As you can see from the following commands, opening a file for appending leaves its existing contents intact and adds new data to the end of the file:

$ date > junk
$ cat junk
Sat Sep  1 20:04:05 EDT 2007
$ ./append.tcl
opened 'junk' with channel ID 'file5'
$ cat junk
Sat Sep  1 20:04:05 EDT 2007
tcl_rcFileName tcl_version argv0 argv tcl_interactive fileId errorCode auto_path errorInfo env tcl_pkgPath tcl_patchLevel argc tcl_libPath tcl_library tcl_platform

First, I redirect the output of the date command to the file named junk and then cat junk’s contents. After I execute the append.tcl script, I cat junk a second time to show that the data I added was put at the end of the file and that its existing contents were left untouched.

Caution: R Means W and W Means R, Sometimes

In an unfortunate bit of perversity, the access modes r+ and w+ both open a file for reading and writing. In the case of r+, the file must exist. If you specify the w+ mode, the file will be created if it doesn’t exist and truncated if it does. The perversity to which I refer is not that there are two modes that do (almost) the same thing, but that the “r” in r+ mnemonically suggests reading, not reading and writing. Likewise, the “w” in w+ mnemonically suggests writing, not writing and reading.

The moral of this story is to be careful when opening files for writing. If you need to preserve the existing data, use the a or a+ mode and append data. If you don’t care about the existing contents, use w or w+ as the situation requires.
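The following sketch makes the difference between r+ and w+ concrete. It writes a one-line scratch file (modes.txt, an arbitrary name), reopens it in r+ mode to show that the data is still there, and then reopens it in w+ mode to show that the contents are gone before you have read or written a single byte.

```tcl
# Create a small sample file (modes.txt is just an example name).
set fileId [open modes.txt w]
puts $fileId "original contents"
close $fileId

# r+ preserves the existing data, so we can read it back.
set fileId [open modes.txt r+]
set kept [gets $fileId]
close $fileId

# w+ truncates on open, so there is nothing left to read.
set fileId [open modes.txt w+]
set afterTrunc [read $fileId]
close $fileId

puts "r+ read: '$kept'"
puts "w+ read: '$afterTrunc'"
```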

Reading Files

I’m going to go out on a limb here and guess that you want to do more than just open and close files. Reading and writing them will probably be helpful. Fair enough. You have at least three options for reading a file for input: the gets command, which you’ve already seen; the read command; and the scan command. Which one should you use? Here are three rules of thumb:

  1. Use gets to read and process one line of input at a time.

  2. Use read if you want to read blocks of input without regard to end-of-line markers.

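  3. Use scan to parse formatted input. scan works on a string rather than on a channel, so you typically read the text with gets or read and then hand it to scan.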
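Because scan parses strings rather than reading channels, the usual pattern is to pair it with gets. This sketch writes a small scratch file of name and score records (scores.txt and its contents are made up for the illustration), reads it back line by line, and uses scan to pull each line apart.

```tcl
# Write a few "name score" records to a scratch file.
set fileId [open scores.txt w]
puts $fileId "alice 90"
puts $fileId "bob 85"
close $fileId

# Read each line with gets, then parse it with scan.
set total 0
set fileId [open scores.txt r]
while {[gets $fileId line] != -1} {
    # scan returns the number of successful conversions
    if {[scan $line "%s %d" name score] == 2} {
        puts [format "%-6s %3d" $name $score]
        incr total $score
    }
}
close $fileId
puts "total: $total"
```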

The following subsections cover the specifics of using each of these three input commands.

Using gets for File Input

So far in this book, you have used gets to read input from stdin (the keyboard), using a command such as one of the following:

set line [gets stdin]
gets stdin line

The first gets command reads input from stdin, discards the terminating newline, and returns the fetched line, which the set command stores in the variable $line. If a blank line had been read, gets would have returned the empty string, which would have been stored in $line. Unfortunately, gets also returns the empty string when it reaches end-of-file. To differentiate between a blank line and the end-of-file, you have to use the eof command on the I/O channel (stdin in this case). If eof returns 1, end-of-file has been reached; if eof returns 0, gets has not reached end-of-file.

The second gets command also reads input from stdin but, in this case, stores the input in the variable $line itself after discarding the trailing newline. In this form, gets returns the number of characters it read (not counting the newline). If it reads a blank line, then gets returns 0. If it encounters the end-of-file, gets returns -1. For file I/O, I think this form of gets is easiest to use because it automatically detects end-of-file, saving you from having to check for end-of-file conditions with an eof call. This is the form of the command I’ll use for the rest of the book when dealing with input from a file. For keyboard input, I’ll continue to use the first form of the gets command.
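To see why the one-argument form needs eof, here is a sketch that reads a scratch file (lines.txt, an arbitrary name) whose middle line is blank. An empty result from gets is ambiguous, so the loop stops only when the result is empty and eof also reports end-of-file.

```tcl
# lines.txt is a scratch file whose second line is blank.
set fileId [open lines.txt w]
puts $fileId "first"
puts $fileId ""
puts $fileId "third"
close $fileId

set fileId [open lines.txt r]
set lines {}
while {1} {
    # One-argument form: returns the line, or "" at end-of-file.
    set line [gets $fileId]
    # "" could mean a blank line or end-of-file; eof decides which.
    if {$line eq "" && [eof $fileId]} {
        break
    }
    lappend lines $line
}
close $fileId
puts "read [llength $lines] lines"
```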

Applying what you learned in the previous section about I/O channels, if you replace stdin with a channel ID returned by the open command, you can read from a file. The following script, gets.tcl in this chapter’s code directory, demonstrates opening and reading a file:

set fileId [open sonnet20.txt r]
set totalChars 0
set totalLines 0

while {[set cnt [gets $fileId line]] != -1} {
    puts "($cnt chars) $line"
    incr totalChars $cnt
    incr totalLines
}
puts "read $totalChars chars"
puts "read $totalLines lines"

close $fileId

The first command opens sonnet20.txt for reading. The next two commands set a couple of counter variables I use while reading the input file. The most complicated part of the script is the while loop. In English, it simply means, “Read a line of input from the file, store the input text in the variable named line and the number of characters read in cnt. Keep doing this until you encounter end-of-file.” Inside the while loop, for each line read, I display the number of characters read (not counting the terminating newline) and the text of the line; then I increment the number of characters read (incr totalChars $cnt) and the number of lines read. When gets hits the end-of-file, control drops out of the while loop, at which point I display the total number of characters and lines read, close the input file, and exit the program.

If you execute gets.tcl, the output should look like the following:

$ ./gets.tcl
(2 chars) XX
(0 chars)
(46 chars) A woman's face with nature's own hand painted,
(45 chars) Hast thou, the master mistress of my passion;
(42 chars) A woman's gentle heart, but not acquainted
(50 chars) With shifting change, as is false women's fashion:
(54 chars) An eye more bright than theirs, less false in rolling,
(39 chars) Gilding the object whereupon it gazeth;
(43 chars) A man in hue all 'hues' in his controlling,
(50 chars) Which steals men's eyes and women's souls amazeth.
(40 chars) And for a woman wert thou first created;
(48 chars) Till Nature, as she wrought thee, fell a-doting,
(36 chars) And by addition me of thee defeated,
(42 chars) By adding one thing to my purpose nothing.
(54 chars)   But since she prick'd thee out for women's pleasure,
(53 chars)   Mine be thy love and thy love's use their treasure.
(0 chars)
read 644 chars
read 17 lines

Using read for File Input

If you don’t want or need to read and process an input file line-by-line, you can use the read command, which reads a specific number of characters or the entire file. read’s syntax is:

read ?-nonewline? id
read id numChars

id is the channel to read from, and numChars, if present, indicates how many characters to read from id. read’s return value is the data read from id. In the first form of the command, read reads the entire file and, if -nonewline is specified, discards the last character of the file if it is a newline. In the second form of the command, read reads exactly numChars characters, unless it encounters end-of-file before reading the specified number of characters. In the latter case, read returns the data it was able to read.
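Here is a small sketch of both forms of read, using a scratch file named greeting.txt (an arbitrary name). Because puts appends a newline, the file holds six characters; -nonewline drops that trailing newline, and the numChars form stops after the requested count or at end-of-file, whichever comes first.

```tcl
# greeting.txt is a scratch file; puts adds a trailing newline.
set fileId [open greeting.txt w]
puts $fileId "hello"
close $fileId

set fileId [open greeting.txt r]
set whole [read $fileId]               ;# "hello\n"
close $fileId

set fileId [open greeting.txt r]
set trimmed [read -nonewline $fileId]  ;# "hello"
close $fileId

set fileId [open greeting.txt r]
set firstThree [read $fileId 3]        ;# "hel"
set rest [read $fileId 100]            ;# asks for 100, gets only "lo\n"
close $fileId

puts "[string length $whole] [string length $trimmed] $firstThree"
```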

Before explaining why you might want to use read instead of gets, have a look at the following script, read.tcl in this chapter’s code directory. The source file, wssnt10.txt, is the complete text of Shakespeare’s sonnets, courtesy of Project Gutenberg (http://www.gutenberg.org/etext/1041), and is also included in the code directory:

# Read the entire file
set fileId [open wssnt10.txt r]
set input [read $fileId]
puts "Read [string length $input] characters"
close $fileId

# Read the file 1024 characters at a time
set fileId [open wssnt10.txt r]
while {![eof $fileId]} {
      set input [read $fileId 1024]
      puts "Read [string length $input] characters"
}
close $fileId

In the first block of code, I read the entire file and then close it. The second block of code reopens the file, reads it in 1024-character blocks, and then closes it. Notice two things. First, I have to close the input file explicitly and then reopen it before trying to read it a second time. Why? After the first read command completes, the file pointer is positioned at the end of the file. Accordingly, the next read or gets command has nothing to read. Closing and reopening the file resets the file pointer to the beginning of the file. As it happens, there’s a smarter way to move the file pointer, the seek command, which you’ll meet in the section, “Moving the File Pointer: Random Access I/O,” later in this chapter.

Second, notice that the while condition uses the eof command to test for an end-of-file condition on $fileId. Unlike the gets command, read does not return a special value (referred to as a sentinel value, or just a sentinel) to indicate it’s at the end of the file. In fact, in the absence of the eof test, read would happily continue to “read” the file; it just wouldn’t return anything, so the script would be stuck in an infinite loop.
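An alternative to testing eof first is to stop as soon as read hands back an empty string, which, for an ordinary file, happens only at end-of-file. A side benefit is that when the file’s size is an exact multiple of the block size, this loop avoids the final, empty read that an eof-first loop like the one in read.tcl can make. The sketch assumes a scratch file named blocks.txt containing exactly 2048 characters.

```tcl
# Build a scratch file of exactly 2048 characters.
set fileId [open blocks.txt w]
puts -nonewline $fileId [string repeat x 2048]
close $fileId

# Stop when read returns nothing, instead of testing eof first.
set fileId [open blocks.txt r]
set sizes {}
while {[set chunk [read $fileId 1024]] ne ""} {
    lappend sizes [string length $chunk]
}
close $fileId
puts "block sizes: $sizes"
```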

When you execute this script, the output should look like the following. I’ll only show the first and last three lines of the output here to preserve space:

$ ./read.tcl
Read 107701 characters
Read 1024 characters
Read 1024 characters
...
Read 1024 characters
Read 1024 characters
Read 181 characters

Hardly riveting output, but the last line bears discussion. Although the read command requested 1024 characters, there were only 181 left in the input file, so read returned what was available.

Why use read instead of gets? Suitability is one reason, but the primary reason is efficiency. In this context, suitability just means that the task you are trying to perform might not require processing a file line-by-line or that the data itself isn’t appropriate for line-by-line input. For example, a binary file can contain embedded newline characters that aren’t used for line breaks per se. In such a case, read is the right command to use.
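When you do read binary data, you’ll usually also want to turn off Tcl’s end-of-line translation so that carriage returns and newlines come through untouched. Here is a minimal sketch using fconfigure’s -translation binary option, which also sets the channel’s encoding to binary; bytes.bin is an arbitrary scratch file name.

```tcl
# Write three raw bytes: carriage return, linefeed, and a null byte.
set fileId [open bytes.bin w]
fconfigure $fileId -translation binary
puts -nonewline $fileId [binary format c3 {13 10 0}]
close $fileId

# Read them back untouched; without binary translation, the default
# (auto) mode would collapse the \r\n pair into a single \n.
set fileId [open bytes.bin r]
fconfigure $fileId -translation binary
set data [read $fileId]
close $fileId

# Convert the bytes to a list of integers for display.
binary scan $data c* values
puts "read [string length $data] bytes: $values"
```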

Although reading and processing input line-by-line with gets is convenient and easy, it is inefficient for large files: every input command carries some fixed per-call overhead, so processing a file in many small pieces pays that overhead far more often than processing it in a few large pieces. How inefficient? Consider Table 8.2. It shows the time required to read a 1GB text file using gets, using read to slurp up the entire file in one large read, and using read with various block sizes.

Table 8.2. I/O Times for gets and read on a 1GB File

Command   Read Size (chars)   Elapsed Time (secs)   MB/sec
gets      N/A (one line)      65.9                  15.5
read      N/A (entire file)   25.9                  39.5
read      64                  68.4                  15.0
read      128                 37.4                  27.4
read      256                 22.4                  45.7
read      512                 14.7                  69.6
read      1024                10.9                  93.3
read      2048                9.2                   111.8
read      4096                8.8                   116.9
read      8192                8.3                   122.8
read      16384               8.3                   123.0
read      32768               8.4                   121.7

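As you can see in Table 8.2, read is much more efficient than gets, provided you use a reasonably large block size. With 64-character blocks, read is actually slightly slower than gets; throughput climbs steeply up to about 1024 characters and levels off at around 4096 to 8192 characters. Slurping the entire file in one read beats gets but is slower than reading in moderately sized blocks. If you want to try this experiment yourself, create a 1GB file named bigfile and execute the script readtest.tcl in the readtest subdirectory of this chapter’s code directory. Of course, the performance you see will be different on your system.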
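If you’d like a quicker, smaller-scale version of the experiment, Tcl’s built-in time command can do the stopwatch work. This sketch is not readtest.tcl; it generates its own modest scratch file (timing.txt, with a line count chosen arbitrarily) and averages five runs of each approach, so expect the numbers to vary from run to run.

```tcl
# Generate a modest scratch file for timing.
set fileId [open timing.txt w]
for {set i 0} {$i < 20000} {incr i} {
    puts $fileId "line $i of some sample text for timing"
}
close $fileId

# Read line by line with gets; return the number of lines read.
proc readWithGets {name} {
    set fileId [open $name r]
    set n 0
    while {[gets $fileId line] != -1} { incr n }
    close $fileId
    return $n
}

# Read fixed-size blocks with read; return the number of blocks read.
proc readWithBlocks {name size} {
    set fileId [open $name r]
    set n 0
    while {[set chunk [read $fileId $size]] ne ""} { incr n }
    close $fileId
    return $n
}

# time returns "N microseconds per iteration"; keep just N.
set getsUs  [lindex [time {readWithGets timing.txt} 5] 0]
set blockUs [lindex [time {readWithBlocks timing.txt 8192} 5] 0]
puts "gets: $getsUs us   read (8192): $blockUs us"
```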

Note: This Is Not a Rigorous Benchmark!

The I/O speeds reported by readtest.tcl are relative. The results are influenced by CPU speed, available memory, the other processes running on the system, the type and speed of your hard disk, the amount of on-disk cache, the filesystem type, the phase of the moon, and what you ate for lunch today. Use readtest.tcl to gain insight into the performance of gets and read, not to establish whether your computer is an I/O speed machine or a boat anchor.

Writing Files

Now that you’ve seen how to get data into your program, I’ll show you how to get data out of it. The two workhorse Tcl commands for output are puts, which you’ve already seen and used a good deal, and format. puts is great if you don’t care about how the output looks, don’t have any requirements for precisely formatted output, or if you are in a hurry. The format command is the tool to use if you do care how the output looks, do have requirements for carefully formatted output, and can take a little bit longer to write your script (but only a little bit longer).

Using puts for Output

As explained and shown in previous chapters, puts writes data to an output channel. So far, the output “channels” have been the screen, specifically, standard output and standard error (stdout and stderr, respectively) and, as you saw earlier in this chapter, disk files. In the general case, though, a channel is any stream capable of receiving output. So, in addition to stdout, stderr, and file IDs returned by the open command, puts can also write to a network socket created by the socket command (I don’t discuss network I/O in this book) or to an output medium created by a Tcl extension. For example, you can use puts to send data to a printer or to a serial device (such as a mouse or a modem) if you have an output channel that has been set up for such a purpose.

To simplify the presentation, I’ve glossed over some of puts’ subtleties because they are fine points that would obscure the main discussion. For example, Tcl buffers output, so data you want to print using puts won’t appear until the buffer is full, the channel is closed, or the buffer is explicitly flushed (using the flush command). Tcl performs this buffering itself, separately for each channel, and you can modify a channel’s buffering behavior using the fconfigure command’s -buffering option.
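As a brief sketch of those buffering controls: fconfigure’s -buffering option accepts full, line, or none, and flush pushes out whatever is waiting in the buffer. The scratch file log.txt is an arbitrary name.

```tcl
set fileId [open log.txt w]

# Flush automatically after every line instead of waiting for
# the channel buffer to fill.
fconfigure $fileId -buffering line
puts $fileId "line-buffered entry"

# Or keep full buffering and flush explicitly when it matters.
fconfigure $fileId -buffering full
puts $fileId "fully buffered entry"
flush $fileId

# After the flush, both lines are on disk even though the file is still open.
set sizeWhileOpen [file size log.txt]
puts "buffering mode: [fconfigure $fileId -buffering]; bytes on disk: $sizeWhileOpen"
close $fileId
```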

Another issue I haven’t addressed is how puts handles newlines. For better or worse, the major operating systems use different end-of-line (EOL) sequences. Linux, UNIX, Mac OS X, and related systems use a linefeed character (\n) to indicate EOL; Macintosh systems before OS X use a carriage return (\r); and Microsoft Windows (and MS-DOS and OS/2) use a carriage return followed by a linefeed (\r\n). In large part, you don’t have to concern yourself with this because Tcl handles the EOL translations for you automatically, converting EOLs to the character sequence appropriate for the host operating system. However, you can modify this behavior using the fconfigure command. Again, this is an advanced topic I won’t cover in detail in this book.
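Just to give you a taste of fconfigure’s EOL control, this sketch forces Windows-style line endings into a scratch file (crlf.txt, an arbitrary name) no matter which platform you run it on. It then reads the file twice: once in binary mode to see the raw bytes and once with the default translation, which quietly converts the line endings back to newlines.

```tcl
set fileId [open crlf.txt w]
# Write \r\n line endings regardless of the host platform.
fconfigure $fileId -translation crlf
puts $fileId "one"
puts $fileId "two"
close $fileId

# Binary mode shows exactly what is on disk.
set fileId [open crlf.txt r]
fconfigure $fileId -translation binary
set raw [read $fileId]
close $fileId

# The default (auto) input translation hides the \r characters.
set fileId [open crlf.txt r]
set cooked [read $fileId]
close $fileId

puts "raw bytes: [string length $raw], characters read normally: [string length $cooked]"
```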

The point to take away from this discussion is that Tcl and puts by and large do the right thing with respect to output. If you find you need greater control, the capability is there. In the meantime, you can use puts for output and be blissfully ignorant of its under-the-covers details.

Formatting Output with format

If you have ever written C, chances are very good that you have used C’s printf() function to print formatted output. Tcl’s format command is much like printf(). The biggest difference is that format doesn’t print the string it formats; it just returns the formatted string. Printing the formatted string is handled with the puts command. format’s syntax is:

format spec ?val ...?

format formats one or more values, specified by val in the syntax diagram, according to the format specification defined by spec. The format specification can consist of up to six parts:

  • A position specifier

  • Zero or more flags

  • A field width

  • A precision specifier

  • A word length specifier

  • A conversion character

I’m going to focus on the flags, field width, precision specifier, and conversion character. The position and word-length specifiers are less commonly used, and they apply to situations this book won’t cover. Each conversion specifier in spec begins with a percent sign, %, followed by zero or more modifiers, and ends with a conversion character.

Conversion characters indicate how to print, or convert, the corresponding argument in the value list. Although conversion characters appear last in the format specification, I cover them first so you’ll know what you’re trying to format. Table 8.3 lists the most frequently used conversion characters.

Table 8.3. Common Format Conversion Characters

Character   Description
c           Displays an integer as the ASCII character it represents.
d           Signed integer.
f           Floating point value in m.n format.
s           String.
u           Unsigned integer.
X           Unsigned hex value in uppercase format.
x           Unsigned hex value in lowercase format.

A complete list of conversion characters is available in the format man page (man 3tcl format). For example, to format a string, you would use the command format "%s" "string to format". The command format "%d:%x" int_val hex_val would format a signed integer, followed by a literal colon, followed by a lowercase hexadecimal value. Although not strictly necessary, I use double quotes around the format specifier as a matter of habit. If the format specifier or the value to format contains embedded spaces, the quotes are necessary.
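Here are several of the conversion characters from Table 8.3 at work. Each comment shows the string that format returns.

```tcl
puts [format "%c" 65]                    ;# A
puts [format "%d" -42]                   ;# -42
puts [format "%s has %d letters" Tcl 3]  ;# Tcl has 3 letters
puts [format "%x %X" 255 255]            ;# ff FF
puts [format "%d:%x" 42 255]             ;# 42:ff
puts [format "%f" 2.5]                   ;# 2.500000
```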

Flags are modifiers used to specify padding and justification of the formatted output. Table 8.4 lists the valid flags.

Table 8.4. Valid Format Flags

Flag    Description
-       Left-justify the field (fields are right-justified by default).
+       Always print a sign (+ or -) before a number.
0       Pad with leading zeros instead of spaces.
#       Print hex numbers with a leading 0x, octal numbers with a leading 0.
space   Precede a number with a space unless it is printed with a sign.
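The following lines show the flags from Table 8.4 in action. The vertical bars are only there to make the padding visible, and each comment shows the returned string.

```tcl
puts [format "|%5s|" ab]        ;# |   ab|   right-justified by default
puts [format "|%-5s|" ab]       ;# |ab   |   - left-justifies
puts [format "%+d %+d" 5 -5]    ;# +5 -5     + always shows a sign
puts [format "%05d" 42]         ;# 00042     0 pads with zeros
puts [format "%#x" 255]         ;# 0xff      # adds the 0x prefix
puts [format "|% d|% d|" 5 -5]  ;# | 5|-5|   space stands in for a plus sign
```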

After the flags, you can specify a minimum field width and an optional precision value. For example, to format the floating point value 1.98, you could use any of the following commands (see format.tcl in this chapter’s code directory):

puts [format "%f" 1.98]
puts [format "%5f" 1.98]
puts [format "%5.2f" 1.98]

The first command uses the default floating point formatting (%f). The second command uses a field width of 5 (%5f). The third command uses the same field width and adds a precision specifier (%5.2f). These commands correspond to the following output:

1.980000
1.980000
 1.98

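Notice that the second line isn’t padded. A field width is a minimum, not an exact size: 1.980000 is already eight characters wide, more than the five requested, so format has nothing to pad. In the third line, the precision specifier limits the value to two decimal places, 1.98, which is only four characters wide, so format pads it on the left with a single space to fill the five-character field. Most of the example scripts in this chapter use format commands, so you can refer to these scripts for more examples of using the format command. The man page has complete details.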
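A few more width and precision combinations follow. Remember that a field width is only a minimum: when the formatted value is already wider than the field, format adds no padding. The bars mark the field boundaries, and each comment shows the returned string.

```tcl
puts [format "|%5f|" 1.98]         ;# |1.980000|   already wider than 5
puts [format "|%10.3f|" 3.14159]   ;# |     3.142|
puts [format "|%-10.3f|" 3.14159]  ;# |3.142     |
puts [format "|%.3s|" abcdef]      ;# |abc|        precision truncates strings
```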

Moving the File Pointer: Random Access I/O

Earlier in this chapter, I noted that a read operation advances the file pointer through an I/O channel. In read.tcl, I closed and reopened the input file to reposition the file pointer at the beginning of the file. While this type of sequential I/O is a common operation, you often want or need to read from arbitrary file locations or need to be able to reposition the file pointer without closing and reopening the file. Tcl’s seek and tell commands provide this ability, which is referred to as random access I/O.

As an I/O operation proceeds, the file pointer’s current position in the file, known as the seek offset, can be determined by using the tell command. tell’s syntax is:

tell channelID

tell returns an integer string that indicates the current seek offset. If the specified I/O channel does not support seeking (process pipelines, for example, do not support seeking), tell returns -1.

To move the file pointer (change the seek offset), use the aptly-named seek command. Its syntax is:

seek channelID offset ?origin?

This command moves the file pointer offset bytes forward or backward relative to origin in the file referred to by channelID. origin must be one of start, end, or current and defaults to start if not specified. offset can be negative or positive. It is an error to seek backward (using a negative offset) from the beginning of a file but not to seek forward from the end of a file.
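The current origin is handy for skipping over data relative to wherever the file pointer happens to be. This sketch uses a scratch file named digits.txt (an arbitrary name) containing the digits 0 through 9, so each byte’s offset matches its digit.

```tcl
set fileId [open digits.txt w]
puts -nonewline $fileId "0123456789"
close $fileId

set fileId [open digits.txt r]
set first [read $fileId 2]   ;# "01"; pointer now at offset 2
seek $fileId 3 current       ;# skip "234"; pointer now at offset 5
set next [read $fileId 2]    ;# "56"
seek $fileId -3 end          ;# three bytes before the end
set last [read $fileId]      ;# "789"
close $fileId
puts "$first $next $last"
```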

The following script, randread.tcl in this chapter’s code directory, shows how you might use the seek and tell commands:

set fileId [open wssnt10.txt r]

seek $fileId 10 start
set input [read $fileId 10]
puts "Text between bytes 10 and 20: =>$input<="
puts "File pointer at byte: [tell $fileId]"

seek $fileId -25 end
set input [read $fileId 25]
puts "Last 25 characters: =>$input<="
puts "File pointer at byte: [tell $fileId]"

if {[catch {seek $fileId -5 start} err]} {
    puts "seek back from start: $err"
} else {
    puts "seek back from start: [tell $fileId]"
}

if {[catch {seek $fileId 5 end} err]} {
    puts "seek forward from end: $err"
} else {
    puts "seek forward from end: [tell $fileId]"
}

seek $fileId 0 end
puts "file size: [tell $fileId] bytes"

close $fileId

After opening the file, the first block of code moves the pointer 10 bytes into the file, reads the next 10 characters, and then displays the text it read between => and <= and the current position of the file pointer. The second code block positions the file pointer 25 bytes from the end of the file, reads 25 characters, and then displays the text it read and the current position of the file pointer.

The next two sections of code attempt to seek backward from the beginning of the file and forward from the end of the file. I use the catch command so that an error during either operation won’t abort the script. Notice in the output that seeking backward from the beginning of the file causes an error but that seeking forward from the end of the file moves the file pointer five bytes forward, to offset 107706.

Positioning the file pointer past the end of the file works for several reasons. First, seek simply moves the file pointer, an operation independent of reading or writing; seek has no idea whether you are going to read or write the underlying file. Second, while no filesystem of which I’m aware supports the notion of adding data to the front of a file, most (if not all) permit data to be added to the end of a file, so the file pointer must be free to move to, and beyond, the current end of the file.

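Finally, most filesystems allow you to create sparse files, or files that have holes in them. Byte ranges of a file that have never been written are known as holes, and files that contain such holes are referred to as sparse files. A sparse file can have a length of N bytes yet occupy less than N bytes of disk space, because the holes consume no storage; reading a hole returns null (zero) bytes. Seeking past the end of a file and then writing is how you create a hole.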

The last section of code shows you a trick for finding out a file’s size in bytes: seek to the end of the file and then use tell to get the location of the file pointer. Bear in mind that for a sparse file, this trick reports the file’s logical length, holes included, not the amount of disk space the file actually occupies.
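For ordinary files, Tcl’s file size command reports the same number without your having to open a channel at all. This sketch, using a scratch file named sized.txt (an arbitrary name), compares the two approaches.

```tcl
set fileId [open sized.txt w]
puts -nonewline $fileId [string repeat abc 100]   ;# 300 bytes
close $fileId

# The seek/tell trick used in randread.tcl...
set fileId [open sized.txt r]
seek $fileId 0 end
set viaTell [tell $fileId]
close $fileId

# ...and the file size command, which needs no open channel.
set viaFileSize [file size sized.txt]
puts "tell: $viaTell, file size: $viaFileSize"
```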

When executed, the script’s output should look like the following:

$ ./randread.tcl
Text between bytes 10 and 20: => Project G<=
File pointer at byte: 20
Last 25 characters: =>ented as Public Domain.

<=
File pointer at byte: 107701
seek back from start: error during seek on "file5": invalid argument
seek forward from end: 107706
file size: 107701 bytes

You can also use seek and tell with output operations, as demonstrated in the following script (see randwrite.tcl in this chapter’s code directory):

set fileId [open output.txt r+]
seek $fileId 0 end
set oldSize [tell $fileId]

seek $fileId 10 start
puts "Offset before puts: [tell $fileId]"
puts -nonewline $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId [expr $oldSize - 25]
puts "Offset before puts: [tell $fileId]"
puts -nonewline $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId [expr $oldSize + 800]
puts "Offset before puts: [tell $fileId]"
puts $fileId [string repeat * 10]
puts "Offset after puts: [tell $fileId]"

seek $fileId 0 end
puts "New file length: [tell $fileId] bytes"

close $fileId

Perhaps the first question you’ll ask when you look at this script is why I open the file I want to write in r+ mode (read/write), rather than for writing or appending. To overwrite existing data in place, I need a mode that opens an existing file without truncating it and without forcing every write to the end of the file; r+ meets both requirements. If I open the file in w or w+ mode, I’ll truncate the existing file. Similarly, if I open the file in a or a+ mode, data written to the file will wind up appended to the end of the file, regardless of where I position the file pointer before starting the write. The behavior in the append modes is somewhat counterintuitive, but if you think about it, it is called append mode. If it really bothers you, you could write a procedure that adds insert and overwrite modes to the open command, but that would just result in all the other Tcl programmers teasing you.

After opening the file, I seek to the end and then store its original size (actually, its byte length) in the variable $oldSize. I’ll explain why in a moment. Next, I seek 10 bytes into the file and write 10 asterisks starting at that offset.

The next code block scribbles 10 more asterisks starting 25 bytes before the original end of the file. In this case, I use the expression $oldSize - 25 to calculate the offset rather than seeking relative to the current EOF. Because r+ mode overwrites existing bytes instead of inserting new ones, the first puts command doesn’t actually move the EOF, which is still at byte 661, so seek $fileId -25 end would land in the same place here. Anchoring every offset to the saved original EOF, however, keeps the arithmetic correct even if an earlier write extends the file.

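The last write adds another 10 asterisks 800 bytes past the original EOF. Again, I use $oldSize as the reference point for the offset. Unlike the first two writes, this one omits -nonewline, so puts writes 11 bytes on UNIX: the 10 asterisks plus a newline. Because this write starts beyond the EOF, it extends the file and leaves an 800-byte gap between the old EOF and the new data. After all the writing is done, I calculate and display the length of the modified file and close the file.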

To execute this script and verify for yourself that it behaves as I’ve described, use the following sequence of commands:

$ cp sonnet20.txt output.txt
$ ls -l sonnet20.txt output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-08 03:18 output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-06 23:30 sonnet20.txt
$ ./randwrite.tcl
Offset before puts: 10
Offset after puts: 20
Offset before puts: 636
Offset after puts: 646
Offset before puts: 1461
Offset after puts: 1472
New file length: 1472 bytes
$ ls -l sonnet20.txt output.txt
-rw-r--r-- 1 kwall kwall 1472 2007-08-08 03:13 output.txt
-rw-r--r-- 1 kwall kwall 661 2007-08-06 23:30 sonnet20.txt
$ diff -a sonnet20.txt output.txt
3c3
< A woman's face with nature's own hand painted,
---
> A woma**********ith nature's own hand painted,
16c16
<   Mine be thy love and thy love's use their treasure.
---
>   Mine be thy love and thy lov**********eir treasure.
17a18
> **********

The cp command creates a copy of sonnet20.txt. The first ls command shows that the two files start out the same size. After executing randwrite.tcl, the second ls command shows that the two files have different sizes. Finally, the diff command shows the actual differences between the original file and its modified copy; the -a option tells diff to treat both files as text even though output.txt now contains null bytes.
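The difference between r+ and append mode is easy to check for yourself. The following minimal sketch writes the same two asterisks at offset 2 of a ten-byte file, once through a channel opened in r+ mode and once through a channel opened in a mode. The helper procedures WriteFile and ReadFile and the file name modes_demo.txt are hypothetical, introduced only for this demonstration.

```tcl
# Hypothetical scratch file used only for this demonstration
set path modes_demo.txt

# Hypothetical helper: replace the file's contents with $text
proc WriteFile {path text} {
    set fileId [open $path w]
    puts -nonewline $fileId $text
    close $fileId
}

# Hypothetical helper: return the file's entire contents
proc ReadFile {path} {
    set fileId [open $path r]
    set text [read $fileId]
    close $fileId
    return $text
}

# r+ mode: the seek is honored and existing bytes are overwritten
WriteFile $path "0123456789"
set fileId [open $path r+]
seek $fileId 2 start
puts -nonewline $fileId "**"
close $fileId
set overwritten [ReadFile $path]

# a mode: the seek is ignored for writes; the data goes to the end
WriteFile $path "0123456789"
set fileId [open $path a]
seek $fileId 2 start
puts -nonewline $fileId "**"
close $fileId
set appended [ReadFile $path]

file delete $path
puts "r+ : $overwritten"
puts "a  : $appended"
```

With r+, the file keeps its ten-byte length and reads 01**456789; with a, the asterisks land after the existing data, giving 0123456789**.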

Caution: Bytes Versus Characters

The seek and tell commands calculate file positions in terms of bytes, or, rather, byte offsets. However, the read command counts characters. In most situations, this distinction doesn’t matter because in the ASCII character set, each character is one byte long. Thus, reading five characters grabs five bytes of data. The distinction becomes important when you work with multibyte encodings (such as UTF-8, or many of the character sets used for Asian languages), which use more than one byte to encode some or all characters. For the purposes of this book, one byte equals one character; just be aware that this is not always the case.
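A short experiment shows the distinction. This minimal sketch writes the five-character string héllo using the UTF-8 encoding, reads it back as characters, and then measures the same file in bytes with seek and tell. The file name utf8_demo.txt is hypothetical.

```tcl
# Write a five-character string containing one non-ASCII character
# (e with an acute accent, U+00E9) using the UTF-8 encoding
set path utf8_demo.txt            ;# hypothetical scratch file
set fileId [open $path w]
fconfigure $fileId -encoding utf-8
puts -nonewline $fileId "h\u00e9llo"
close $fileId

# Read it back as characters...
set fileId [open $path r]
fconfigure $fileId -encoding utf-8
set text [read $fileId]
# ...then measure the file in bytes with seek and tell
seek $fileId 0 end
set bytes [tell $fileId]
close $fileId
file delete $path

puts "Characters: [string length $text]  Bytes: $bytes"
```

The accented e occupies two bytes in UTF-8, so the script reports five characters but six bytes.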

Working with Directories

Like any proper programming language, Tcl enables you to move around in the filesystem and create, delete, and rename directories. When a Tcl script begins executing, its working directory is the directory from which it was invoked. To change your working directory, use the cd command. If you want to find out the current working directory, use the pwd command. The syntax of these commands is:

cd ?dirName?
pwd

If you omit dirName, cd sets the script’s working directory to the directory specified by the $HOME environment variable. If $HOME is not set or the directory it references does not exist, cd raises an error and the script aborts. After successful execution, cd returns the empty string.

pwd returns the absolute pathname of the current directory. The short script that follows illustrates cd and pwd (see dirs.tcl in this chapter’s code directory):

puts "Current directory: [pwd]"
cd /tmp
puts "Current directory: [pwd]"
cd
puts "Current directory: [pwd]"

The output from this script is what you would expect:

$ pwd
/home/kwall/tclbook/08
$ ./dirs.tcl
Current directory: /home/kwall/tclbook/08
Current directory: /tmp
Current directory: /home/kwall

$ pwd
/home/kwall/tclbook/08

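As you can see, after the script terminates, the working directory of my shell is unchanged. This is because the Tcl script runs in a separate child process (the tclsh interpreter), and each process has its own working directory. When the child process terminates, any changes it made to its execution environment (such as its working directory) disappear with it; a child process cannot change its parent shell’s working directory.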
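Within a single script, though, a cd affects every command that follows it. A common pattern is to save the result of pwd, change directories, do the work, and then change back, even if the work fails. Here is a minimal sketch of that pattern, written with catch so that it works on older Tcl versions (Tcl 8.6 and later could use try with a finally clause instead). The procedure name InDirectory and the directory name scratch_dir are hypothetical.

```tcl
# Hypothetical helper: evaluate $script with $dir as the working
# directory, then restore the original directory even if $script fails
proc InDirectory {dir script} {
    set oldDir [pwd]
    cd $dir
    # Run the script in the caller's scope and capture any error
    set code [catch {uplevel 1 $script} result]
    cd $oldDir
    # Pass the script's result (or error) back to the caller
    return -code $code $result
}

# Usage: run pwd inside a temporary subdirectory
set start [pwd]
file mkdir scratch_dir            ;# hypothetical directory name
set inside [InDirectory scratch_dir {pwd}]
puts "Inside:    $inside"
puts "Afterward: [pwd]"
file delete scratch_dir
```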

Analyzing Word Search

As I noted at the beginning of the chapter, what’s new in word_search.tcl is the file handling and the way commands are combined to get the desired results.

Looking at the Code

#!/usr/bin/tclsh
# word_search.tcl
# Find words embedded in a string of letters stored in a text file

#
# Block 0
#
# Read the puzzle data from a file
proc ReadPuzzle {srcFile} {
    global starts stops lines

    # Open the puzzle file
    set fileId [open $srcFile r]

    # Read the source file
    while {[gets $fileId input] > -1} {
        lappend starts [lindex $input 0]
        lappend stops [lindex $input 1]
        lappend lines [lrange $input 2 end]
    }

    # Close the source file
    close $fileId
}

# Clear the screen and redraw the puzzle
proc DisplayPuzzle {} {
    global starts lines

    # Display the puzzle
    exec clear >@ stdout
    for {set i 0} {$i < [llength $starts]} {incr i} {
        puts [format "%-4d%s" [expr {$i + 1}] [lindex $lines $i]]
    }
}

# Get the line on which the player wants to work
proc GetPlayerLine {min max} {
    puts -nonewline "\nSelect a line ($min-$max): "
    flush stdout
    set playerLine [gets stdin]

    # Did player choose a valid line number?
    if {![string is integer -strict $playerLine] || $playerLine < $min || $playerLine > $max} {
        puts "Select a line number between $min and $max"
        exit 1
    }
    return $playerLine
}

# Get the word the player found
proc GetPlayerWord {} {
    puts -nonewline "What word do you see: "
    flush stdout
    set playerWord [gets stdin]
    return $playerWord
}

# Compare the player's guess to the correct answer
proc GuessCorrect {playerLine playerWord} {
    global starts stops lines
    # Did user guess correctly?
    set idx [expr {$playerLine - 1}]
    set start [expr {[lindex $starts $idx] - 1}]
    set stop [expr {[lindex $stops $idx] - 1}]
    set line [lindex $lines $idx]
    set puzzleWord [join [lrange $line $start $stop] ""]
    if {[string match -nocase $puzzleWord $playerWord]} {
        return true
    } else {
        return false
    }
}
#
# Block 1
#
# Main game loop
ReadPuzzle puzzle.txt
set continue "y"
while {$continue eq "y"} {
    DisplayPuzzle
    set playerLine [GetPlayerLine 1 9]
    set playerWord [GetPlayerWord]
    if {[GuessCorrect $playerLine $playerWord]} {
        puts "Correct!"
        puts -nonewline "Play again (Y/n)? "
    } else {
        puts "Sorry."
        puts -nonewline "Try again (Y/n)? "
    }
    flush stdout
    set continue [string tolower [gets stdin]]
}

Understanding the Code

Most of the code is in Block 0, the procedure definitions. The first procedure, ReadPuzzle, opens the puzzle data file passed as an argument and splits the data into three lists. To make sense of the data parsing, have a look at a sample data line from puzzle.txt:

2 5 e o p e n u g r i v c

Each row of data translates to a single row in the game grid. The data points are space-delimited. The first two values contain the starting and ending locations of the word in that row, and the rest of the data (11 letters) constitutes the row to display on the game grid. For example, in the record above, the word of interest begins in the second column and ends in the fifth column of the row. The columns are numbered from one, so the word in this row is “open.” The while loop reads the data file line by line (gets returns -1 at the end of the file, which ends the loop) and uses lappend to build three ordered lists of starting locations, stopping locations, and text lines (starts, stops, and lines, respectively).

DisplayPuzzle uses a UNIX-specific command, clear, to clear the screen before each round. Because clear is an external command, not a Tcl built-in, I use the Tcl exec command to run it; the >@ stdout redirection sends clear’s output straight to the terminal instead of letting exec capture it and return it as a string. The balance of the procedure is a simple for loop that uses the format command to display a nicely formatted line consisting of the row number and the letters. I use the length of one of the lists as the loop control value; all three lists have the same length, so I could have used any of them.
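To see what the format string in DisplayPuzzle does on its own, here is a minimal sketch that formats two rows of letters taken from the sample game grid. The variables rows and formatted are hypothetical stand-ins for the lines list that ReadPuzzle builds.

```tcl
# Hypothetical sample rows, in the same shape as the lines list
set rows {
    {e o p e n u g r i v c}
    {n l v n j c l o s e d}
}
set formatted {}
for {set i 0} {$i < [llength $rows]} {incr i} {
    # %-4d: the row number, left-justified in a 4-character field
    # %s:   the row's letters, printed as-is
    set line [format "%-4d%s" [expr {$i + 1}] [lindex $rows $i]]
    lappend formatted $line
    puts $line
}
```

The minus sign in %-4d pads the number on the right, so the letters start in the same column for rows 1 through 9, and even a two-digit row number such as 10 would still line up.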

GetPlayerLine solicits the row number in which the player is interested. The min and max arguments set the minimum and maximum values for the row number. If the player inputs a number outside of that range, the script terminates after printing a short usage message. Otherwise, GetPlayerLine returns the line number the user entered. GetPlayerWord asks the player to type in the word and returns it to the caller.

The GuessCorrect procedure is word_search.tcl’s workhorse. It accepts two arguments, the line number entered in GetPlayerLine and the word entered in GetPlayerWord, and then it compares the player’s guess to the target word embedded in the data line. It returns true if the player’s word matches the puzzle’s word and false otherwise. I use list manipulation to extract the target word from the puzzle data. Recall that lists are indexed from zero. The line number displayed to the player and the starting and ending points for each word in the data file, however, are indexed from one. Accordingly, to extract the correct data, I have to subtract 1 both from $playerLine, to get the list index of the row, and from the start and stop values that lindex retrieves. I use the join command to convert the list of discrete letters returned by lrange to a proper string. This step is necessary because lrange returns a list whose elements are separated by spaces, and I need a string without spaces to perform the comparison in the string match command; the -nocase option makes that comparison case-insensitive.
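The extraction step is easier to follow in isolation. This minimal sketch pulls it out of GuessCorrect into a hypothetical procedure, ExtractWord, and applies it to the sample record from puzzle.txt shown earlier.

```tcl
# Hypothetical stand-alone version of the extraction step in
# GuessCorrect: start and stop are 1-based column numbers
proc ExtractWord {letters start stop} {
    # Convert to the 0-based indexes that lrange expects
    set first [expr {$start - 1}]
    set last  [expr {$stop - 1}]
    # lrange yields a list such as {o p e n}; join glues it into "open"
    return [join [lrange $letters $first $last] ""]
}

set record "2 5 e o p e n u g r i v c"   ;# a line from puzzle.txt
set letters [lrange $record 2 end]
set word [ExtractWord $letters [lindex $record 0] [lindex $record 1]]
puts $word
puts [string match -nocase $word "OPEN"]
```

Columns 2 through 5 become list indexes 1 through 4, which hold o, p, e, and n; the script prints open, followed by 1 because the case-insensitive match succeeds.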

Block 1, as you can see, is short and to the point. It invokes ReadPuzzle, sets the game loop control variable to y, and then enters the game play loop. The while loop displays the game grid, calls GetPlayerLine and GetPlayerWord to set up the comparison, and then calls GuessCorrect to evaluate the guess. It displays the result and then asks the player to play again. The way the enclosing while loop is written, gameplay terminates if the player enters anything but Y or y.

Modifying the Code

Here are some exercises you can try to practice what you learned in this chapter:

  • 8.1 Modify the GetPlayerLine procedure to loop until the player enters a line number between min and max, inclusive, rather than terminating.

  • 8.2 Modify the while loop in Block 1 so that only N or n will cause the game to exit.

  • 8.3 Modify the code to support keeping score. The score should include how many words players guess correctly and incorrectly and how many total guesses the players made. Show a scoring percentage in addition to the raw scores for right and wrong guesses.

You won’t get very far in your Tcl programming before it will become desirable, if not downright necessary, to read and write files. Use open to create I/O channels, the essential first step for performing file I/O, and close to release them when you are done. The gets and read commands read files, while the puts command writes them. If you prefer attractive, easy-to-read output, you’ll spend quality time with the format command. Sequential file I/O is often the appropriate way to access files, but there are many situations in which you know exactly where in a file you need to be. In other cases, you might want to update a particular piece of data in a file. In such situations, random file access, brought to you by seek and tell, is the ticket to file I/O happiness.

This chapter concludes your whirlwind introduction to Tcl programming. With the material in these first eight chapters, and plenty of practice, you have everything you need to get started writing GUI programs using Tcl’s graphical counterpart, Tk.

 
