Chapter 9. Working with Files and Programs

This chapter describes how to run programs, examine the file system, and access environment variables through the env array. Tcl commands described are: exec, file, open, close, read, puts, gets, flush, seek, tell, glob, pwd, cd, exit, pid, and registry.

This chapter describes how to run programs and access the file system from Tcl. These commands were designed for UNIX. In Tcl 7.5 they were implemented in the Tcl ports to Windows and Macintosh. There are facilities for naming files and manipulating file names in a platform-independent way, so you can write scripts that are portable across systems. These capabilities enable your Tcl script to be a general-purpose glue that assembles other programs into a tool that is customized for your needs. Tcl 8.4 added support for 64-bit file systems, where available.

Running Programs with exec

The exec command runs programs from your Tcl script. For example:

set d [exec date]

The standard output of the program is returned as the value of the exec command. However, if the program writes to its standard error channel or exits with a nonzero status code, then exec raises an error. If you do not care about the exit status, or you use a program that insists on writing to standard error, then you can use catch to mask the errors:

catch {exec program arg arg} result
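
When the reason for the failure matters, the global errorCode variable can be inspected after catch returns. The following is a minimal sketch, assuming a UNIX-like system with sh on the PATH; the shell command is only an illustration. For a nonzero exit status, errorCode is a list of the form CHILDSTATUS pid status:

```tcl
# Assumes a UNIX-like system with sh on the PATH.
set failed [catch {exec sh -c {echo partial output; exit 3}} result]

# failed is 1 because the child exited with a nonzero status; result
# holds the child's output followed by exec's own error message.
set status 0
if {$failed && [lindex $::errorCode 0] eq "CHILDSTATUS"} {
   # errorCode has the form: CHILDSTATUS pid exitStatus
   set status [lindex $::errorCode 2]
}
puts "exit status: $status"
```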

The exec command supports a full set of I/O redirection and pipeline syntax. Each process normally has three I/O channels associated with it: standard input, standard output, and standard error. With I/O redirection, you can divert these I/O channels to files or to I/O channels you have opened with the Tcl open command. A pipeline is a chain of processes that have the standard output of one command hooked up to the standard input of the next command in the pipeline. Any number of programs can be linked together into a pipeline.

Example 9-1. Using exec on a process pipeline

set n [exec sort < /etc/passwd | uniq | wc -l 2> /dev/null]

Example 9-1 uses exec to run three programs in a pipeline. The first program is sort, which takes its input from the file /etc/passwd. The output of sort is piped into uniq, which suppresses duplicate lines. The output of uniq is piped into wc, which counts the lines. The error output of the command is diverted to the null device to suppress any error messages. Table 9-1 provides a summary of the syntax understood by the exec command.

Table 9-1. Summary of the exec syntax for I/O redirection

-keepnewline

(First argument.) Do not discard trailing newline from the result.

|

Pipes standard output from one process into another.

|&

Pipes both standard output and standard error output.

< fileName

Takes input from the named file.

<@ fileId

Takes input from the I/O channel identified by fileId.

<< value

Takes input from the given value.

> fileName

Overwrites fileName with standard output.

2> fileName

Overwrites fileName with standard error output.

>& fileName

Overwrites fileName with both standard error and standard out.

>> fileName

Appends standard output to the named file.

2>> fileName

Appends standard error to the named file.

>>& fileName

Appends both standard error and standard output to the named file.

>@ fileId

Directs standard output to the I/O channel identified by fileId.

2>@ fileId

Directs standard error to the I/O channel identified by fileId.

>&@ fileId

Directs both standard error and standard output to the I/O channel.

&

As the last argument, indicates pipeline should run in background.

A trailing & causes the program to run in the background. In this case, the process identifier is returned by the exec command. Otherwise, the exec command blocks during execution of the program, and the standard output of the program is the return value of exec. The trailing newline in the output is trimmed off, unless you specify -keepnewline as the first argument to exec.

If you look closely at the I/O redirection syntax, you'll see that it is built up from a few basic building blocks. The basic idea is that | stands for pipeline, > for output, and < for input. The standard error is joined to the standard output by &. Standard error is diverted separately by using 2>. You can use your own I/O channels by using @.
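
The building blocks can be tried out in isolation. This sketch assumes the UNIX sort program is on the PATH; the input strings are arbitrary sample data:

```tcl
# Assumes the UNIX sort program is on the PATH.
# << supplies a string as standard input; the trailing newline of the
# program's output is trimmed from the result.
set sorted [exec sort << "pear\napple\nfig\n"]

# -keepnewline must be the first argument; it preserves the newline.
set raw [exec -keepnewline sort << "pear\napple\nfig\n"]

# >@ sends the program's standard output to an open Tcl channel.
exec sort << "pear\napple\nfig\n" >@ stdout
```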

The auto_noexec Variable

The Tcl shell programs are set up during interactive use to attempt to execute unknown Tcl commands as programs. For example, you can get a directory listing by typing:

ls

instead of:

exec ls

This is handy if you are using the Tcl interpreter as a general shell. It can also cause unexpected behavior when you are just playing around. To turn this off, define the auto_noexec variable:

set auto_noexec anything

Limitations of exec on Windows

Windows 3.1 has an unfortunate combination of special cases that stem from console-mode programs, 16-bit programs, and 32-bit programs. In addition, pipes are really just simulated by writing output from one process to a temporary file and then having the next process read from that file. If exec or a process pipeline fails, it is because of a fundamental limitation of Windows. The good news is that Windows 98 and Windows NT cleaned up most of the problems with exec. Windows NT, Windows 2000, and Windows XP are pretty robust.

Tcl 8.0p2 was the last release to officially support Windows 3.1. That release includes Tcl1680.dll, which is necessary to work with the win32s subsystem. If you copy that file into the same directory as the other Tcl DLLs, you may be able to use some later releases of Tcl on Windows 3.1. However, Tcl 8.3 completely removed support for win32s while adding support for Windows XP-64.

AppleScript on Macintosh

The exec command is not provided on the Macintosh. Tcl ships with an AppleScript extension that lets you control other Macintosh applications. You can find documentation in the AppleScript.html that goes with the distribution. You must use package require to load the AppleScript command:

package require Tclapplescript
AppleScript junk
=> bad option "junk": must be compile, decompile, delete, execute, info, load, run, or store.

The file Command

The file command provides several ways to check the status of files in the file system. For example, you can find out if a file exists, what type of file it is, and other file attributes. There are facilities for manipulating files in a platform-independent manner. Table 9-2 provides a summary of the various forms of the file command. They are described in more detail later. Note that several operations have been added since the introduction of the file command; the table indicates the version of Tcl in which they were added.

Table 9-2. The file command options

file atime name ?time?

Returns access time as a decimal string. If time is specified, the access time of the file is set.

file attributes name ?option? ?value? ...

Queries or sets file attributes. (Tcl 8.0)

file channels ?pattern?

Returns the open channels in this interpreter, optionally filtered by the glob-style pattern. (Tcl 8.3)

file copy ?-force? source destination

Copies file source to file destination. The source and destination can be directories. (Tcl 7.6)

file delete ?-force? name

Deletes the named file. (Tcl 7.6)

file dirname name

Returns parent directory of file name.

file executable name

Returns 1 if name has execute permission, else 0.

file exists name

Returns 1 if name exists, else 0.

file extension name

Returns the part of name from the last dot (i.e., .) to the end. The dot is included in the return value.

file isdirectory name

Returns 1 if name is a directory, else 0.

file isfile name

Returns 1 if name is not a directory, symbolic link, or device, else 0.

file join path path...

Joins pathname components into a new pathname. (Tcl 7.5)

file link ?-type? name ?target?

Returns the link pointed to by name, or creates a link to target if it is specified. type can be -hard or -symbolic. (Tcl 8.4)

file lstat name var

Places attributes of the link name into var.

file mkdir name

Creates directory name. (Tcl 7.6)

file mtime name ?time?

Returns modify time of name as a decimal string. If time is specified, the modify time of the file is set.

file nativename name

Returns the platform-native version of name. (Tcl 8.0)

file normalize name

Returns a unique, absolute path for name while eliminating extra /, /., and /.. components. (Tcl 8.4)

file owned name

Returns 1 if current user owns the file name, else 0.

file pathtype name

relative, absolute, or volumerelative. (Tcl 7.5)

file readable name

Returns 1 if name has read permission, else 0.

file readlink name

Returns the contents of the symbolic link name.

file rename ?-force? old new

Changes the name of old to new. (Tcl 7.6)

file rootname name

Returns all but the extension of name (i.e., up to but not including the last . in name).

file separator ?name?

Returns the default file separator character on this file system, or the separator character for name if it is specified. (Tcl 8.4)

file size name

Returns the number of bytes in name.

file split name

Splits name into its pathname components. (Tcl 7.5)

file stat name var

Places attributes of name into array var. The elements defined for var are listed in Table 9-3.

file system name

Returns a two-element list: the filesystem for name (e.g., native or vfs) and the platform-specific type for name (e.g., NTFS or FAT32). (Tcl 8.4)

file tail name

Returns the last pathname component of name.

file type name

Returns type identifier, which is one of file, directory, characterSpecial, blockSpecial, fifo, link, or socket.

file volumes

Returns the available file volumes on this computer. On Unix, this always returns /. On Windows, this would be a list like {a:/ c:/}. (Tcl 8.3)

file writable name

Returns 1 if name has write permission, else 0.

Cross-Platform File Naming

Files are named differently on UNIX, Windows, and Macintosh. UNIX separates file name components with a forward slash (/), Macintosh separates components with a colon (:), and Windows separates components with a backslash (\). In addition, the way that absolute and relative names are distinguished is different. For example, these are absolute pathnames for the Tcl script library (i.e., $tcl_library) on Macintosh, Windows, and UNIX, respectively:

Disk:System Folder:Extensions:Tool Command Language:tcl7.6
c:\Program Files\Tcl\lib\Tcl7.6
/usr/local/tcl/lib/tcl7.6

The good news is that Tcl provides operations that let you deal with file pathnames in a platform-independent manner. The file operations described in this chapter allow either native format or the UNIX naming convention. The backslash used in Windows pathnames is especially awkward because the backslash is special to Tcl. Happily, you can use forward slashes instead:

c:/Program Files/Tcl/lib/Tcl7.6

There are some ambiguous cases that can be specified only with native pathnames. On my Macintosh, Tcl and Tk are installed in a directory that has a slash in it. You can name it only with the native Macintosh name:

Disk:Applications:Tcl/Tk 4.2

Another construct to watch out for is a leading // in a file name. This is the Windows syntax for network names that reference files on other computers. You can avoid accidentally constructing a network name by using the file join command described next. Of course, you can use network names to access remote files.

If you must communicate with external programs, you may need to construct a file name in the native syntax for the current platform. You can construct these names with file join described later. You can also convert a UNIX-like name to a native name with file nativename.

Several of the file operations operate on pathnames as opposed to returning information about the file itself. You can use the dirname, extension, join, normalize, pathtype, rootname, split, and tail operations on any string; there is no requirement that the pathnames refer to an existing file.

Building up Pathnames: file join

You can get into trouble if you try to construct file names by simply joining components with a slash. If part of the name is in native format, joining things with slashes will result in incorrect pathnames on Macintosh and Windows. The same problem arises when you accept user input. The user is likely to provide file names in native format. For example, this construct will not create a valid pathname on the Macintosh because $tcl_library is in native format:

set file $tcl_library/init.tcl

Note

Use file join to construct file names.

The platform-independent way to construct file names is with file join. The following command returns the name of the init.tcl file in native format:

set file [file join $tcl_library init.tcl]
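
The join operation accepts any number of pathname components, and an absolute pathname overrides any components that come before it. On UNIX, for example, /b/c is absolute, so it overrides the a that precedes it in the second command:
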
file join a b/c d
=> a/b/c/d
file join a /b/c d
=> /b/c/d

On Macintosh, a relative pathname starts with a colon, and an absolute pathname does not. To specify an absolute path, you put a trailing colon on the first component so that it is interpreted as a volume specifier. These relative components are joined into a relative pathname:

file join a :b:c d
=> :a:b:c:d

In the next case, b:c is an absolute pathname with b: as the volume specifier. The absolute name overrides the previous relative name:

file join a b:c d
=> b:c:d

The file join operation converts UNIX-style pathnames to native format. For example, on Macintosh you get this:

file join /usr/local/lib
=> usr:local:lib

Chopping Pathnames: split, dirname, tail

The file split command divides a pathname into components. It is the inverse of file join. The split operation detects automatically if the input is in native or UNIX format. The results of file split may contain some syntax to help resolve ambiguous cases when the results are passed back to file join. For example, on Macintosh a UNIX-style pathname is split on slash separators. The Macintosh syntax for a volume specifier (Disk:) is returned on the leading component:

file split "/Disk/System Folder/Extensions"
=> Disk: {System Folder} Extensions

A common reason to split up pathnames is to divide a pathname into the directory part and the file part. This task is handled directly by the dirname and tail operations. The dirname operation returns the parent directory of a pathname, while tail returns the trailing component of the pathname:

file dirname /a/b/c
=> /a/b
file tail /a/b/c
=> c

For a pathname with a single component, the dirname option returns "." on UNIX and Windows, or ":" on Macintosh. This is the name of the current directory.

The extension and rootname options are also complementary. The extension option returns everything from the last period in the name to the end (i.e., the file suffix including the period). The rootname option returns everything up to, but not including, the last period in the pathname:

file rootname /a/b.c
=> /a/b
file extension /a/b.c
=> .c
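
Because these operations work on any string, they combine easily. As a small sketch, the hypothetical helper swap_extension (not part of Tcl) replaces a file suffix by appending a new one to the result of file rootname:

```tcl
# swap_extension is a hypothetical helper name, not a Tcl command.
proc swap_extension {path newExt} {
   # rootname drops the last suffix; the new one is appended to it.
   return [file rootname $path]$newExt
}

swap_extension /a/b.c .o
# => /a/b.o
swap_extension archive.tar.gz .zip
# => archive.tar.zip
```

Only the last suffix is replaced, which matches the definition of file extension given above.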

Manipulating Files and Directories

Tcl 7.6 added file operations to copy files, delete files, rename files, and create directories. In earlier versions it was necessary to exec other programs to do these things, except on Macintosh, where cp, rm, mv, mkdir, and rmdir were built in. These commands are no longer supported on the Macintosh. Your scripts should use the file command operations described below to manipulate files in a platform-independent way.

File name patterns are not directly supported by the file operations. Instead, you can use the glob command described on page 122 to get a list of file names that match a pattern.
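
For instance, a pattern can be expanded with glob and the resulting names passed to a file operation one at a time. The procedure name delete_matching is a hypothetical helper chosen for this sketch:

```tcl
# delete_matching is a hypothetical helper name, not a Tcl command.
proc delete_matching {dir pattern} {
   set count 0
   # -nocomplain makes glob return an empty list, instead of raising
   # an error, when nothing matches the pattern.
   foreach f [glob -nocomplain [file join $dir $pattern]] {
      file delete $f
      incr count
   }
   return $count
}
```

A call such as delete_matching /tmp/work *.tmp would remove the matching files and return how many were deleted; the directory name here is only an example.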

Copying Files

The file copy operation copies files and directories. The following example copies file1 to file2. If file2 already exists, the operation raises an error unless the -force option is specified:

file copy ?-force? file1 file2

Several files can be copied into a destination directory. The names of the source files are preserved. The -force option indicates that files under directory can be replaced:

file copy ?-force? file1 file2 ... directory

Directories can be recursively copied. The -force option indicates that files under dir2 can be replaced:

file copy ?-force? dir1 dir2
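
The effect of -force is easy to observe with a scratch file. This sketch uses hypothetical file names that it creates in the current directory and removes afterward:

```tcl
# The file names are hypothetical; they are created in the current
# directory and deleted at the end.
set src  copy_demo_src.txt
set dest copy_demo_dest.txt
file delete $dest
close [open $src w]

# The destination does not exist yet, so this succeeds.
file copy $src $dest
# The destination exists now, so this raises an error.
set refused [catch {file copy $src $dest} msg]
# With -force the existing destination is replaced.
file copy -force $src $dest

file delete $src $dest
```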

Creating Directories

The file mkdir operation creates one or more directories:

file mkdir dir dir ...

It is not an error if the directory already exists. Furthermore, intermediate directories are created if needed. This means that you can always make sure a directory exists with a single mkdir operation. Suppose /tmp has no subdirectories at all. The following command creates /tmp/sub1 and /tmp/sub1/sub2:

file mkdir /tmp/sub1/sub2

The -force option is not understood by file mkdir, so the following command accidentally creates a folder named -force, as well as one named oops.

file mkdir -force oops

Symbolic and Hard Links

The file link operation allows the user to manipulate links. Hard links are directory entries that directly reference an existing file or directory. Symbolic (i.e., soft) links are files that contain the name of another file or directory. Generally, opening a link opens the file referenced by the link. Operating system support for links varies. Unix supports both types of links. Classic Macintosh only supports symbolic links (i.e., aliases). Windows 95/98/ME do not support links at all, while Windows NT/2000/XP support symbolic links to directories and hard links to files.

With only a single argument, file link returns the value of a symbolic link, or raises an error if the file is not a symbolic link. With two pathname arguments, the first is the name of the link, and the second is the name of the file referenced by the link. If you leave out the -hard or -symbolic, the appropriate link type is created for the current platform:

file link the_link the_existing_file

Deleting Files

The file delete operation deletes files and directories. It is not an error if the files do not exist. A non-empty directory is not deleted unless the -force option is specified, in which case it is recursively deleted:

file delete ?-force? name name ...

To delete a file or directory named -force, you must specify a nonexistent file before the -force to prevent it from being interpreted as a flag (-force -force won't work):

file delete xyzzy -force
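
An alternative, assuming your version of Tcl accepts the -- end-of-options marker for file delete (the reference page for the command documents it), is to mark the end of the flags explicitly. This sketch creates and then deletes a file literally named -force in the current directory:

```tcl
# Create a file literally named -force in the current directory.
close [open ./-force w]

# -- marks the end of the options, so -force is taken as a file name.
file delete -- -force
```

Prefixing the name with ./ as in the open call has the same effect, because the argument then no longer begins with a dash.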

Renaming Files and Directories

The file rename operation changes a file's name from old to new. The -force option causes new to be replaced if it already exists.

file rename ?-force? old new

Using file rename is the best way to update an existing file. First, generate the new version of the file in a temporary file. Then, use file rename to replace the old version with the new version. This ensures that any other programs that access the file will not see the new version until it is complete.
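
The steps above can be sketched as a short procedure. The name update_file and the .tmp suffix are assumptions made for illustration; the temporary file is kept in the same directory as the target so that the rename does not have to cross file systems:

```tcl
# update_file is a hypothetical helper; the .tmp suffix is arbitrary.
proc update_file {path contents} {
   set tmp $path.tmp
   set out [open $tmp w]
   puts -nonewline $out $contents
   close $out
   # Replace the old version only after the new one is complete.
   file rename -force $tmp $path
}
```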

File Attributes

There are several file operations that return specific file attributes: atime, executable, exists, isdirectory, isfile, mtime, owned, readable, readlink, size and type. Refer to Table 9-2 on page 108 for their function. The following command uses file mtime to compare the modify times of two files. If you have ever resorted to piping the results of ls -l into awk in order to derive this information in other shell scripts, you will appreciate this example:

Example 9-2. Comparing file modify times

proc newer { file1 file2 } {
   if {![file exists $file2]} {
      return 1
   } else {
      # Assume file1 exists
      expr {[file mtime $file1] > [file mtime $file2]}
   }
}

You can use the optional time argument to mtime and atime to set the file's time attributes, like the Unix touch command. The stat and lstat operations return a collection of file attributes. They take a third argument that is the name of an array variable, and they initialize that array with elements that contain the file attributes. If the file is a symbolic link, then the lstat operation returns information about the link itself and the stat operation returns information about the target of the link.
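
As a sketch of both ideas, the hypothetical touch procedure below sets the access and modify times of a file, creating the file if necessary, and file stat then reads the attributes back into an array. The file name stat_demo.txt and the time value are arbitrary:

```tcl
# touch is a hypothetical helper named after the UNIX program.
proc touch {path {time ""}} {
   if {![file exists $path]} {
      close [open $path w]   ;# create an empty file
   }
   if {$time eq ""} {
      set time [clock seconds]
   }
   file mtime $path $time
   file atime $path $time
}

touch stat_demo.txt 1000000000
file stat stat_demo.txt st
puts "type=$st(type) size=$st(size) mtime=$st(mtime)"
file delete stat_demo.txt
```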

Table 9-3. Array elements defined by file stat

atime

The last access time, in seconds.

ctime

The last change time (not the create time), in seconds.

dev

The device identifier, an integer.

gid

The group owner, an integer.

ino

The file number (i.e., inode number), an integer.

mode

The permission bits.

mtime

The last modify time, in seconds.

nlink

The number of links, or directory references, to the file.

size

The number of bytes in the file.

type

file, directory, characterSpecial, blockSpecial, fifo, link, or socket.

uid

The owner's user ID, an integer.

The array elements are listed in Table 9-3. All the element values are decimal strings, except for type, which can have the values returned by the type option. The element names are based on the UNIX stat system call. Use the file attributes command described later to get other platform-specific attributes.

Example 9-3 uses the device (dev) and inode (ino) attributes of a file to determine whether two pathnames reference the same file. These attributes are UNIX specific; they are not well defined on Windows and Macintosh.

Example 9-3. Determining whether pathnames reference the same file

proc fileeq { path1 path2 } {
   file stat $path1 stat1
   file stat $path2 stat2
   expr {$stat1(ino) == $stat2(ino) && 
          $stat1(dev) == $stat2(dev)}
}

The file attributes operation was added in Tcl 8.0 to provide access to platform-specific attributes. The attributes operation lets you set and query attributes. The interface uses option-value pairs. With no options, all the current values are returned.

file attributes book.doc
=> -creator FRAM -hidden 0 -readonly 0 -type MAKR

These Macintosh attributes are explained in Table 9-4. The four-character type codes used on Macintosh are illustrated on page 600. With a single option, only that value is returned:

file attributes book.doc -readonly
=> 0

The attributes are modified by specifying one or more option–value pairs. Setting attributes can raise an error if you do not have the right permissions:

file attributes book.doc -readonly 1 -hidden 0

Table 9-4. Platform-specific file attributes

-permissions mode

File permission bits. mode is an octal number or symbolic representation (e.g. a+x) with bits defined by the chmod system call, or a simplified ls-style string of the form rwxrwxrwx (must be 9 characters). (UNIX)

-group ID

The group owner of the file. (UNIX)

-owner ID

The owner of the file. (UNIX)

-archive bool

The archive bit, which is set by backup programs. (Windows)

-system bool

If set, then you cannot remove the file. (Windows)

-longname

The long (expanded) version of the pathname. Read-only. (Windows)

-shortname

The short (8.3) version of the pathname. Read-only. (Windows)

-hidden bool

If set, then the file does not appear in listings. (Windows, Macintosh)

-readonly bool

If set, then you cannot write the file. (Windows, Macintosh)

-creator type

type is 4-character code of creating application. (Macintosh)

-type type

type is 4-character type code. (Macintosh)
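
Because the available options differ by platform, a portable script usually checks tcl_platform(platform) before setting them. The following sketch uses a hypothetical helper named make_readonly; it handles only the unix and windows cases and does nothing elsewhere:

```tcl
# make_readonly is a hypothetical helper name, not a Tcl command.
proc make_readonly {path} {
   global tcl_platform
   switch -- $tcl_platform(platform) {
      unix {
         # ls-style string: read permission only, for everyone.
         file attributes $path -permissions r--r--r--
      }
      windows {
         file attributes $path -readonly 1
      }
   }
}
```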

Input/Output Command Summary

The following sections describe how to open, read, and write files. The basic model is that you open a file, read or write it, then close the file. Network sockets also use the commands described here. Socket programming is discussed in Chapter 17, and more advanced event-driven I/O is described in Chapter 16. Table 9-5 lists the basic commands associated with file I/O:

Table 9-5. Tcl commands used for file access

open what ?access? ?permissions?

Returns channel ID for a file or pipeline.

puts ?-nonewline? ?channel? string

Writes a string.

gets channel ?varname?

Reads a line.

read channel ?numBytes?

Reads numBytes bytes, or all data.

read -nonewline channel

Reads all bytes and discards the last \n.

tell channel

Returns the seek offset.

seek channel offset ?origin?

Sets the seek offset. origin is one of start, current, or end.

eof channel

Queries end-of-file status.

flush channel

Writes buffers of a channel.

close channel

Closes an I/O channel.

Opening Files for I/O

The open command sets up an I/O channel to either a file or a pipeline of processes. The return value of open is an identifier for the I/O channel. Store the result of open in a variable and use the variable as you used the stdout, stdin, and stderr identifiers in the examples so far. The basic syntax is:

open what ?access? ?permissions?

The what argument is either a file name or a pipeline specification similar to that used by the exec command. The access argument can take two forms, either a short character sequence that is compatible with the fopen library routine, or a list of POSIX access flags. Table 9-6 summarizes the first form, while Table 9-7 summarizes the POSIX flags. If access is not specified, it defaults to read.

Example 9-4. Opening a file for writing

set fileId [open /tmp/foo w 0600]
puts $fileId "Hello, foo!"
close $fileId

The permissions argument is a value used for the permission bits on a newly created file. UNIX uses three bits each for the owner, group, and everyone else. The bits specify read, write, and execute permission. These bits are usually specified with an octal number, which has a leading zero, so that there is one octal digit for each set of bits. The default permission bits are 0666, which grant read/write access to everybody. Example 9-4 specifies 0600 so that the file is readable and writable only by the owner. 0775 would grant read, write, and execute permissions to the owner and group, and read and execute permissions to everyone else. You can set other special properties with additional high-order bits. Consult the UNIX manual page on the chmod command for more details.

Table 9-6. Summary of the open access arguments

r

Opens for reading. The file must exist.

r+

Opens for reading and writing. The file must exist.

w

Opens for writing. Truncate if it exists. Create if it does not exist.

w+

Opens for reading and writing. Truncate or create.

a

Opens for writing. Data is appended to the file.

a+

Opens for reading and writing. Data is appended.

Table 9-7. Summary of POSIX flags for the access argument

RDONLY

Opens for reading.

WRONLY

Opens for writing.

RDWR

Opens for reading and writing.

APPEND

Opens for append.

CREAT

Creates the file if it does not exist.

EXCL

If CREAT is also specified, then the file cannot already exist.

NOCTTY

Prevents terminal devices from becoming the controlling terminal.

NONBLOCK

Does not block during the open.

TRUNC

Truncates the file if it exists.

The following example illustrates how to use a list of POSIX access flags to open a file for reading and writing, creating it if needed, and not truncating it. This is something you cannot do with the simpler form of the access argument:

set fileId [open /tmp/bar {RDWR CREAT}]
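
Another combination that the short form cannot express is EXCL with CREAT, which makes open fail if the file already exists. A simple lock file can be sketched this way; try_lock is a hypothetical helper name, and writing the process ID into the file is just one possible convention:

```tcl
# try_lock is a hypothetical helper name, not a Tcl command.
proc try_lock {path} {
   # EXCL together with CREAT makes open fail if the file exists.
   if {[catch {open $path {WRONLY CREAT EXCL}} chan]} {
      return 0
   }
   puts $chan [pid]
   close $chan
   return 1
}
```

The procedure returns 1 if it created the lock file and 0 if the open failed, typically because the file was already there.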

Note

Catch errors from open.

In general, you should check for errors when opening files. The following example illustrates a catch phrase used to open files. Recall that catch returns 1 if it catches an error; otherwise, it returns zero. It treats its second argument as the name of a variable. In the error case, it puts the error message into the variable. In the normal case, it puts the result of the command into the variable:

Example 9-5. A more careful use of open

if {[catch {open /tmp/data r} fileId]} {
   puts stderr "Cannot open /tmp/data: $fileId"
} else {
   # Read and process the file, then...
   close $fileId
}

Opening a Process Pipeline

You can open a process pipeline by specifying the pipe character, |, as the first character of the first argument. The remainder of the pipeline specification is interpreted just as with the exec command, including input and output redirection. The second argument determines which end of the pipeline open returns. The following example runs the UNIX sort program on the password file, and it uses the split command to separate the output lines into list elements:

Example 9-6. Opening a process pipeline

set input [open "|sort /etc/passwd" r]
set contents [split [read $input] \n]
close $input

You can open a pipeline for both read and write by specifying the r+ access mode. In this case, you need to worry about buffering. After a puts, the data may still be in a buffer in the Tcl library. Use the flush command to force the data out to the spawned processes before you try to read any output from the pipeline. You can also use the fconfigure command described on page 233 to force line buffering. Remember that read-write pipes will not work at all with Windows 3.1 because pipes are simulated with files. Event-driven I/O is also very useful with pipes. It means you can do other processing while the pipeline executes, and simply respond when the pipe generates data. This is described in Chapter 16.
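
A read-write pipeline can be sketched with the UNIX cat program, which is assumed here to copy each line to its output as soon as it arrives. Without the flush, the gets would wait forever because the line would still be in Tcl's output buffer:

```tcl
# Assumes a UNIX cat program that echoes each line as it arrives.
set pipe [open "|cat" r+]
puts $pipe "hello, pipeline"
flush $pipe              ;# push the buffered line out to cat
set reply [gets $pipe]   ;# read back the line that cat echoed
close $pipe
```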

Expect

If you are trying to do sophisticated things with an external application, you will find that the Expect extension provides a much more powerful interface than a process pipeline. Expect adds Tcl commands that are used to control interactive applications. It is extremely useful for automating a variety of applications such as ssh, Telnet, and programs under test. Tcl is able to handle simple FTP sessions, telnet and many command line controllable applications, but Expect has extra control at the tty level that is essential for certain applications. It comes on some systems as a specially built Tcl shell named expect, and it is also available as an extension that you can dynamically load into Tcl shells with:

package require Expect

Expect was created by Don Libes at the National Institute of Standards and Technology (NIST). Expect is described in Exploring Expect (Libes, O'Reilly & Associates, Inc., 1995). You can find the software on the CD and on the web at:

Reading and Writing

The standard I/O channels are already open for you. There is a standard input channel, a standard output channel, and a standard error output channel. These channels are identified by stdin, stdout, and stderr, respectively. Other I/O channels are returned by the open command, and by the socket command described on page 239.

There may be cases when the standard I/O channels are not available. The wish shells on Windows and Macintosh have no standard I/O channels. Some UNIX window managers close the standard I/O channels when you start programs from window manager menus. You can also close the standard I/O channels with close.

The puts and gets Commands

The puts command writes a string and a newline to the output channel. There are a couple of details about the puts command that we have not yet used. It takes a -nonewline argument that prevents the newline character that is normally appended to the output channel. This is used in the prompt example below. The second feature is that the channel identifier is optional, defaulting to stdout if not specified. Note that you must use flush to force output of a partial line. This is illustrated in Example 9-7.

Example 9-7. Prompting for input

puts -nonewline "Enter value: "
flush stdout ;# Necessary to get partial line output
set answer [gets stdin]

The gets command reads a line of input, and it has two forms. In the previous example, with just a single argument, gets returns the line read from the specified I/O channel. It discards the trailing newline from the return value. If end of file is reached, an empty string is returned. You must use the eof command to tell the difference between a blank line and end-of-file. eof returns 1 if there is end of file. Given a second varName argument, gets stores the line into a named variable and returns the number of bytes read. It discards the trailing newline, which is not counted. A -1 is returned if the channel has reached the end of file.

Example 9-8. A read loop using gets

while {[gets $channel line] >= 0} {
   # Process line
}
close $channel
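
The single-argument form needs the extra eof test described above. As a sketch, the hypothetical procedure count_blank_lines uses it to count empty lines, treating an empty result as end of file only when eof confirms it:

```tcl
# count_blank_lines is a hypothetical helper name.
proc count_blank_lines {channel} {
   set blanks 0
   while {1} {
      set line [gets $channel]
      if {$line eq "" && [eof $channel]} {
         break    ;# the empty result was caused by end of file
      }
      if {$line eq ""} {
         incr blanks
      }
   }
   return $blanks
}
```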

The read Command

The read command reads blocks of data, and this capability is often more efficient. There are two forms for read: You can specify the -nonewline argument or the numBytes argument, but not both. Without numBytes, the whole file (or what is left in the I/O channel) is read and returned. The -nonewline argument causes the trailing newline to be discarded. Given a byte count argument, read returns that amount, or less if there is not enough data in the channel. The trailing newline is not discarded in this case.

Example 9-9. A read loop using read and split

foreach line [split [read $channel] \n] {
   # Process line
}
close $channel

For moderate-sized files, it is about 10 percent faster to loop over the lines in a file using the read loop in Example 9-9 than the gets loop in Example 9-8. In this case, read returns the whole file, and split chops the file into list elements, one for each line. For small files (less than 1K) it doesn't really matter. For large files (megabytes) you might induce paging with this approach.
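
For large files, one way to avoid holding everything in memory is to pass a byte count to read and process the data a block at a time. The sketch below is only an illustration: the CountBytes procedure, the 4096-byte block size, and the scratch file under /tmp are all assumptions, not part of Tcl.

```tcl
# Hypothetical helper: total the bytes in a file, one block at a time.
proc CountBytes {path {chunk 4096}} {
   set in [open $path]
   fconfigure $in -translation binary   ;# count raw bytes, no conversions
   set total 0
   while {![eof $in]} {
      # read returns at most $chunk bytes, fewer at the end of the file
      set data [read $in $chunk]
      incr total [string length $data]
   }
   close $in
   return $total
}

# Try it on a scratch file (the name and location are assumptions).
set path [file join /tmp read_demo_[pid].dat]
set out [open $path w]
puts -nonewline $out [string repeat x 10000]
close $out
set counted [CountBytes $path]
set expected [file size $path]
file delete $path
puts "$counted $expected"
```

Only one block is held in memory at a time, at the cost of more read calls than the single read in Example 9-9.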

Platform-Specific End of Line Characters

Tcl automatically detects different end of line conventions. On UNIX, text lines are ended with a newline character (\n). On Macintosh, they are terminated with a carriage return (\r). On Windows, they are terminated with a carriage return, newline sequence (\r\n). Tcl accepts any of these, and the line terminator can even change within a file. All these different conventions are converted to the UNIX style so that once read, text lines are always terminated with a newline character (\n). Both the read and gets commands do this conversion.

During output, text lines are generated in the platform-native format. The automatic handling of line formats means that it is easy to convert a file to native format. You just need to read it in and write it out:

puts -nonewline $out [read $in]

To suppress conversions, use the fconfigure command, which is described in more detail on page 234.
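
As a brief illustration of the difference, the following sketch writes a file with forced Windows-style line endings and reads it back twice, once with the default input translation and once with translation turned off. The scratch file name under /tmp is an assumption.

```tcl
# Hypothetical scratch file; the name and location are assumptions.
set path [file join /tmp eol_demo_[pid].txt]
set out [open $path w]
fconfigure $out -translation crlf   ;# force \r\n regardless of platform
puts $out "one"
puts $out "two"
close $out

# Default input translation (auto): each \r\n is converted to \n.
set in [open $path]
set text [read $in]
close $in

# Binary translation: the bytes come back untouched.
set in [open $path]
fconfigure $in -translation binary
set raw [read $in]
close $in
file delete $path

puts [string length $text]   ;# 8
puts [string length $raw]    ;# 10
```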

Example 9-10 demonstrates a File_Copy procedure that translates files to native format. It is complicated because it handles directories.

Example 9-10. Copy a file and translate to native format

proc File_Copy {src dest} {
   if {[file isdirectory $src]} {
      file mkdir $dest
      foreach f [glob -nocomplain [file join $src *]] {
         File_Copy $f [file join $dest [file tail $f]]
      }
      return
   }
   if {[file isdirectory $dest]} {
      set dest [file join $dest [file tail $src]]
   }
   set in [open $src]
   set out [open $dest w]
   puts -nonewline $out [read $in]
   close $out ; close $in
}

Random Access I/O

The seek and tell commands provide random access to I/O channels. Each channel has a current position called the seek offset. Each read or write operation updates the seek offset by the number of bytes transferred. The current value of the offset is returned by the tell command. The seek command sets the seek offset by an amount, which can be positive or negative, from an origin which is either start, current, or end. If you are dealing with files greater than 2GB in size, you will need Tcl 8.4 for its 64-bit file system support.
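
The three origins can be seen in a short sketch. The scratch file under /tmp and the ten-digit test string are assumptions chosen so that offsets are easy to read off.

```tcl
# Hypothetical scratch file; the name and location are assumptions.
set path [file join /tmp seek_demo_[pid].dat]
set f [open $path w+]               ;# read/write, truncating the file
fconfigure $f -translation binary
puts -nonewline $f "0123456789"
set size [tell $f]                  ;# 10: offset after writing ten bytes

seek $f 2 start                     ;# absolute offset 2
set a [read $f 3]                   ;# "234", offset is now 5
seek $f 1 current                   ;# skip one byte, offset is now 6
set b [read $f 2]                   ;# "67"
seek $f -1 end                      ;# one byte before the end
set c [read $f 1]                   ;# "9"
set pos [tell $f]                   ;# 10 again

close $f
file delete $path
puts "$size $a $b $c $pos"
```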

Closing I/O Channels

The close command is just as important as the others because it frees operating system resources associated with the I/O channel. If you forget to close a channel, it will be closed when your process exits. However, if you have a long-running program, like a Tk script, you might exhaust some operating system resources if you forget to close your I/O channels.

Note

The close command can raise an error.

If the channel was a process pipeline and any of the processes wrote to their standard error channel, then Tcl believes this is an error. The error is raised when the channel to the pipeline is finally closed. Similarly, if any of the processes in the pipeline exit with a nonzero status, close raises an error.
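
A catch around close is the usual way to handle this. The sketch below assumes a UNIX-like system where sh is on the PATH; the exit status of 3 is arbitrary.

```tcl
# Assumes a UNIX-like system with sh available.
set pipe [open "|sh -c {exit 3}"]
set output [read $pipe]            ;# the child prints nothing

# close waits for the child and reports its nonzero exit status as an error
set failed [catch {close $pipe} err]
if {$failed} {
   # errorCode has the form: CHILDSTATUS pid exitStatus
   set kind [lindex $::errorCode 0]
   set status [lindex $::errorCode 2]
   puts "pipeline failed: $err ($kind, status $status)"
}
```

Checking the errorCode variable lets a script tell a nonzero exit status apart from output written to standard error.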

The Current Directory — cd and pwd

Every process has a current directory that is used as the starting point when resolving a relative pathname. The pwd command returns the current directory, and the cd command changes the current directory. Example 9-11 uses these commands.
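
Because the current directory is process-wide state, it is common to save it before a cd and restore it afterward. The InDirectory procedure below is a hypothetical helper, sketched here for illustration, that restores the directory even when the script it runs raises an error.

```tcl
# Hypothetical helper: run a script in another directory, then come back.
proc InDirectory {dir script} {
   set saved [pwd]
   cd $dir                          ;# an error here leaves pwd unchanged
   set code [catch {uplevel 1 $script} result]
   cd $saved                        ;# restore even if the script failed
   return -code $code $result
}

set start [pwd]
set listing [InDirectory / {glob -nocomplain *}]
set restored [expr {[pwd] eq $start}]
puts $restored
```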

Matching File Names with glob

The glob command expands a pattern into the set of matching file names. The general form of the glob command is:

glob ?options? pattern ?pattern? ...

The pattern syntax is similar to the string match patterns:

  • * matches zero or more characters.

  • ? matches a single character.

  • [abc] matches a set of characters.

  • {a,b,c} matches any of a, b, or c.

  • All other characters must match themselves.

Table 9-8 lists the options for the glob command.

Table 9-8. glob command options

-directory dir

Search for files in the directory dir. (Tcl 8.3)

-join

The remaining pattern arguments are treated as a single pattern obtained by joining them with directory separators. (Tcl 8.3)

-nocomplain

Causes glob to return an empty list if no files match. Otherwise an error is raised.

-path path

Search for files in the given path prefix path. Allows you to search in areas that may contain glob-sensitive characters. (Tcl 8.3)

-tails

Only return the part of each file found that follows the last directory named in the -directory or -path argument. (Tcl 8.4)

-types types

Only return files matching the types specified.

--

Signifies the end of flags. Must be used if pattern begins with a -.

Unlike the glob matching in csh, the Tcl glob command matches only the names of existing files. In csh, the {a,b} construct can match nonexistent names. In addition, the results of glob are not sorted. Use the lsort command to sort its result if you find it important.
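
A short sketch of several options working together follows. It assumes Tcl 8.4 or later (for -tails) and a writable /tmp; the directory and file names are made up for the example.

```tcl
# Hypothetical scratch directory; names and location are assumptions.
set dir [file join /tmp glob_demo_[pid]]
file mkdir $dir
foreach name {b.txt a.txt c.log} {
   close [open [file join $dir $name] w]   ;# create an empty file
}

# -directory names where to look, -tails drops that prefix from the results,
# and lsort imposes an order because glob itself guarantees none.
set names [lsort [glob -nocomplain -directory $dir -tails *.txt]]

# With -nocomplain a pattern that matches nothing yields an empty list.
set none [glob -nocomplain -directory $dir *.missing]

file delete -force $dir
puts $names
```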

Example 9-11 shows the FindFile procedure, which traverses the file system hierarchy using recursion. At each iteration it saves its current directory and then attempts to change to the next subdirectory. A catch guards against bogus names. The glob command matches file names:

Example 9-11. Finding a file by name

proc FindFile { startDir namePat } {
   set pwd [pwd]
   if {[catch {cd $startDir} err]} {
      puts stderr $err
      return
   }
   foreach match [glob -nocomplain -- $namePat] {
      puts stdout [file join $startDir $match]
   }
   foreach file [glob -nocomplain *] {
      if {[file isdirectory $file]} {
         FindFile [file join $startDir $file] $namePat
      }
   }
   cd $pwd
}

The -types option allows for special filtered matching similar to the UNIX find command. The first form is like the -type option of find: b (block special file), c (character special file), d (directory), f (plain file), l (symbolic link), p (named pipe), or s (socket), where multiple types may be specified in the list. Glob will return all files which match at least one of the types given.

The second form specifies types where all the types given must match. These are r (readable), w (writable) and x (executable) as file permissions, and readonly and hidden as special cases. On the Macintosh, MacOS types and creators are also supported, where any item which is four characters long is assumed to be a MacOS type (e.g. TEXT). Items which are of the form {macintosh type XXXX} or {macintosh creator XXXX} will match types or creators respectively. Unrecognized types, or specifications of multiple MacOS types/creators will signal an error.

The two forms may be mixed, so -types {d f r w} will find all regular files OR directories that have both read AND write permissions.
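
The following sketch exercises both forms of -types. It assumes Tcl 8.4 or later, a writable /tmp, and default permissions that make newly created files readable and writable by their owner; the names are invented for the example.

```tcl
# Hypothetical scratch directory; names and location are assumptions.
set dir [file join /tmp types_demo_[pid]]
file mkdir [file join $dir sub]
close [open [file join $dir plain.txt] w]

# First form: match on the kind of file.
set dirs  [glob -nocomplain -directory $dir -tails -types d *]
set files [glob -nocomplain -directory $dir -tails -types f *]

# Mixed form: (directory OR plain file) AND readable AND writable.
set both [lsort [glob -nocomplain -directory $dir -tails -types {d f r w} *]]

file delete -force $dir
puts "$dirs | $files | $both"
```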

Expanding Tilde in File Names

The glob command also expands a leading tilde (~) in filenames. There are two cases:

  • ~/ expands to the current user's home directory.

  • ~user expands to the home directory of user.

If you have a file that starts with a literal tilde, you can avoid the tilde expansion by adding a leading ./ (e.g., ./~foobar).

The exit and pid Commands

The exit command terminates your script. Note that exit causes termination of the whole process that was running the script. If you supply an integer-valued argument to exit, then that becomes the exit status of the process.

The pid command returns the process ID of the current process. This can be useful as the seed for a random number generator because it changes each time you run your script. It is also common to embed the process ID in the name of temporary files.

You can also find out the process IDs associated with a process pipeline with pid:

set pipe [open "|command"]
set pids [pid $pipe]

There is no built-in mechanism to control processes in the Tcl core. On UNIX systems you can exec the kill program to terminate a process:

exec kill $pid
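
Putting the two fragments together, the sketch below starts a pipeline, looks up its process IDs, and terminates them. It assumes a UNIX-like system with the sleep and kill programs available; the 30-second sleep is just a stand-in for a long-running command.

```tcl
# Assumes a UNIX-like system with sleep and kill on the PATH.
set pipe [open "|sleep 30"]
set pids [pid $pipe]               ;# one process ID per pipeline stage
foreach p $pids {
   exec kill $p
}

# close waits for the child; because it was killed, close raises an error
set failed [catch {close $pipe} err]
set kind [lindex $::errorCode 0]
puts "$failed $kind"
```

As with any pipeline, the catch around close is needed here because a child that dies from a signal is reported as an error.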

Environment Variables

Environment variables are a collection of string-valued variables associated with each process. The process's environment variables are available through the global array env. The name of the environment variable is the index, (e.g., env(PATH)), and the array element contains the current value of the environment variable. If assignments are made to env, they result in changes to the corresponding environment variable. Environment variables are inherited by child processes, so programs run with the exec command inherit the environment of the Tcl script. The following example prints the values of environment variables.

Example 9-12. Printing environment variable values

proc printenv { args } {
   global env
   set maxl 0
   if {[llength $args] == 0} {
      set args [lsort [array names env]]
   }
   foreach x $args {
      if {[string length $x] > $maxl} {
         set maxl [string length $x]
      }
   }
   incr maxl 2
   foreach x $args {
      puts stdout [format "%*s = %s" $maxl $x $env($x)]
   }
}
printenv USER SHELL TERM
=>
   USER = welch
  SHELL = /bin/csh
   TERM = tx
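
Assignments to env and inheritance by child processes can be checked with a short sketch. It assumes a UNIX-like system with sh available, and DEMO_GREETING is a made-up variable name used only for this example.

```tcl
# DEMO_GREETING is a hypothetical variable name.
set env(DEMO_GREETING) hello

# The braces keep Tcl from substituting; sh expands the variable itself,
# which shows that the child inherited it.
set seen [exec sh -c {echo $DEMO_GREETING}]
puts $seen     ;# hello

# Unsetting the array element removes the variable from the environment.
unset env(DEMO_GREETING)
set stillThere [info exists env(DEMO_GREETING)]
puts $stillThere   ;# 0
```

At the global level env can be used directly; inside a procedure it must be declared with global, as printenv does.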

Note

Environment variables can be initialized for Macintosh applications by editing a resource of type STR# whose name is Tcl Environment Variables. This resource is part of the tclsh and wish applications. Follow the directions on page 28 for using ResEdit. The format of the resource values is NAME=VALUE.

The registry Command

Windows uses the registry to store various system configuration information. The Windows tool to browse and edit the registry is called regedit. Tcl provides a registry command. It is a loadable package that you must load by using:

package require registry

The registry structure has keys, value names, and typed data. The value names are stored under a key, and each value name has data associated with it. The keys are organized into a hierarchical naming system, so another way to think of the value names is as an extra level in the hierarchy. The main point is that you need to specify both a key name and a value name in order to get something out of the registry. The key names have one of the following formats:

\\hostname\rootname\keypath
rootname\keypath
rootname

The rootname is one of HKEY_LOCAL_MACHINE, HKEY_PERFORMANCE_DATA, HKEY_USERS, HKEY_CLASSES_ROOT, HKEY_CURRENT_USER, HKEY_CURRENT_CONFIG, or HKEY_DYN_DATA. Tables 9-9 and 9-10 summarize the registry command and data types:

Table 9-9. The registry command

registry delete key ?valueName?

Deletes the named value under key, or, if valueName is not specified, deletes the key along with all subkeys and values beneath it.

registry get key valueName

Returns the value associated with valueName under key.

registry keys key ?pat?

Returns the list of subkeys under key that match pat, which is a string match pattern.

registry set key

Creates key.

registry set key valueName data ?type?

Creates valueName under key with value data of the given type. Types are listed in Table 9-10.

registry type key valueName

Returns the type of valueName under key.

registry values key ?pat?

Returns the names of the values stored under key that match pat, which is a string match pattern.

Table 9-10. The registry data types

binary

Arbitrary binary data.

none

Arbitrary binary data.

expand_sz

A string that contains references to environment variables with the %VARNAME% syntax.

dword

A 32-bit integer.

dword_big_endian

A 32-bit integer in the other byte order. It is represented in Tcl as a decimal string.

link

A symbolic link.

multi_sz

An array of strings, which are represented as a Tcl list.

resource_list

A device driver resource list.
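
The commands in Table 9-9 can be combined to list the values stored under a key. The sketch below only does real work on Windows; the choice of HKEY_CURRENT_USER\Environment as the key is an assumption made for illustration, and the values it holds vary from system to system.

```tcl
# The registry package exists only on Windows, so guard its use.
set report {}
if {$tcl_platform(platform) eq "windows"} {
   package require registry
   # Braces keep the backslash in the key name literal.
   set key {HKEY_CURRENT_USER\Environment}   ;# assumed to exist
   foreach name [registry values $key] {
      set type [registry type $key $name]
      set data [registry get $key $name]
      lappend report "$name ($type) = $data"
   }
} else {
   lappend report "registry is only available on Windows"
}
puts [join $report \n]
```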



[*] Unlike other UNIX shell exec commands, the Tcl exec does not replace the current process with the new one. Instead, the Tcl library forks first and executes the program as a child process.
