Tcl allows us to read from and write to files by using channels. A channel can be an open file, but it can also be a network socket, a pipe, or any other channel type. Depending on the type of the channel, it can support reading from it, writing to it, or both.
Tcl comes with three default channels: stdin, stdout, and stderr. These channels correspond to the standard input, standard output, and standard error channels of the operating system. Standard input can be used to read information from the user, standard output should be used to write information to the user, and standard error is used to write errors. Depending on how our application is run, these channels can be redirected from or to files.
In the case of Microsoft Windows and applications using the GUI version of Tcl, invoked using the wish or tclkit commands, standard channels are not available. This is because graphical applications in Microsoft Windows do not have standard consoles. In such cases, an equivalent of these channels is created that allows interacting with the user through the Tk console window. For example:
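As a minimal sketch of such an interaction, the following line can be typed into the Tk console window or run with tclsh. The message text is purely illustrative.

```tcl
# Illustrative message; any string works here
set greeting "Hello from Tcl"

# With a single argument, puts writes to standard output
# (in wish on Windows this shows up in the Tk console window)
puts $greeting
```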
The puts command in this form is described in more detail later in this section.
The open command can be used to open a file for reading and/or writing. It can be invoked with just the filename; with the filename and the open mode; or, when creating a new file, with the filename, open mode, and permissions. Permissions are ignored on Microsoft Windows systems; on all other systems they default to 0666 if not specified. Permissions are combined with the mask set up for the current process, usually set by the umask system command. This is similar to how all file creation operations work on Unix systems. The open command returns the name of the newly opened channel, which can be used with all commands that operate on channels.
The open mode is a string specifying the access mode, defaulting to r if it is not specified. Mode r opens the file for reading only; the file must already exist. Mode r+ opens the file for reading and writing; the file must already exist. The w mode opens the file for writing only, truncating it if it already exists and creating it if it does not. The w+ mode behaves like w, but allows both reading and writing. The a mode opens the file for writing only, setting the current position to the end of the file if it already exists and creating the file if it does not. The a+ mode behaves like a, but allows both reading and writing.
The following table summarizes each of the modes, which operations are permitted, and any additional activity that takes place:
Mode | Readable | Writable | Initial position | Must exist | Is truncated |
---|---|---|---|---|---|
r | Yes | No | Beginning of file | Yes | No |
r+ | Yes | Yes | Beginning of file | Yes | No |
w | No | Yes | Beginning of file | No | Yes |
w+ | Yes | Yes | Beginning of file | No | Yes |
a | No | Yes | End of file | No | No |
a+ | Yes | Yes | End of file | No | No |
The columns "Readable" and "Writable" describe which operations can be performed on a file. "Initial position" specifies whether the initial position after opening the file is its beginning or end. "Must exist" specifies whether the file has to exist at the point when the open command is called. "Is truncated" specifies whether an existing file is emptied when the open command is called.
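The effect of the modes can be checked with a short script. This is a sketch under the assumption that the current directory is writable; the file name open-modes-demo.txt is hypothetical, and the file is removed at the end.

```tcl
# Hypothetical scratch file in the current directory
set path [file join [pwd] open-modes-demo.txt]

# w: create the file (or truncate an existing one) and write one line
set chan [open $path w]
puts $chan "first"
close $chan

# a: keep the existing content and append at the end of the file
set chan [open $path a]
puts $chan "second"
close $chan

# r (the default mode): read the whole file back
set chan [open $path]
set content [read $chan]
close $chan

# w again: the existing content is discarded
close [open $path w]
set sizeAfterTruncate [file size $path]

file delete $path
puts -nonewline $content
puts "size after reopening with w: $sizeAfterTruncate"
```

After the append, the file holds both lines; reopening it with w leaves an empty file.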
The open command can also be used to run external processes. In this case, the first argument should be the command name prefixed by a pipe character (|). An additional argument can specify the open mode; otherwise, Tcl opens the pipeline in read-only mode. For example:
set chan [open "|route"]
while {![eof $chan]} {
    # gets returns -1 once no further line is available
    if {[gets $chan line] < 0} { break }
    puts "Route information: > $line <"
}
close $chan
The gets and eof commands used here to read information are described in more detail later in this chapter. They can be used to parse information from external commands as well as to interact with a text-based application or command. In order to also write to an application, we need to pass r+ as the mode to the open command. Then we can use gets to read from the channel and puts to write to it.
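A two-way conversation with a child process could look like the following sketch. It assumes a Unix-like system where the cat command is available; cat is used only because it echoes its input, so any line written to it should come back unchanged.

```tcl
# Assumption: a Unix-like system where the cat command is available
set chan [open "|cat" r+]

# Send one line to the child process and flush so it is actually delivered
puts $chan "hello pipeline"
flush $chan

# cat copies its input to its output, so the same line should come back
gets $chan reply
close $chan

puts "Reply: $reply"
```

The flush call matters here: without it, the line could stay in Tcl's output buffer and gets would wait indefinitely for a reply.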
In many cases, it is enough to use the exec command, which runs processes and returns their output. It accepts a system command and its parameters as arguments. By default, exec returns the standard output as the result of the command, and if anything is written to standard error, exec throws an error with this data. For example, we can rework the preceding example into a simpler one by using exec:
foreach line [split [exec "route"] "\n"] {
    puts "Route information: > $line <"
}
The split command converts the output that the route system command writes to standard output into a list of lines.
Both exec and open, when used to run system commands, handle most of the shell-like syntax for piping the output of one command into another and for redirecting streams. For example, we can redirect the input, output, or error streams in a way similar to shell commands. The following example runs ifconfig and grep to return only lines with IP addresses, which can then be parsed.
set text [exec /sbin/ifconfig 2>/dev/null | grep "inet addr"]
The exec command also throws an error if a child process exits with a non-zero code. For example, we can run the false command:
exec false
On Unix systems, the false command always exits with code 1. Running it through exec therefore causes the error child process exited abnormally to be thrown.
Please see http://www.tcl.tk/man/tcl8.5/TclCmd/exec.htm for more details on the use of the exec command and flow control.
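Since exec reports failures as Tcl errors, a script usually wraps it in catch. The sketch below assumes a Unix-like system with the false command; catch and the global errorCode variable are standard Tcl, but they are not covered in the text above.

```tcl
# Assumption: a Unix-like system where the false command is available
set failed [catch {exec false} message]

# After a failure, the global errorCode variable describes what happened;
# for a non-zero exit code it is a list of the form: CHILDSTATUS pid exitCode
set kind [lindex $::errorCode 0]
set exitCode [lindex $::errorCode 2]

puts "failed=$failed kind=$kind exit=$exitCode"
puts "message: $message"
```

Checking the exit code this way lets a script distinguish a command that failed from one that merely wrote to standard error.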
Reading from a channel can be done by using either the gets or the read command. The gets command reads a single line from a channel. When invoked with only the channel name, it returns the string read from the channel. When invoked with the channel name and a variable name, it reads a line, stores that string in the specified variable, and returns the number of characters read, or -1 if no line could be read because the end of the file was reached. The read command can be used to read either all remaining data, if invoked with the channel name only, or the specified number of characters, if invoked with a channel name and a count.
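The different forms of gets and read can be compared side by side. This is a sketch that first writes a small scratch file (the name read-demo.txt is hypothetical) and then reads it back piece by piece.

```tcl
# Hypothetical scratch file used only for this demonstration
set path [file join [pwd] read-demo.txt]
set chan [open $path w]
puts $chan "alpha"
puts $chan "beta gamma"
close $chan

set chan [open $path r]
# gets with only a channel returns the line itself
set first [gets $chan]
# read with a count returns at most that many characters
set word [read $chan 4]
# gets with a variable name stores the rest of the line and returns its length
set count [gets $chan rest]
# read with only a channel returns everything that is left (nothing here)
set remaining [read $chan]
close $chan
file delete $path

puts "first=$first word=$word count=$count rest=<$rest> remaining=<$remaining>"
```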
Writing to a channel can be performed using the puts command. The command can be invoked with only one argument, which causes the specified argument to be printed to standard output. It can also be invoked with a channel name and a second argument, which causes the second argument to be written to the channel. By default, puts appends a newline character at the end; specifying the -nonewline option as the first argument prevents this.
Often, a puts command only adds data to a buffer associated with a channel, and the data is actually written only when the buffer is full. If a write should be performed immediately, the flush command can be used. This command is invoked with the channel name as its only argument and causes all buffered output for that channel to be written. Buffering is discussed in more detail in the section focusing on the fconfigure command.
An example usage of flush is to make sure data is sent to a channel. For example, in order to make sure text is written to standard output, we can do the following:
puts -nonewline "Enter your first name: "
flush stdout
gets stdin firstName
This code prints Enter your first name: and then reads the user's input from standard input. As standard output buffers data and the prompt does not end with a newline, the text might not be printed before the read without first invoking flush stdout.
In order to change the current position in a file, we can use the seek command. This command accepts a channel as the first argument, followed by the position we want to move to. Optionally, we can also specify the origin of the offset. It can be either start, current, or end; if skipped, it defaults to start. In the first case, the position is an offset from the beginning of the file. With current, the position is relative to the current position in the file, and with end, it is relative to the end of the file. For end, this usually makes sense only if the position is a non-positive number.
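The three origins can be tried out on a file with known content. The following sketch writes ten digits to a hypothetical scratch file opened in w+ mode and then reads parts of it back after each seek.

```tcl
# Hypothetical scratch file holding the characters 0 through 9
set path [file join [pwd] seek-demo.txt]
set chan [open $path w+]
puts -nonewline $chan "0123456789"

# Absolute position: offset 2 from the start of the file
seek $chan 2 start
set fromStart [read $chan 3]

# Relative position: skip 2 more characters from where we are now
seek $chan 2 current
set fromCurrent [read $chan 1]

# Relative to the end: a negative offset moves backwards
seek $chan -3 end
set fromEnd [read $chan]

close $chan
file delete $path

puts "fromStart=$fromStart fromCurrent=$fromCurrent fromEnd=$fromEnd"
```

Because each digit equals its own offset, it is easy to see where every seek landed.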
The eof command is used to verify whether the end of the file has been reached for a channel. This is usually checked inside a while statement, to iterate until EOF has been reached. Note that eof only becomes true after a read operation has actually hit the end of the file, so the result of gets should be checked as well. All channels opened by the open command, as well as all other channels in Tcl, need to be closed with the close command.
set chan [open "/tmp/myfile" w]
puts $chan "Tcl version: [info tclversion]"
puts -nonewline $chan "Additional line"
puts $chan " of text"
close $chan
set chan [open "/tmp/myfile" r]
while {![eof $chan]} {
    # gets returns -1 once no further line is available
    if {[gets $chan line] < 0} { break }
    puts "Line: > $line <"
}
close $chan
The result would be as follows:
Line: > Tcl version: 8.5 <
Line: > Additional line of text <
Tcl channels are also configurable, which means that they have standard and non-standard options that we can get or set. Typical options include file encoding, newline handling, the End of File character, buffering, and blocking versus non-blocking mode. Non-standard options vary depending on the type of the channel: serial ports, network connections, and other channels have options specific to their type.
You can get and set all options using the fconfigure command. If this command is run with only a channel name, it returns a list of all options along with their values. Running fconfigure with a channel and an option name returns that option's current value, and passing both the name and a new value sets the option to that value.
One of the most commonly used options is -blocking, a Boolean value that determines whether a channel is blocking or non-blocking. A blocking channel causes each attempt to read more data than is currently available to block until that data is available; for example, a gets command will return only after an entire line can be read. A non-blocking channel returns as much data as is currently available and does not wait for more. By default, channels are in blocking mode, so -blocking is set to 1.
Each Tcl channel has an output buffer. It is used when writing data to a channel and, depending on configuration, data is either held in the buffer or written out immediately. The -buffering option specifies the type of buffering that will be done for the channel. If the value is full, Tcl always buffers data and only writes when its internal buffer is full. If it is set to line, Tcl writes data to the channel after each newline character. Setting it to none causes Tcl to write data to the channel immediately on every write. By default, all channels have -buffering set to full, except for the standard input, standard output, and standard error channels, which are set to line, line, and none respectively.
We can also set the size of the buffer that Tcl will use with the -buffersize option. It specifies the maximum number of bytes that Tcl should use for the internal output buffer. How this buffer is used also depends on the -buffering option, which determines when Tcl writes data to the channel.
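Querying and changing these options could look like the sketch below. The scratch file name and the buffer size of 8192 bytes are arbitrary choices made for illustration.

```tcl
# Hypothetical scratch file
set path [file join [pwd] fconfigure-demo.txt]
set chan [open $path w]

# Query single options: a freshly opened file is blocking and fully buffered
set blocking  [fconfigure $chan -blocking]
set buffering [fconfigure $chan -buffering]

# Set several options in one call
fconfigure $chan -buffering line -buffersize 8192

# With line buffering, each complete line is written out right away,
# so the data is already in the file before the channel is closed
puts $chan "written immediately"
set sizeBeforeClose [file size $path]

close $chan
file delete $path

puts "blocking=$blocking buffering=$buffering sizeBeforeClose=$sizeBeforeClose"
```

With the default full buffering, the same short line would normally stay in the buffer until the channel is flushed or closed.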
Tcl channels offer native translation of newline characters and different encodings. A channel can be configured as binary or to use any encoding that Tcl handles, along with various modes for handling newline characters. By default, channels are configured to use the system's native encoding (which is determined by the operating system and environment) and native newline translation (CR LF on Windows, LF on Unix).
The -translation option defines how newline translation is done and/or whether a file is in binary mode. It can be one of the following values:
Translation mode | Description |
---|---|
lf | ASCII character 10 |
cr | ASCII character 13 |
crlf | ASCII character 13 followed by 10 |
auto | Automatic detection (for input only) |
binary | Binary translation of all data |
The binary option tells Tcl that it should ignore any translation (including encoding handling) and treat a channel as binary.
It is also possible to specify a list of two elements, where the first element determines how newlines are read and the second one determines how newlines are written. For example, {auto crlf} means that the newline style is detected automatically when reading, but CR followed by LF is always used for writing. This translation value is the default for Tcl on Microsoft Windows.
In addition to this, the -encoding option allows us to specify the encoding used for reading from and writing to a channel. For example, in order to read a channel in UTF-8, we need to configure the channel to use the utf-8 encoding. To work with a file in UCS-2, we need to specify the unicode encoding. Available encoding names can be retrieved using the encoding names command, which is explained in more detail in Chapter 5.
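The effect of newline translation becomes visible when the same file is read once as raw bytes and once as text. This is a sketch using a hypothetical scratch file; the two lengths differ by one character per line because each CR LF pair is read back as a single newline.

```tcl
# Hypothetical scratch file
set path [file join [pwd] translation-demo.txt]

# Write two lines using CR LF line endings and UTF-8 encoding
set chan [open $path w]
fconfigure $chan -translation crlf -encoding utf-8
puts $chan "one"
puts $chan "two"
close $chan

# Read the raw bytes back without any newline or encoding translation
set chan [open $path r]
fconfigure $chan -translation binary
set raw [read $chan]
close $chan

# Read the file again as text, letting Tcl detect the newline style
set chan [open $path r]
fconfigure $chan -translation auto -encoding utf-8
set text [read $chan]
close $chan
file delete $path

puts "raw length:  [string length $raw]"
puts "text length: [string length $text]"
```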
You can modify both the -translation and -encoding options while reading a file, so it is possible to read the first line of an XML file with utf-8 encoding and see whether it contains encoding information. If it does, we can change the encoding and read the remaining part of the file with the correct encoding. For example:
# open a file and read its first line
set chan [open "/path/to/file.xml" r]
fconfigure $chan -translation auto -encoding utf-8
gets $chan line
# call a function that checks if first line is <?...?>
# and if it is, returns the encoding we should use
set encoding [getHeaderEncoding $line]
if {$encoding == ""} {
    # if we found no <? ...?> line, let's read entire file
    seek $chan 0 start
} else {
    fconfigure $chan -encoding $encoding
}
set xml [read $chan]
close $chan
# Now, let's parse $xml somehow (see chapter 5)
Internationalization related issues are explained in more detail in Chapter 5. For more information about encoding handling in Tcl, please see the corresponding manual page at: http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm
For more details about all standard options, please see the fconfigure
command manual page at: http://www.tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm
Besides reading from and writing to files, Tcl offers an additional command that aids in managing and working with files: the file command. This command has multiple subcommands, which allow copying, renaming, and deleting files, as well as getting and modifying information about them. One of the main features that file offers is the ability to copy, rename, and delete files as well as directories.
The file copy command can be used to copy one or more files or directories. The last argument to this command is the target and all previous arguments are items to copy. If only one item to copy is specified and the target does not exist, then the file or directory is copied as the target. For example:
file copy /etc/passwd /tmp/passwd
If /tmp/passwd does not exist, then /etc/passwd will be copied as /tmp/passwd. If /tmp/passwd is an existing file, the command will fail, unless -force is specified as the first argument, as follows:
file copy -force /etc/passwd /tmp/passwd
If multiple source items are specified or the target is an existing directory, then all items will be copied into the target directory. In the example above, if /tmp/passwd is a directory, then /etc/passwd will be copied as /tmp/passwd/passwd. As in the previous case, unless -force is specified, file copy will fail if any of the target files already exist.
On systems that support symbolic links, file copy handles them correctly, which means that they are copied as links instead of copying the targets that the links point to.
Renaming files and moving them to different directories can be achieved using the file rename command. As with file copy, the last argument is the target and all previous ones are source items that should be renamed or moved. If only one item is specified and the target does not exist, then the source item is renamed to the target. If multiple source items are specified or the target is an existing directory, all source items are moved to the target directory while preserving their names.
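Copying and renaming can be tried safely inside a scratch directory. The following sketch uses hypothetical names (file-ops-demo, source.txt, and so on), relies on catch to observe the failure without -force, and removes everything it creates.

```tcl
# Hypothetical scratch directory; everything is created and removed by this script
set dir [file join [pwd] file-ops-demo]
file mkdir $dir
set src [file join $dir source.txt]
set dst [file join $dir copy.txt]

set chan [open $src w]
puts $chan "some data"
close $chan

# The target does not exist yet, so the file is copied as copy.txt
file copy $src $dst

# A second copy onto the existing target fails unless -force is given
set failedWithoutForce [catch {file copy $src $dst}]
file copy -force $src $dst

# Rename the copy; the old name no longer exists afterwards
set renamed [file join $dir renamed.txt]
file rename $dst $renamed
set names [lsort [glob -tails -directory $dir *]]

# Clean up the scratch directory and its contents
file delete -force $dir
puts "failedWithoutForce=$failedWithoutForce names=$names"
```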
The file delete command is used to delete files and/or directories. It takes one or more arguments, each of them being either a file or directory name. For example:
file delete /tmp/passwd
Tcl will not delete non-empty directories and will raise an error in this case. Deleting directories recursively requires -force to be specified as the first argument. Using this option also causes Tcl to try to modify the permissions of items when permissions prevent them from being deleted. This, of course, is limited by the operating system, so Tcl won't be able to delete other users' files unless it is run by an administrator. Tcl ignores any attempt to delete a non-existent file and does not raise an error.
In order to create a directory, we need to invoke the file mkdir command. For example:
file mkdir /tmp/some/new/directory
This command creates a directory as well as all parent directories that do not exist. So, even if the directory /tmp is empty, Tcl will create /tmp/some, then /tmp/some/new, and finally /tmp/some/new/directory.
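The behaviour of file mkdir and file delete described above can be verified with a short sketch. The directory name mkdir-demo is hypothetical, and catch is used to observe the error raised for a non-empty directory.

```tcl
# Hypothetical scratch tree below the current directory
set base [file join [pwd] mkdir-demo]
set nested [file join $base some new directory]

# Creates mkdir-demo, some, new, and directory in one call
file mkdir $nested
set created [file isdirectory $nested]

# Deleting a non-empty directory fails without -force
set failedWithoutForce [catch {file delete $base}]

# With -force the whole tree is removed recursively
file delete -force $base
set stillThere [file exists $base]

# Deleting something that does not exist is silently ignored
file delete $base

puts "created=$created failedWithoutForce=$failedWithoutForce stillThere=$stillThere"
```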
The file command also offers multiple subcommands for platform-dependent management of file names. They can be used to handle joining paths so that different path types and different path separators are properly handled. The file join command joins all arguments into a full path. The file split command does the opposite: it splits a path into a list of elements. For Unix systems the separator is /, while for Windows both / and \ are acceptable. Tcl uses / on both Windows and Unix when possible, though. On a Unix system, splitting and joining paths returns values similar to these:
% puts [file split /home/user/../tcluser/bin]
/ home user .. tcluser bin
% puts [file join / home user .. tcluser bin]
/home/user/../tcluser/bin
% puts [file join / home user /home tcluser bin]
/home/tcluser/bin
The last example shows that file join recognizes absolute paths: if an element is an absolute path, the previous ones are discarded, similar to how all file access works. For Windows systems, the logic is a bit different in order to support multiple volumes, so C:/Tcl/bin/tclsh85.exe would be split into the elements C:/, Tcl, bin, and tclsh85.exe.
Using the file command we can also perform typical path-related operations in an easy way. The file tail command provides a convenient way to get only the file name, which is equivalent to doing a file split and retrieving the last element. We can also get the path to a parent directory by using file dirname, which is equivalent to splitting the path, removing the last element from the list, and running file join to get the path back. For example:
% puts [file tail /home/tcluser/bin]
bin
% puts [file dirname /home/tcluser/bin]
/home/tcluser
Similar to splitting and joining, these commands work regardless of whether the files and/or directories exist, and handle issues specific to the different operating systems.
Often, paths contain special entries such as .. that indicate that a parent directory should be used. The file normalize command can be used to normalize paths so that they are always full paths. In order to convert a path to use the native file separator, we can use the file nativename command. For example:
% puts [file normalize [file join C:/WINDOWS .. Tcl bin tclsh85.exe]]
C:/Tcl/bin/tclsh85.exe
% puts [file nativename C:/Tcl/bin/tclsh85.exe]
C:\Tcl\bin\tclsh85.exe
Tcl also makes it possible to retrieve file extensions. The file extension command returns the file's extension, including the last dot character that precedes it. The file rootname command returns the opposite: the file name up to, but not including, that last dot. For example:
% puts [file extension mybinary-1.1.exe]
.exe
% puts [file rootname mybinary-1.1.exe]
mybinary-1.1
Additional subcommands of the file command are available to gather information about files. In order to get all of the information about a specified file or directory, you need to use the file stat command, passing the path to an item and an array name as arguments. Information about the specified file is then stored in the specified array.
The following elements of the array are set:
Key | Description |
---|---|
type | Type of entry, such as file or directory |
size | For files, size of the file in bytes; for directories, the value depends on the operating system |
atime | Last access time, as a Unix timestamp |
ctime | Time of the last status change on Unix (file creation time on Windows), as a Unix timestamp |
mtime | File modification time, as a Unix timestamp |
gid | Unix group identifier owning the file |
uid | Unix user identifier owning the file |
mode | Unix file permissions |
Values for the keys gid, uid, and mode are specific to Unix systems and are set to reasonable defaults on systems that do not support Unix file permissions. The owning user and group are specified as their integer identifiers.
For example, on Unix, this command would store information similar to the following:
% file stat /etc/passwd myarray
% puts "Type=$myarray(type) Size=$myarray(size)"
Type=file Size=987
On Unix, running the file stat command on a symbolic link returns information about the item that the symbolic link refers to. In order to access information about the symbolic link itself, the file lstat command should be used.
Symbolic links can also be read and created using the file readlink and file link commands, respectively. The first one returns the target that a symbolic link points to, and throws an error if the item does not exist or is not a symbolic link. The file link command creates a link, either symbolic or hard. It needs to be invoked with the new item to be created as the first argument and the source element as the second argument. In order to make a symbolic link, the -symbolic flag needs to be provided before the arguments. To create a hard link, the -hard flag needs to be provided. An attempt to create a link that is not supported on a particular operating system will raise an error. Currently, Unix platforms support symbolic links to files and directories and hard links to files. Modern Microsoft Windows systems allow symbolic links to directories and hard links to files on the NTFS filesystem. These are not implemented as Windows shortcuts (*.lnk files); Tcl treats a shortcut file like any other file.
For example, the following commands can create and read symbolic links on Unix:
% puts [file link -symbolic /tmp/passwd /etc/passwd]
/tmp/passwd
% puts [file readlink /tmp/passwd]
/etc/passwd
Getting and modifying the last access and last modification times can be done using the file atime and file mtime commands. When run with just a path as the argument, they return the appropriate time as a Unix timestamp. When both a path and a new value are specified, the appropriate time is set to the new value. Please note that not all operating systems and filesystems support setting this value and not all systems track it with the same granularity; in some cases, the value stored will not be the same as the value that was set by the script.
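Reading and setting a modification time could be sketched as follows. The scratch file name is hypothetical, the one-hour shift is arbitrary, and, as noted above, the value stored by the filesystem may differ slightly from the one requested.

```tcl
# Hypothetical scratch file, created empty
set path [file join [pwd] mtime-demo.txt]
close [open $path w]

# Read the current modification time as a Unix timestamp
set original [file mtime $path]

# Ask for a time exactly one hour earlier, then read back what was stored
set wanted [expr {$original - 3600}]
file mtime $path $wanted
set stored [file mtime $path]

file delete $path
puts "original: [clock format $original]"
puts "stored:   [clock format $stored]"
```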
Tcl also supports getting and setting operating system specific information about files and directories, using the file attributes command. When run with only the path to a file or directory, it returns a list of attribute name and value pairs, which can be used to retrieve all available attributes for an item. When run with a path and attribute name, it returns the current value of that attribute for the specified file or directory. When run with a path, attribute name, and value, it sets a new value for that attribute for the specified file or directory. For example, on Unix systems, we can work with ownership and permissions using this command:
% file attributes /tmp/passwd
-group tcluser -owner tcluser -permissions 00644
% file attributes /tmp/passwd -group admin -permissions 0660
% file attributes /tmp/passwd -permissions
00660
Tcl can also provide information about read/write access to particular items. The commands file readable and file writable return whether a particular file or directory can be read from or written to. For directories, this indicates the ability to access files within that directory and to create new items in it. In addition to this, the commands file isfile and file isdirectory can be used to check whether an item is a file or a directory.
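These checks can be combined into a small report. The sketch below creates a hypothetical scratch directory with one file and collects the result of each check in an array named checks.

```tcl
# Hypothetical scratch directory containing one empty file
set dir [file join [pwd] access-demo]
file mkdir $dir
set path [file join $dir data.txt]
close [open $path w]

# Collect the results of each check for the file and for its directory
foreach item [list $path $dir] {
    set checks($item) [list \
        readable    [file readable $item] \
        writable    [file writable $item] \
        isfile      [file isfile $item] \
        isdirectory [file isdirectory $item]]
    puts "$item: $checks($item)"
}

file delete -force $dir
```

Each entry in checks is a list of name and value pairs, so it can be read with dict get or turned into an array with array set.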
Listing items in a filesystem can be achieved by using the glob command. Each argument is either a pattern of items to match or an option that applies to the patterns that follow. The command returns a list containing all matched items in the filesystem or, by default, throws an error if none of the patterns matched anything.
Patterns can contain ordinary characters that the file name needs to match literally, as well as special characters: ? means any single character and * means zero or more characters. {ab,cd} matches any of the comma-separated strings inside the braces; in this case, it matches either ab or cd. [abcx-z] means any single character inside the brackets, where x-z means any character between x and z, inclusive; in this case, it matches a, b, c, x, y, or z. The form \x matches the character x and can be used to escape characters such as braces or brackets. Patterns can span multiple levels of directories; for example, */*.tcl matches all files with the .tcl extension in all direct subdirectories of the current directory.
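The pattern forms can be exercised against a small, known set of files. In this sketch, the directory glob-demo and the file names are hypothetical; the -nocomplain option is a standard glob flag that returns an empty list instead of raising an error when nothing matches.

```tcl
# Hypothetical scratch directory with a few empty files
set dir [file join [pwd] glob-demo]
file mkdir [file join $dir sub]
foreach name {a1.tcl a2.tcl b1.txt sub/c1.tcl} {
    close [open [file join $dir $name] w]
}

set oldwd [pwd]
cd $dir

# * matches any run of characters
set tclFiles [lsort [glob *.tcl]]
# [ab] matches one character out of the set
set fromSet [lsort [glob {[ab]1.*}]]
# {tcl,txt} matches either alternative
set byBraces [lsort [glob {*.{tcl,txt}}]]
# patterns may span one directory level
set nested [glob */*.tcl]
# Without -nocomplain, a pattern with no matches raises an error
set noMatchFails [catch {glob *.missing}]
set noMatch [glob -nocomplain *.missing]

cd $oldwd
file delete -force $dir

puts "tclFiles=$tclFiles fromSet=$fromSet byBraces=$byBraces nested=$nested"
```

The patterns are written in braces so that Tcl itself does not try to interpret the square brackets as command substitution.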
The following will return similar information on many Unix systems:
% puts [glob /etc/pass*]
/etc/passwd /etc/passwd.bak
Specifying the base directory for matching patterns can be done using the -directory
flag. This causes all further patterns to be evaluated relative to the specified directory. For example:
% puts [glob -directory /etc pass*]
/etc/passwd /etc/passwd.bak
Passing the -join flag causes all remaining arguments to be treated as a single pattern, joined in the same way as file join does. For example:
% puts [glob -join /usr bin *grep]
/usr/bin/egrep /usr/bin/grep
When the -directory or -path option is specified, adding the -tails option as well causes the result list to contain only paths relative to that option's value. For -directory, only paths relative to the specified directory are returned, and for -path, the last element of the -path value is included as the first element of all paths returned. For example:
% puts [glob -tails -directory /etc pass*]
passwd passwd.bak
We can also look for specific types of entries by using the -types option. It accepts a list of one or more file types. When multiple types are passed, glob looks up entries that are of any of the specified types. We can pass the d type to find directories, or f to find files. In addition, on Unix systems, we can use l for a symbolic link, p for a named pipe, and s for a Unix socket. The b and c types can be used to find block and character device entries, which are used to access devices on Unix systems.
The following values can be used:
Type | Description |
---|---|
f | File |
d | Directory |
l | Symbolic link |
p | Named pipe |
s | Unix socket |
b | Block device |
c | Character device |
The values l, p, s, b, and c are only used on Unix systems that provide support for those kinds of file types.
In order to find symbolic links and directories in /tmp, we can do the following:
% puts [glob -types {d l} /tmp/*]
/tmp/hsperfdata_root /tmp/vmware-root /tmp/passwd
Besides file types, this option also accepts the permissions that an entry needs to have. If multiple access rights are specified, only entries having all of them are returned. These can be r for readable, w for writable, x for executable, and hidden for hidden files on Microsoft Windows.
In order to find writable and executable files in /tmp, we can do the following:
% puts [glob -types {f w x} /tmp/*]
/tmp/sess1248
The fileutil package provides more high-level functionality, such as recursive file lookups, searching for text within files, and so forth. It is available as part of the tcllib package, which is delivered with ActiveTcl installations. Please visit http://tcllib.sourceforge.net/doc/fileutil.html for more details on its available functionality.
All applications have the concept of a current working directory. This is the directory that is used as the base for any relative path that we specify. For example, if our working directory is /tmp, then opening the file tempfile will cause /tmp/tempfile to be opened. Specifying a full path, such as /etc/passwd, will always cause that exact file to be opened. When an application starts, its working directory is usually the directory from which it was launched. In many cases, our application will want to get the current working directory or change it according to its needs.
Changing the current working directory in Tcl can be done using the cd command. It accepts one argument, which specifies the new directory name. The pwd command can be used to return the full path to the current working directory. For example:
% set oldwd [pwd]
% cd ~
% puts [pwd]
/home/tcluser
% cd $oldwd
This example stores the current directory, goes to the user's home directory, prints it, and returns to the original directory.