Accessing files

Tcl allows us to read from and write to files by using channels. A channel can be an open file, but it can also be a network socket, a pipe, or any other channel type. Depending on the type of the channel, it can support reading from it, writing to it, or both.

Tcl comes with three default channels: stdin, stdout, and stderr. These channels correspond to the standard input, standard output, and standard error channels of the operating system. Standard input can be used to read information from the user, standard output should be used to write information to the user, and standard error is used to write errors. Depending on how our application is run, these channels can be redirected from/to files.

In the case of Microsoft Windows and applications using the GUI version of Tcl, invoked using the wish or tclkit commands, standard channels are not available. This is because graphical applications in Microsoft Windows do not have standard consoles. In such cases, an equivalent of these channels is created that allows interacting with the user from the Tk console window. For example:

[Screenshot: the Tk console window running a puts command]

The puts command in this form is described in more detail later in this section.

Reading and writing files

The open command can be used to open a file for reading and/or writing. It can be invoked with just the filename, with the filename and the open mode, or with the filename, open mode, and permissions; the permissions are only used when a new file is created. Permissions are ignored on Microsoft Windows systems; on all other systems they default to 0666 if not specified. Permissions are combined with the mask set up for the current process, usually set by the umask system command. This is similar to how all file creation operations work on Unix systems. The open command returns the name of the newly opened channel, which can be used with all commands that operate on channels.

The open mode is a string specifying the access mode, defaulting to r if it is not specified. Mode r opens the file for reading only. Mode r+ opens the file for reading and writing; the file must already exist. The w mode opens the file for writing only, truncating it if the file already exists, and creating it if it does not exist. The w+ mode opens the file in the same way as the w mode, but allows both reading and writing. The a mode opens the file for writing only, setting the current position to the end of the file if it already exists, and creating the file if it does not exist. The a+ mode opens the file in the same way as the a mode, but allows both reading and writing.

The following table summarizes each of the modes, which operations are permitted, and any additional activity that takes place:

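| Mode | Readable | Writable | Initial position | Must exist | Is truncated |
| --- | --- | --- | --- | --- | --- |
| r | Yes | No | Beginning of file | Yes | No |
| r+ | Yes | Yes | Beginning of file | Yes | No |
| w | No | Yes | Beginning of file | No | Yes |
| w+ | Yes | Yes | Beginning of file | No | Yes |
| a | No | Yes | End of file | No | No |
| a+ | Yes | Yes | End of file | No | No |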

The columns "Readable" and "Writable" describe which operations can be performed on a file. "Initial position" specifies whether the initial position after opening the file is its beginning or end. "Must exist" specifies whether the file has to exist at the point when the open command is called. "Is truncated" specifies whether the existing contents are discarded, leaving an empty file, if the file already exists when the open command is called.
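The differences between the modes can be seen in a short sketch. The scratch path /tmp/open_modes_demo.txt is a hypothetical name chosen for illustration; any writable location would do.

```tcl
# Hypothetical scratch file used only for this illustration.
set path /tmp/open_modes_demo.txt

# w: create the file (or truncate it), write only
set chan [open $path w]
puts $chan "first line"
close $chan

# a: write only, positioned at the end, existing contents are kept
set chan [open $path a]
puts $chan "second line"
close $chan

# r: read only, positioned at the beginning
set chan [open $path r]
set contents [read $chan]
close $chan

# w again: the previous contents are discarded
set chan [open $path w]
close $chan
set sizeAfterTruncate [file size $path]

file delete $path
```

After this runs, contents holds both lines, while sizeAfterTruncate is 0 because reopening with w emptied the file.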

The open command can also be used to run external processes. In this case, the first argument should be the command name prefixed by a pipe character (|). An additional argument can specify the open mode; otherwise, Tcl will open the pipeline in read-only mode. For example:

set chan [open "|route"]
while {![eof $chan]} {
    gets $chan line
    puts "Route information: > $line <"
}
close $chan

The commands gets and eof used to read information are described in more detail later in this chapter.

They can be used to parse information from external commands as well as to interact with a text-based application or command. In order to write to an application, we need to pass r+ as the mode to the open command. Then we can use gets to read a channel and use puts to write to it.
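As a minimal sketch of two-way communication, the following opens the Unix cat utility in r+ mode, writes one line to it, and reads the echoed line back. The choice of cat is an assumption made for illustration; it simply copies its input to its output. The flush call is needed because output to a pipe is buffered by default.

```tcl
# Assumes a Unix-like system where the cat utility is available.
set chan [open "|cat" r+]

# Send one line to the child process and push it out of Tcl's buffer
puts $chan "hello"
flush $chan

# Read the line that cat echoes back
set reply [gets $chan]

# Closing the channel closes the pipe, which lets cat exit
close $chan
```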

In many cases, it is enough to use the exec command, which allows running processes and reading their output. It accepts a system command and its parameters as arguments. By default, exec returns the standard output of the command as its result, and if anything is written to standard error, exec throws an error containing this data. For example, we can rework the preceding example into a simpler one by using exec:

foreach line [split [exec "route"] "\n"] {
    puts "Route information: > $line <"
}

The split command converts the output that the route system command writes to standard output into a list of lines.

Both exec and open, when used to run system commands, handle most of the shell-like syntax for redirection and for piping the output of one command into another. For example, we can redirect the input, output, or error streams in a way similar to shell commands. The following example runs ifconfig and grep to return only lines with IP addresses, which can then be parsed.

set text [exec /sbin/ifconfig 2>/dev/null | grep "inet addr"]

The exec command also throws an error if a child exits with a non-zero code. For example, we can try to run the false command:

exec false

On Unix systems, the false command always exits with code 1. Running it through exec would cause the error "child process exited abnormally" to be thrown.
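Such errors can be handled with the catch command. The sketch below assumes a Unix system with the false utility; it relies on the global errorCode variable, which Tcl sets to a list of the form CHILDSTATUS pid exitCode when a child process exits with a non-zero code.

```tcl
# catch returns 1 when the script raised an error and stores the message
set failed [catch {exec false} message]

# For a child that exited with a non-zero code, errorCode looks like
# "CHILDSTATUS <pid> <exit code>"
set kind     [lindex $::errorCode 0]
set exitCode [lindex $::errorCode 2]
```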

Please see http://www.tcl.tk/man/tcl8.5/TclCmd/exec.htm for more details on the use of the exec command and flow control.

Reading from a channel can be done by using either the gets or the read command. The gets command reads a single line from a channel. It can be invoked with only the channel name, in which case it returns the string read from the channel. It can also be invoked with the channel name and a variable name, in which case it reads a line, stores it in the specified variable, and returns the number of characters read, or -1 if no line could be read because the end of the file was reached. The read command can be used to read either all remaining data, if invoked with the channel name only, or the specified number of characters, if invoked with a channel name and a count.
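The different forms of gets and read can be compared in a small sketch. The file /tmp/gets_read_demo.txt is a hypothetical scratch file created only for this illustration.

```tcl
# Hypothetical scratch file with three short lines.
set path /tmp/gets_read_demo.txt
set chan [open $path w]
puts $chan "alpha"
puts $chan "beta"
puts $chan "gamma"
close $chan

set chan [open $path r]
set first [gets $chan]          ;# returns the line itself
set count [gets $chan second]   ;# stores the line in "second", returns its length
set two   [read $chan 2]        ;# reads exactly two characters
set rest  [read $chan]          ;# reads everything that is left
set atEnd [gets $chan ignored]  ;# nothing left to read
close $chan

file delete $path
```

Here first is alpha, second is beta with count equal to 4, two is ga, rest is the remaining text including its trailing newline, and atEnd is -1.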

Writing to a channel can be performed using the puts command. The command can be invoked with only one argument, which causes the specified argument to be printed to standard output. It can also be invoked with a channel name and a second argument, which causes the second argument to be written to the channel. Specifying the -nonewline option as the first argument causes puts not to append a newline character at the end, which is done by default.

Often, a puts command only adds data to a buffer associated with a channel, and the actual data is written to disk only when the buffer is full. If a write should be performed immediately, the flush command can be used. This command is invoked with the channel name as its only argument. Invoking the command will cause all output buffered for that channel to be written. Buffering is discussed in more detail in the section focusing on the fconfigure command.

An example usage for flush is to make sure data is sent to a channel. For example, in order to make sure text is written to standard output, we can do the following:

puts -nonewline "Enter your first name: "
flush stdout
gets stdin firstName

This code will print out Enter your first name: and then read the user's input from standard input. Standard output is line-buffered by default and the prompt does not end with a newline, so the text might not be printed before the program waits for input unless flush stdout is invoked first.

In order to change the current position in a file, we can use the seek command. This command accepts a channel as the first argument, followed by the offset we want to move to. Optionally, we can also specify the origin of the offset. It can be either start, current, or end; if skipped, it defaults to start. With start, the offset is counted from the beginning of the file. With current, the offset is relative to the current position in the file, and with end, the offset is relative to the end of the file. When reading, an offset relative to end makes sense only if it is a non-positive number.
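The three origins can be tried out on a file with known contents. In this sketch, /tmp/seek_demo.txt is a hypothetical scratch file that holds the ten digits 0 to 9 and nothing else, so every offset is easy to follow.

```tcl
# Hypothetical scratch file containing exactly "0123456789".
set path /tmp/seek_demo.txt
set chan [open $path w]
puts -nonewline $chan "0123456789"
close $chan

set chan [open $path r]

seek $chan 4                 ;# origin defaults to start: position 4
set fromStart [read $chan 2]

seek $chan 2 current         ;# skip two more characters from position 6
set fromCurrent [read $chan 1]

seek $chan -3 end            ;# three characters before the end
set fromEnd [read $chan]

close $chan
file delete $path
```

The three reads return 45, 8, and 789 respectively.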

The eof command is used to check whether the end of the file has been reached for a channel. This is usually checked inside a while statement, to iterate until EOF has been reached. All channels opened by the open command, as well as all other channels in Tcl, need to be closed with the close command.

set chan [open "/tmp/myfile" w]
puts $chan "Tcl version: [info tclversion]"
puts -nonewline $chan "Additional line"
puts $chan " of text"
close $chan
set chan [open "/tmp/myfile" r]
while {![eof $chan]} {
    gets $chan line
    puts "Line: > $line <"
}
close $chan

The file /tmp/myfile written by this script would contain the following:

Tcl version: 8.5
Additional line of text
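A commonly used alternative to testing eof is to use the return value of gets as the loop condition. Because eof only becomes true after a read has actually hit the end of the file, an eof-controlled loop can run its body once more with an empty line; checking for a negative return value from gets avoids this. The sketch below uses a hypothetical scratch file, /tmp/gets_loop_demo.txt.

```tcl
# Hypothetical scratch file with two lines.
set path /tmp/gets_loop_demo.txt
set chan [open $path w]
puts $chan "one"
puts $chan "two"
close $chan

set lines {}
set chan [open $path r]
# gets returns -1 once no further line can be read
while {[gets $chan line] >= 0} {
    lappend lines $line
}
close $chan

file delete $path
```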

Tcl channels are also configurable, which means that they have standard and non-standard options that we can get or set. Typical options include file encoding, newline handling, the End of File character, buffering, and blocking versus non-blocking mode. Non-standard options vary depending on the type of the channel: serial ports, network connections, and other channels have options specific to their type.

Configuring channel options

You can get and set all options using the fconfigure command. If this command is run with only a channel name, it returns a list of all options along with their values. Running fconfigure with a channel and option name returns this option's current value, and passing both the name and new value causes the option's value to be set to what we specified.

One of the most commonly used options is whether a channel is blocking or non-blocking, which is set by the option -blocking and its value is a Boolean value. A blocking channel causes each attempt to read more data than currently available to block until that data is available—for example, a gets command will return only after an entire line can be read. A non-blocking channel returns as much data as is currently available and does not wait for more. By default, channels are in blocking mode, so -blocking is set to 1.

Tcl channels work in such a way that each channel has an output buffer. It is used when writing data to a channel and, depending on configuration, data can either be buffered or written out immediately. The -buffering option specifies the type of buffering that will be done for this channel. If the value is full, then Tcl will always buffer data and only write when its internal buffer is full. If it is set to line, Tcl will write data to the channel after each newline character. Setting it to none will cause Tcl to write data to the channel immediately. By default, all channels have -buffering set to full, except for the standard input, standard output, and standard error channels, which are set to line, line, and none respectively.

We can also set the size of the buffer that Tcl will use with the -buffersize option. It specifies the maximum number of bytes that Tcl should use for the internal output buffer. It is also affected by the -buffering option, which determines when Tcl writes data to the channel.
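The three ways of calling fconfigure, and the options described so far, can be sketched as follows. The path /tmp/fconfigure_demo.txt is a hypothetical scratch file, and the buffer size of 8192 bytes is an arbitrary value picked for illustration.

```tcl
# Hypothetical scratch file; only the channel options matter here.
set path /tmp/fconfigure_demo.txt
set chan [open $path w]

# One option name: query its current value
set defaultBuffering [fconfigure $chan -buffering]
set blocking         [fconfigure $chan -blocking]

# Option name and value pairs: change the settings
fconfigure $chan -buffering line -buffersize 8192
set newBuffering [fconfigure $chan -buffering]
set bufferSize   [fconfigure $chan -buffersize]

# Channel name only: a list of all option names and values
set allOptions [fconfigure $chan]

close $chan
file delete $path
```

For a newly opened file, defaultBuffering is full and blocking is 1; after the change, newBuffering is line.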

Tcl channels offer native translation of newline characters and support different encodings. A channel can be configured as binary or to use any encoding that Tcl handles, along with various modes for handling newline characters. By default, channels are configured to use the system's native encoding (which is determined by the operating system and environment) and native newline translation (CR LF on Windows, LF on Unix).

The -translation option defines how newline translation is done and/or whether a file is in binary mode. It can be one of the following values:

| Translation mode | Description |
| --- | --- |
| lf | ASCII character 10 |
| cr | ASCII character 13 |
| crlf | ASCII characters 13 followed by 10 |
| auto | Automatic detection (for input only) |
| binary | Binary translation of all data |

The binary option tells Tcl that it should ignore any translation (including encoding handling) and treat a channel as binary.

It is also possible to specify a list of two elements, where the first element determines how newlines are read and the second one determines how newlines are written. For example, {auto crlf} means that the newline style is detected automatically when reading, but CR LF is always used for writing. This translation value is the default for Tcl on Microsoft Windows.
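The effect of the translation modes can be observed by writing a line with crlf translation and reading it back both in binary mode and in auto mode. The path /tmp/translation_demo.txt is a hypothetical scratch file used only for this sketch.

```tcl
# Hypothetical scratch file.
set path /tmp/translation_demo.txt

# Write one line; each newline is written as CR LF
set chan [open $path w]
fconfigure $chan -translation crlf
puts $chan "line"
close $chan

# Binary mode: no translation, the raw bytes are returned
set chan [open $path r]
fconfigure $chan -translation binary
set raw [read $chan]
close $chan

# Auto mode: CR LF is turned back into a single newline
set chan [open $path r]
fconfigure $chan -translation auto
set text [read $chan]
close $chan

file delete $path
```

The raw data is six bytes long and ends with carriage return and line feed, while text contains the line followed by a single newline character.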

In addition to this, the -encoding option allows us to specify the encoding used for reading from and writing to a channel. For example, in order to read a channel in UTF-8, we need to configure the channel to use the utf-8 encoding. To work with a file in UCS-2, we need to specify the unicode encoding. Available encoding names can be retrieved using the encoding names command, which is explained in more detail in Chapter 5.
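A short sketch shows how the encoding affects what ends up on disk. It writes a four-character word containing one non-ASCII character as UTF-8 and compares the size of the file in bytes with the length of the string in characters. The path /tmp/encoding_demo.txt is a hypothetical scratch file.

```tcl
# Hypothetical scratch file.
set path /tmp/encoding_demo.txt

# \u00e9 is "e with acute accent", which takes two bytes in UTF-8
set chan [open $path w]
fconfigure $chan -encoding utf-8 -translation lf
puts -nonewline $chan "caf\u00e9"
close $chan
set byteCount [file size $path]

# Reading with the same encoding gives back the original characters
set chan [open $path r]
fconfigure $chan -encoding utf-8
set text [read $chan]
close $chan
set charCount [string length $text]

# The encoding name must be one of those reported by "encoding names"
set known [expr {"utf-8" in [encoding names]}]

file delete $path
```

The file is five bytes long, but the string read back has four characters.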

You can modify both the -translation and -encoding options as you are reading a file—so it is possible to read the first line of an XML file with utf-8 encoding and see whether it contains encoding information. If it does, we can change the encoding and read the remaining part of the file with the correct encoding. For example:

# open a file and read its first line
set chan [open "/path/to/file.xml" r]
fconfigure $chan -translation auto -encoding utf-8
gets $chan line
# call a function that checks if first line is <?...?>
# and if it is, returns the encoding we should use
set encoding [getHeaderEncoding $line]
if {$encoding == ""} {
    # if we found no <? ...?> line, let's read entire file
    seek $chan 0 start
} else {
    fconfigure $chan -encoding $encoding
}
set xml [read $chan]
close $chan
# Now, let's parse $xml somehow (see chapter 5)
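The getHeaderEncoding function is not defined in the example above. One possible sketch is shown below; it is an assumption about how such a helper might look, not a definitive implementation. It extracts the encoding attribute from an XML declaration with a regular expression, converts names of the form ISO-8859-N to the iso8859-N spelling that Tcl uses, and returns an empty string if there is no declaration or the encoding is unknown to Tcl.

```tcl
# Hypothetical helper: returns a Tcl encoding name taken from an
# XML declaration line, or an empty string if none can be determined.
proc getHeaderEncoding {line} {
    # Look for encoding="..." or encoding='...' inside <?xml ... ?>
    if {![regexp -nocase {^<\?xml[^>]*encoding=["']([^"']+)["']} $line -> name]} {
        return ""
    }
    set name [string tolower $name]
    # XML uses names such as ISO-8859-1; Tcl calls the same encoding iso8859-1
    regsub {^iso-8859-} $name {iso8859-} name
    # Only return names that Tcl can actually handle
    if {[lsearch -exact [encoding names] $name] < 0} {
        return ""
    }
    return $name
}
```

Other XML encoding names may need additional mapping to Tcl names; this sketch only covers the cases listed.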

Internationalization related issues are explained in more detail in Chapter 5. For more information about encoding handling in Tcl, please see the corresponding manual page at: http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm

For more details about all standard options, please see the fconfigure command manual page at: http://www.tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm

File management

Besides reading and writing to files, Tcl offers an additional command that aids in managing and working with files: the file command. This command has multiple subcommands, which allow copying, renaming, and deleting files, as well as getting and modifying information about them. One of the main features that file offers is the ability to copy, rename, and delete files as well as directories.

The file copy command can be used to copy one or more files or directories. The last argument to this command is the target and all previous arguments are items to copy. If only one item to copy is specified and the target does not exist, then the file or directory is copied as the target. For example:

file copy /etc/passwd /tmp/passwd

If /tmp/passwd does not exist, then /etc/passwd will be copied as /tmp/passwd. If /tmp/passwd is an existing file, the command will fail, unless -force is specified as the first argument, as follows:

file copy -force /etc/passwd /tmp/passwd

If multiple source items to copy are specified or the target is an existing directory, then all items will be copied into the target directory. In the example above, if /tmp/passwd is a directory, then /etc/passwd will be copied as /tmp/passwd/passwd. Similar to previous case, unless -force was specified, file copy will fail in case of any existing files.

On systems that support symbolic links, file copy handles them correctly, which means that they will be copied as links instead of copying the targets that the links point to.

The renaming of files and moving them to different directories can be achieved using the file rename command. Similar to doing a file copy, the last argument is the target and all previous ones are source items that should be renamed or moved. If only one item is specified and the target does not exist, then the source item is renamed as the target. If multiple source items are specified or the target is an existing directory, all source items are moved to the target directory while preserving their name.

The file delete command is used to delete files and/or directories. It takes one or more arguments, each of them being either a file or directory name. For example:

file delete /tmp/passwd

Tcl will not delete non-empty directories and will raise an error in this case. Deleting directories recursively requires -force to be specified as the first argument. Using this option also causes Tcl to try and modify the permissions of items when permissions prevent them from being deleted. This, of course, is limited by the operating system, so Tcl won't be able to delete other users' files, unless it is run by an administrator. Tcl will ignore any attempt to delete a non-existent file, and it will not raise an error.

In order to create a directory, we need to invoke the file mkdir command. For example:

file mkdir /tmp/some/new/directory

This command creates a directory as well as all parent directories that do not exist. So, even if the directory /tmp was empty, Tcl will create /tmp/some, then /tmp/some/new, and finally /tmp/some/new/directory appropriately.
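The behavior of these subcommands, including the cases in which they fail, can be checked in a scratch directory. The directory /tmp/file_ops_demo and the file names inside it are hypothetical names chosen for this sketch.

```tcl
# Hypothetical scratch directory; start from a clean state.
set base /tmp/file_ops_demo
file delete -force $base

# mkdir creates the missing parent directories as well
file mkdir [file join $base src nested]

set srcFile [file join $base src a.txt]
set dstFile [file join $base b.txt]
set chan [open $srcFile w]
puts $chan "data"
close $chan

file copy $srcFile $dstFile                           ;# target does not exist yet
set copyFailed [catch {file copy $srcFile $dstFile}]  ;# target exists now
file copy -force $srcFile $dstFile                    ;# overwrite it

file rename $dstFile [file join $base c.txt]
set renamed [file exists [file join $base c.txt]]

# src is not empty, so a plain delete fails; -force deletes recursively
set deleteFailed [catch {file delete [file join $base src]}]
file delete -force $base
set stillThere [file exists $base]

# Deleting an item that does not exist is not an error
set missingFailed [catch {file delete $base}]
```

Here copyFailed and deleteFailed are both 1, while missingFailed is 0.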

Filename related operations

The file command also offers multiple subcommands for platform-dependent management of file names. These can be used to manage filenames and join paths so that different path types and different path separators are properly handled. The file join command joins all arguments into a full path. The file split command does the opposite: it splits a path into a list of elements. For Unix systems the separator is /, while for Windows both / and \ are acceptable. Tcl uses / on both Windows and Unix when possible, though. On a Unix system, splitting and joining paths returns values similar to these:

% puts [file split /home/user/../tcluser/bin]
/ home user .. tcluser bin
% puts [file join / home user .. tcluser bin]
/home/user/../tcluser/bin
% puts [file join / home user /home tcluser bin]
/home/tcluser/bin

The last example shows that file join recognizes absolute paths: if an element is an absolute path, the elements before it are discarded, similar to how all file access works. For Windows systems, the logic is a bit different in order to support multiple volumes, so C:/Tcl/bin/tclsh85.exe would be split into the elements C:/, Tcl, bin, and tclsh85.exe.

Using the file command we can also do typical path related activities in an easy way; file tail provides a convenient way to get the file name only, which is an equivalent of doing a file split and retrieving last element. We can also get the path to a parent directory by using the file dirname, which is equivalent to splitting, then removing the last element from the list, and finally running file join to get the path back. For example:

% puts [file tail /home/tcluser/bin]
bin
% puts [file dirname /home/tcluser/bin]
/home/tcluser

Similar to splitting and joining, these commands work regardless of whether the files and/or directories exist, and handle issues specific to the different operating systems.

Often, paths contain special entries such as .. that indicate that a parent directory should be used. The file normalize command can be used to normalize paths so that they are always full paths without such entries. In order to convert a path to use the native file separator, we can use the file nativename command. For example:

% puts [file normalize [file join C:/WINDOWS .. Tcl bin tclsh85.exe]]
C:/Tcl/bin/tclsh85.exe
% puts [file nativename C:/Tcl/bin/tclsh85.exe]
C:\Tcl\bin\tclsh85.exe

Tcl also makes it possible to retrieve file extensions. The file extension command returns the file's extension, including the last dot character that precedes it. The file rootname command returns the opposite: the filename up to, but not including, that last dot and all characters after it. For example:

% puts [file extension mybinary-1.1.exe]
.exe
% puts [file rootname mybinary-1.1.exe]
mybinary-1.1
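These subcommands combine naturally. As an illustration, the hypothetical backupName procedure below builds the name of a backup copy that sits next to the original file and has .bak inserted before the extension. The procedure name and the naming scheme are assumptions made for this sketch.

```tcl
# Hypothetical helper: /dir/name.ext -> /dir/name.bak.ext
proc backupName {path} {
    set dir  [file dirname $path]     ;# everything except the last element
    set name [file tail $path]        ;# the last element only
    set root [file rootname $name]    ;# name without its extension
    set ext  [file extension $name]   ;# extension including the dot
    return [file join $dir "$root.bak$ext"]
}

set result [backupName /home/tcluser/bin/mybinary-1.1.exe]
```

On a Unix system, result is /home/tcluser/bin/mybinary-1.1.bak.exe. The path does not have to exist, since these subcommands only operate on the names.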

File information

Additional subcommands of the file command are available to gather information about files. In order to get all of the information about a specified file or directory, you need to use the file stat command, passing the path to the item and an array name as arguments. Information about the specified file will be stored in the specified array.

The following elements of the array are set:

| Key | Description |
| --- | --- |
| type | Type of entry, for example file or directory |
| size | For files, size of the file in bytes; for directories, the value depends on the operating system |
| atime | Last access time, as Unix timestamp |
| ctime | File creation time on Windows; on Unix, time of the last status change; as Unix timestamp |
| mtime | File modification time, as Unix timestamp |
| gid | Unix group identifier owning the file |
| uid | Unix user identifier owning the file |
| mode | Unix file permissions |

Values for the keys gid, uid, and mode are specific to Unix systems and are set to reasonable defaults on systems that do not support Unix file permissions. The owning user and group are specified as their integer identifiers.

For example, on Unix, the following commands would return information similar to the following:

% file stat /etc/passwd myarray
% puts "Type=$myarray(type) Size=$myarray(size)"
Type=file Size=987

On Unix, running the file stat command on a symbolic link will return information about the item that the symbolic link refers to. In order to access information about the actual symbolic link, the file lstat command should be used.

Symbolic links can also be read and created using the file readlink and file link commands, respectively. The first one returns the target that a symbolic link points to and throws an error if the item does not exist or is not a symbolic link. The file link command creates a link, either symbolic or hard. It needs to be invoked with the new item to be created as the first argument and the source element as the second argument. In order to make a symbolic link, the -symbolic flag needs to be provided before the arguments. To create a hard link, the -hard flag needs to be provided. An attempt to create a link that is not supported on a particular operating system will raise an error. Currently, Unix platforms support symbolic links to files and directories and hard links to files. Modern Microsoft Windows systems allow symbolic links to directories and hard links to files on the NTFS filesystem. These are not implemented as Windows shortcuts (*.lnk files), and Tcl treats a shortcut file like any other file.

For example, the following commands can create and read symbolic links on Unix:

% puts [file link -symbolic /tmp/passwd /etc/passwd]
/etc/passwd
% puts [file readlink /tmp/passwd]
/etc/passwd
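The difference between file stat and file lstat becomes visible when both are run on the same symbolic link. The sketch below assumes a Unix system; the two paths under /tmp are hypothetical names used only for this illustration.

```tcl
# Hypothetical scratch file and link name (Unix only).
set target /tmp/link_demo_target.txt
set link   /tmp/link_demo_link
file delete $link $target

set chan [open $target w]
puts -nonewline $chan "12345"
close $chan

file link -symbolic $link $target

file stat  $link viaStat     ;# follows the link to the file
file lstat $link viaLstat    ;# describes the link itself
set statType  $viaStat(type)
set lstatType $viaLstat(type)
set size      $viaStat(size)
set pointsTo  [file readlink $link]

file delete $link $target
```

Here statType is file, lstatType is link, size is 5, and pointsTo is the path of the target file.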

Getting and modifying the last accessed and last modification date can be done using the file atime and file mtime commands. When run with just a path as the first argument they return the appropriate time, as a Unix timestamp. When both a path and new value are specified, the appropriate time is set to a new value. Please note that not all operating systems and filesystems support setting this value and not all systems track it with the same granularity—in some cases, the new value will not be the same as the value that was set by the script.
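As a sketch, the following sets the modification time of a hypothetical scratch file, /tmp/mtime_demo.txt, to an arbitrary timestamp and reads it back. On filesystems with coarse timestamp granularity, the value read back might differ from the one that was set.

```tcl
# Hypothetical scratch file; 1000000000 is an arbitrary timestamp.
set path /tmp/mtime_demo.txt
close [open $path w]

set stamp 1000000000
file mtime $path $stamp          ;# path and value: set the time
set stored [file mtime $path]    ;# path only: query the time

# Convert the timestamp into a readable form, in UTC
set formatted [clock format $stored -format "%Y-%m-%d %H:%M:%S" -gmt 1]

file delete $path
```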

Tcl also supports getting and setting operating system specific information about files and directories, using the file attributes command. When run with only the path to a file or directory, it returns a list of attribute name and value pairs, which can be used to retrieve all available attributes for an item. When run with a path and attribute name, it returns the current value for that attribute for the specified file or directory. When run with a path, attribute name and value, it sets a new value for that attribute for specified file or directory. For example, on Unix systems, we can work with ownership and permissions using this command:

% file attributes /tmp/passwd
-group tcluser -owner tcluser -permissions 00644
% file attributes /tmp/passwd -group admin -permissions 0660
% file attributes /tmp/passwd -permissions
00660

Tcl can also provide information about read/write access to particular items. The file readable and file writable commands return whether a particular file or directory can be read from or written to. For directories, they return information about the ability to access files within that directory and to create new items in that directory. In addition to this, the file isfile and file isdirectory commands can be used to check whether an item is a file or a directory.
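These checks can be combined with file attributes, as in the following sketch for Unix systems. The path /tmp/attributes_demo.txt is a hypothetical scratch file, and the permissions value 0640 is an arbitrary choice for illustration.

```tcl
# Hypothetical scratch file (Unix permissions assumed).
set path /tmp/attributes_demo.txt
close [open $path w]

# Owner may read and write, group may read, others have no access
file attributes $path -permissions 0640
set perms [file attributes $path -permissions]

set isFile  [file isfile $path]
set isDir   [file isdirectory [file dirname $path]]
set canRead [file readable $path]

file delete $path
```

The result of file writable is not shown here because it depends on who runs the script; an administrator can typically write to files regardless of their permissions.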

Listing files

Listing items in a filesystem can be achieved by using the glob command. Each argument is either a pattern of items to match or an option that applies to the patterns that follow it. The command returns a list containing all matched items in a filesystem or, by default, throws an error if none of the patterns matched anything.

Patterns consist of ordinary characters that the filename needs to match and special characters: ? means any single character; * means zero or more characters. {ab,cd} matches any of the strings inside braces, separated by commas; in this case, it matches either ab or cd. [abcx-z] means any character inside the brackets, where x-z means any character between x and z, inclusive; in this case, it matches a, b, c, x, y, or z. The form \x matches the character x and can be used to escape characters such as braces or brackets. Patterns can work with multiple levels of directories; for example, */*.tcl will match all files with the .tcl extension in all sub-directories of the current directory.

The following will return similar information on many Unix systems:

% puts [glob /etc/pass*]
/etc/passwd /etc/passwd.bak

Specifying the base directory for matching patterns can be done using the -directory flag. This causes all further patterns to be evaluated relative to the specified directory. For example:

% puts [glob -directory /etc pass*]
/etc/passwd /etc/passwd.bak

Passing the flag -join causes all remaining arguments to be treated as one argument and to be joined in the same way as file join does. For example:

% puts [glob -join /usr bin *grep]
/usr/bin/egrep /usr/bin/grep

When the -directory or -path option is specified, adding the -tails option as well causes the result list to contain only paths relative to that option's value. For -directory, only paths relative to the specified directory are returned, and for -path, the last element of the -path value is kept as the first element of each returned path. For example:

% puts [glob -tails -directory /etc pass*]
passwd passwd.bak

We can also look for specific types of entries by using the -types option. It accepts a list of one or more file types. When multiple types are passed, glob looks up entries that are of any of the specified types. The following values can be used:

| Type | Description |
| --- | --- |
| f | File |
| d | Directory |
| l | Symbolic link |
| p | Named pipe |
| s | Unix socket |
| b | Block device |
| c | Character device |

The values l, p, s, b, and c are only used on Unix systems that provide support for those kinds of file types.

In order to find symbolic links and directories in /tmp, we can do the following:

% puts [glob -type {d l} /tmp/*]
/tmp/hsperfdata_root /tmp/vmware-root /tmp/passwd

Besides file types, this option also accepts the permissions that a file needs to have. In case multiple access rights are specified, only entries having all of the access rights are returned. This can be r for readable, w for writable, x for executable, and hidden for hidden files on Microsoft Windows.

In order to find writable and executable files in /tmp, we can do the following:

% puts [glob -type {f w x} /tmp/*]
/tmp/sess1248
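The options described in this section can be tried out against a small directory tree with known contents. In the sketch below, the directory /tmp/glob_demo and the file names are hypothetical. The -nocomplain flag, which is part of the standard glob command, makes glob return an empty list instead of throwing an error when nothing matches. The lsort calls are there only because glob does not guarantee a particular order.

```tcl
# Hypothetical scratch directory with a few empty files.
set base /tmp/glob_demo
file delete -force $base
file mkdir [file join $base sub]
foreach name {a.tcl b.tcl notes.txt} {
    close [open [file join $base $name] w]
}
close [open [file join $base sub c.tcl] w]

# Names relative to the base directory
set scripts [lsort [glob -tails -directory $base *.tcl]]

# A pattern spanning one level of sub-directories
set nested [glob -tails -directory $base */*.tcl]

# Directories only
set dirs [glob -tails -directory $base -types d *]

# Brace alternatives; the outer braces are Tcl quoting
set picked [lsort [glob -tails -directory $base {{a,notes}.*}]]

# No match: an empty list instead of an error
set none [glob -nocomplain -directory $base *.none]

file delete -force $base
```

Here scripts is a.tcl b.tcl, nested is sub/c.tcl, dirs is sub, picked is a.tcl notes.txt, and none is an empty list.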

The package fileutil provides more high-level functionalities, such as recursive file lookups, searching for text within files, and so forth. It is available as part of the tcllib package, which is delivered with ActiveTcl installations. Please visit http://tcllib.sourceforge.net/doc/fileutil.html for more details on its available functionality.

Current working directory

All applications have the concept of the current working directory. This is the directory that is used as the base for any relative path that we specify. For example, if our working directory is /tmp, then opening the file tempfile will cause /tmp/tempfile to be opened. Specifying a full path, such as /etc/passwd, will always cause the same file to be opened, regardless of the working directory. When the application starts, the working directory is usually the directory from which it was started. In many cases, our application will want to get the current working directory or change it according to its needs.

Changing the current working directory in Tcl can be done using the cd command. It accepts one argument which specifies the new directory name. The command pwd can be used to return the full path to current working directory. For example:

% set oldwd [pwd]
% cd ~
% puts [pwd]
/home/tcluser
% cd $oldwd

This example stores the current directory, goes to the user's home directory, prints it, and returns to the original directory.
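The same store, change, and restore pattern can be wrapped in a procedure so that the original directory is restored even if the code run in the other directory raises an error. The withDirectory procedure below is a hypothetical helper written for this sketch; it is not a built-in Tcl command.

```tcl
# Hypothetical helper: run a script in another directory, then return
# to the original one, even when the script fails.
proc withDirectory {dir script} {
    set oldwd [pwd]
    cd $dir
    # catch stores the result and the return options of the script
    set code [catch {uplevel 1 $script} result options]
    cd $oldwd
    # Pass the result, or the error, on to the caller
    return -options $options $result
}

set before [pwd]
set listing [withDirectory / {glob -nocomplain *}]
set failed [catch {withDirectory / {error "something went wrong"}} message]
set after [pwd]
```

After both calls, the working directory is the same as before, and the error raised inside the second script is reported to the caller with its original message.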
