Transferring files and data

Transferring files over the Internet is one of the most common things that networked applications do. In this section, we'll focus on two very popular protocols: HyperText Transfer Protocol (HTTP) and File Transfer Protocol (FTP). The first one is used to download websites, images, or other files from the Web, and the second one is used to download or upload files. There are subtle differences between the two and we'll need to go into a bit more detail to understand them.

HTTP is a lightweight protocol for retrieving a single item over the network. It allows servers to handle large numbers of requests and is the protocol used for serving websites to web browsers. HTTP is used for both static files (such as HTML pages, images, downloads, and so on) and dynamic files (such as PHP or Tcl scripts that build pages when users access them).

The FTP protocol, on the other hand, is designed to transfer files over a network. It features authentication and offers a lot of functionality specific to file management: creating directories, renaming and deleting items, listing the contents of a directory, and the idea of a working directory. FTP is also a more heavyweight protocol and is less commonly used for offering downloads to a wide audience.

Resources and the uri package

Locations of resources on the Internet are specified using a URL (Uniform Resource Locator). It consists of a protocol, an optional username and password, a hostname, and a port, followed by the path to the resource. For example, a URL could look like http://wiki.tcl.tk/tcllib

Tcl offers the uri package, which is part of Tcllib and can be used to split URLs into parts and join them back. These parts include scheme, user, pwd, host, port, path, query, and fragment. Not all of them are present in every type of URL. The scheme is the only part that is always present and defines the protocol that is used, for example, http or ftp. Credentials are optional and are specified as user and pwd. The host and port parts specify the hostname and port to connect to; port can be empty, which means the default port for the specified protocol. The location of the resource is specified as path, and query is an optional part that defines a query sent via the URL (mainly for HTTP requests); fragment points to a fragment of a page and is also used only for the HTTP protocol.

Currently, ftp, http, https, file, mailto, and news protocols are supported.

We can split a URL into elements using the uri::split command. It returns a list of name-value pairs representing each part. For example, we can do the following:

set uridata [uri::split "http://wiki.tcl.tk/tcllib"]
foreach name {scheme user pwd host port path query fragment} {
    if {[dict exists $uridata $name]} {
        puts "$name = [dict get $uridata $name]"
    }
}

This will print the following result:

scheme = http
user =
pwd =
host = wiki.tcl.tk
port =
path = tcllib
query =
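For comparison, splitting a URL that uses more of these parts can be sketched as follows; the credentials, host, and path below are made up purely for illustration:

```tcl
package require uri

# Hypothetical URL with credentials, port, query, and fragment
set uridata [uri::split \
    "http://john:secret@www.example.com:8080/docs/index.html?lang=en#intro"]
foreach name {scheme user pwd host port path query fragment} {
    if {[dict exists $uridata $name]} {
        puts "$name = [dict get $uridata $name]"
    }
}
```

Each of the parts described above should now show up with a non-empty value.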

We can also create a URL by specifying its various parts using the uri::join command. It treats all of its arguments as name-value pairs specifying the parts of the address it should generate. For example:

puts [uri::join \
    scheme http host www.packtpub.com port 80 path books \
]

The preceding code will print out the address http://www.packtpub.com/books. Please note that the port part was skipped, because 80 is the default port for the HTTP protocol.

We can also use the result from splitting and join it back by running:

puts [uri::join {*}[uri::split "http://www.google.com"]]

This will split the address into parts and pass them to the uri::join command; {*} causes all elements of the list to be appended as separate arguments, which is what the command expects.
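Since uri::split returns name-value pairs, we can also tweak an individual part and rebuild the address. A small sketch (the mirror hostname is hypothetical):

```tcl
package require uri

set parts [uri::split "http://www.example.com/books"]
# Point the same path at a different, hypothetical host
dict set parts host "mirror.example.com"
puts [uri::join {*}$parts]
```

This pattern is handy for rewriting URLs, for example, when switching a download to a mirror server.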

Note

The sample code shown in this section is located in the 06uri directory in the source code examples for this chapter.

More information about the uri package can be found in its documentation available at: http://tcllib.sourceforge.net/doc/uri.html

Using HTTP

HTTP is a stateless protocol that uses a simple request-response message exchange pattern. This means that whenever a client, such as our application, wants to access a particular resource, it sends an HTTP request. The server then processes it and sends back a response, usually containing the requested resource, or information that it could not be found or that an error occurred.

HTTP works by sending a request to the server. A request describes whether we are getting information or sending data to the server, the path to the resource, and the version of the protocol we're using. A request also includes several headers, which are name-value pairs and can be either standard or custom ones. Finally, a request can also carry data that we are uploading to the server.

Retrieving data over HTTP

After receiving and parsing the request, the web server returns the response. The response consists of a status line, one or more headers, and the body of the response. After the response is sent, the current connection is either closed or reused for the next request. However, from the HTTP perspective, each of these requests is treated independently.
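To make this concrete, a minimal request-response exchange might look like the following (trimmed; the hostname is made up and actual headers vary between servers):

```
GET /index.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 1234

<html>...</html>
```

The first block is the request line plus headers; the second is the status line, response headers, a blank line, and the body.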

Tcl comes with the http package built in. This package offers a basic, but complete, HTTP client which can be used to perform both basic and more advanced operations. The command http::geturl is used to initiate or perform a request and is the starting point for performing HTTP operations. This command accepts a URL followed by one or more options.

A commonly used option is -binary, which specifies whether the transfer should be done in binary mode and defaults to false. By default, Tcl performs newline and encoding conversions for text documents; for example, if a server sends HTML in UTF-8, Tcl converts it to a proper string. If the -binary option is enabled, no conversion is performed and all documents are retrieved as raw bytes, regardless of whether they are text documents or not.
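For example, to download an image without any conversions and save the raw bytes to disk, a sketch could look like this (the URL and filename are made up):

```tcl
package require http

# Fetch the resource as raw bytes, skipping newline/encoding conversion
set token [http::geturl "http://www.example.com/logo.png" -binary true]
set fh [open "logo.png" w]
fconfigure $fh -translation binary
puts -nonewline $fh [http::data $token]
close $fh
http::cleanup $token
```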

The http::geturl command always returns a token that can be used to get information and data related to this query. For example, in order to get the contents of Google's main page, we can simply run:

package require http
set token [http::geturl "http://www.google.com/"]
puts [http::data $token]

The second line performs the request and returns a token that we can use later on. The command http::data returns the body of the server's response, which we then print to standard output.

The command http::cleanup should be used after we are done working with a request. It will clean up all resources used for this request. For example:

http::cleanup $token

We can also save the contents of the response directly to any open channel. For example:

set fh [open "google-index.html" w]
set token [http::geturl "http://www.google.com/" -channel $fh]
http::cleanup $token
close $fh

This will cause the response to be written to the specified channel instead of being stored in memory. This can be used for downloading large files and/or if you plan to save the contents of the response to a file.

Our previous example is missing some important things, such as error checking. Although the http package will throw an error if the web server is unreachable, there are cases when a web server will send a response stating that a resource is unavailable or that an error has occurred. Such cases are not translated into Tcl errors, as this might be the desired response from our application's perspective.

We can check the status of handling the request by using the http::status command. It will return one of the following values:

Status   Description
ok       Indicates that the HTTP request was completed successfully
eof      Indicates that the server closed the connection without replying
error    Indicates an error

If the status is error, we can also retrieve the actual error message using the http::error command.

The HTTP server sends a status code that specifies the outcome of processing the request. We can retrieve the status code using the http::ncode command. Usually, it is sufficient to check whether the code equals 200, which means that the request has been processed correctly.

The most frequently used status codes are:

Code   Description
200    The request has been successfully executed
206    The request has been successfully executed and the result is partial content; used to download parts of a file over HTTP
301    Moved permanently; the resource has been permanently moved to a new location, which is given in the Location response header
302    Moved temporarily; the resource has been temporarily moved to a new location, which is given in the Location response header
401    Unauthorized; the server has requested that the client authenticate
403    Forbidden; access to this resource is forbidden
404    Not found; the specified resource cannot be found
500    Internal error; there was a problem serving the request, for example, because the HTTP server configuration is broken or a module/script failed

For example, we can print out the status of our request by running:

switch -- [http::status $token] {
    error {
        puts "ERROR: [http::error $token]"
    }
    eof {
        puts "EOF reading response"
    }
    ok {
        puts "OK; code: [http::ncode $token]"
        puts "Data:"
        puts [http::data $token]
    }
}

We can also get all headers from an HTTP response by using the http::meta command. It returns a list of name-value pairs that can be used as a dictionary or an array. For example, to get the contents of the Location header, we can do the following:

set code [http::ncode $token]
if {($code == 301) || ($code == 302)} {
    set newURL [dict get [http::meta $token] Location]
    # go to new location
}

Note

The complete example is located in the 07httprequest directory in the source code examples for this chapter.

Submitting information using GET and POST

We can also use the http package to submit information to a web server, for automating things such as filling in forms. Data from a form can be formatted using the http::formatQuery command. It can then be sent in two ways: either as part of the path in the URL, or as separate data. The first case uses a GET request; an example is searching using Google, like http://www.google.com/search?q=tcl, where the query is passed after the ? character. The other approach is sending a POST request, where the data is sent after the actual request.

POST is used for sending larger amounts of data and usually takes place when the request modifies or sends data. GET is usually used for reading information, as it can only send a smaller amount of data. POST requests, in contrast, can send much larger amounts of data and are not cached by proxy servers.

For both GET and POST, data is sent as name-value pairs; q=tcl means the value for field q is tcl. Multiple values are separated using the & character. Tcl offers a command for generating such data: http::formatQuery. It accepts zero or more name-value pairs as arguments and outputs a properly formatted query.
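The command also takes care of escaping characters that have special meaning in URLs, which is a quick way to see it in action:

```tcl
package require http

# The space and the & in the value are escaped by formatQuery,
# so they do not break the generated query string
puts [http::formatQuery q "tcl & tk" lang en]
```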

Sending data using GET requires that we append the query to the actual URL, for example:

set query [http::formatQuery q "Tcl Programming"]
set url "http://www.google.com/search?$query"

Sending POST data requires passing the data as a -query option to http::geturl. For example, we can do the following:

set query [http::formatQuery search "tcl programming"]
set url "http://www.packtpub.com/search"
set token [http::geturl $url -query $query]

This will cause the request to be sent as a POST, with the data from -query as its body.

By default, data is sent as encoded form data, but it is also possible to send other kinds of query data. Usually, this is accompanied by sending the appropriate query type to the server. We can do this by adding the -type flag when sending the query. If a type is not specified, it defaults to application/x-www-form-urlencoded, which is the default MIME type for encoded form data. Many applications expecting XML or JavaScript Object Notation (JSON) data require that the XML/JSON is sent with the appropriate MIME type in the request headers.

For example, we can send XML data with the accompanying type by doing:

http::geturl $url -command onCompleteXMLPost \
    -type "text/xml" \
    -query [$dom asXML]

This will cause the appropriate value for the Content-Type header to be sent along with the query. XML and its handling, including how to read and write documents, are described in more detail in Chapter 5.

Note

Examples related to basic HTTP functionality are placed in basic.tcl file in the 07httprequest directory in the source code examples for this chapter.

By default, http queries are done in a synchronous way, meaning that the http::geturl command returns only after the request has been completed. In many cases, it is better to use an asynchronous approach, where the command returns instantly, uses events to process the request, and then invokes our callback, a Tcl command that will be run when the operation is completed.

Asynchronous requests are made by providing the -command option, often together with -timeout, to the http::geturl command. In this case, the command returns immediately, returning a token to be used later. Accessing the data should be done from the command passed to the -command option. In asynchronous requests, http::geturl might still throw an error, for example, if the hostname does not exist. It is therefore still recommended to catch such exceptions and handle them appropriately.

For example, in order to download the Google page asynchronously, we can do the following:

if {[catch {
    set token [http::geturl "http://www.google.com/" \
        -timeout 300000 -command doneGet]
} error]} {
    puts stderr "Error while getting URL: $error"
}

Next, we can create the command that will be invoked as the callback. It will be run with an additional parameter: the token of the request. For example, our command, based on the previous examples, can look as follows:

proc doneGet {token} {
    switch -- [http::status $token] {
        error {
            puts "ERROR: [http::error $token]"
        }
        eof {
            puts "EOF reading response"
        }
        ok {
            puts "OK; code: [http::ncode $token]"
            puts " Size: [http::size $token]"
            puts " Data:"
            puts [http::data $token]
        }
    }
    http::cleanup $token
}

The token can be used in the same way as we used it with synchronous requests. We are also responsible for cleaning up the token, which is done in the last line of the example.

Note

Examples related to asynchronous HTTP requests are located in the async.tcl file in the 07httprequest directory in the source code examples for this chapter.

Advanced topics

The http package can also be used for more advanced features such as partial content downloading, sending cookies, and HTTP-level authorization.

The majority of these functions can be carried out using the -headers option passed to the http::geturl command. This option accepts a list of one or more name-value pairs. These can be any headers and values, but they should be headers that the server can understand. For example, we can use it to send cookie values to a site or authorize over HTTP for sites that use it.
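For example, to send a custom User-Agent header along with a request, a minimal sketch could look like this (the agent string and URL are made up):

```tcl
package require http

# Hypothetical client identification passed as an extra header
set headers [list User-Agent "MyTclClient/1.0"]
set token [http::geturl "http://www.example.com/" -headers $headers]
puts "Response code: [http::ncode $token]"
http::cleanup $token
```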

There are two common ways that users are authorized on the Web: at the HTTP level, and using HTML forms and cookies. The first provides the username and password as an HTTP header. The latter sends form data and uses cookies to track users, and is mainly a matter of handling cookies properly at the HTTP level.

For now, we'll focus on HTTP-level authorization. A lot of web-based applications and data are protected using this mechanism. Let's assume we want to retrieve data from a specified URL. We first connect to it without providing any credentials, and at this point, the server should respond with HTTP status 401. The following code would be a good start for checking if authorization is needed:

set token [http::geturl $url]
if {[http::status $token] != "ok"} {
    puts stderr "Error while retrieving URL"
    http::cleanup $token
    exit 1
}
if {[http::ncode $token] == 401} {

If this condition is true, we should resend our request. The server will have provided the WWW-Authenticate header in the response, indicating the type of authentication and the realm, which is a descriptive name of the resource we are trying to authenticate to. We can print it out by running:

set realm [dict get [http::meta $token] \
    "WWW-Authenticate"]
puts "Authenticate information: $realm"

Next, we need to clean up the previous request and send a new one with proper authentication information. Except for a few cases, the Basic authentication type is used by HTTP servers. It requires sending a <username>:<password> string encoded as base64, preceded by the word Basic. We'll use the base64 package for this, along with the base64::encode command:

package require base64
set authinfo [base64::encode ${username}:${password}]
set headers [list Authorization "Basic $authinfo"]

The last line builds the Authorization header to be sent to the server, containing the credentials encoded as base64. Next, we send a new request by doing:

set token [http::geturl $url -headers $headers]
if {[http::status $token] != "ok"} {
    http::cleanup $token
    puts stderr "Error while retrieving URL"
    exit 1
}

We can then check whether our username and password were correct. If not, the status code for the new request will also be 401:

if {[http::ncode $token] == 401} {
    puts stderr "Invalid username and/or password"
    http::cleanup $token
    exit 1
}

Note

An example related to basic authorization is located in the auth.tcl file in the 07httprequest directory in the source code examples for this chapter.

Cookies in Tcl

An additional feature that headers are useful for is supporting cookies. While the http package itself does not provide this functionality, it is easy to support it in the majority of cases. Standards for setting and getting cookies define expiration dates as well as the paths and domains that cookies should be valid for. However, in the majority of code that we write, it is enough to assume that the cookie you're getting is needed for all subsequent requests.

Cookies work in such a way that HTTP responses from servers may include one or more Set-Cookie headers. These headers need to be parsed, and all cookies should be passed back in the Cookie header of subsequent requests. The server might send a response similar to this one:

Set-Cookie: mycookie=TEST0123; path=/
Set-Cookie: i=1; expires=Thu, 27-Oct-2011 11:07:24 GMT; path=/

This causes the cookie mycookie to be set to TEST0123 and i to be set to 1. Each subsequent request to this server should include the following header:

Cookie: mycookie=TEST0123; i=1

All changes to existing cookies overwrite them, and new cookies cause a new value to be set, which is similar to the behavior of arrays and dictionaries in Tcl. Writing code that handles cookies without taking their parameters into account is relatively easy.

Let's start by writing a command that processes the HTTP response for cookies. We define the namespace for our code, link to the variable specified by the user, and iterate over the HTTP headers from the provided token:

namespace eval cookies {}
proc cookies::processCookies {varname token} {
    upvar 1 $varname d
    foreach {name value} [http::meta $token] {
        if {[string equal -nocase $name "Set-Cookie"]} {

If the header is Set-Cookie, we process its value by taking only the part up to the first occurrence of a semicolon and separating it into name and value using a regular expression:

            set value [lindex [split $value ";"] 0]
            if {[regexp "^(.*?)=(.*)$" $value \
                - cname cvalue]} {
                dict set d $cname $cvalue
            }
        }
    }
}

This will cause the dictionary stored in the varname variable to be updated. Next, in all requests, we need to pass all cookies. A small function to generate the appropriate value for the Cookie header would look like:

proc cookies::prepareCookies {var} {
    set rc [list]
    dict for {name value} $var {
        lappend rc "$name=$value"
    }
    return [join $rc "; "]
}

Here we simply take each cookie, format it as name=value, and join all cookies using a semicolon followed by a space. In order to use this to query the Tcler's Wiki, we can do the following:

set c [dict create]
set h [http::geturl http://wiki.tcl.tk/]
cookies::processCookies c $h
http::cleanup $h
set query [http::formatQuery _charset_ utf-8 S cookie]
set h [http::geturl http://wiki.tcl.tk/_/search?$query \
    -headers [list Cookie [cookies::prepareCookies $c]] \
]

The first request gets the main page of the wiki, which causes a cookie to be set. We need to pass this cookie to the second request in order to be able to perform a search; in this case, we're searching for the string cookie. Without passing the cookie from the previous request, the site will not allow us to perform the search.

Note

An example related to cookie handling is located in the cookies.tcl file in the 07httprequest directory in the source code examples for this chapter.

HTTP and encryption

HTTP can handle both encrypted and unencrypted communication. The default is not to encrypt the connection, which corresponds to the http protocol when specifying URLs. It is also possible to use HTTP over an SSL-encrypted connection, which is usually called https.

The Tcl http package allows registering additional protocols to run HTTP over with the http::register command. It requires that we specify the name of the protocol, the default port, and the command that should be invoked to create a socket. This is mainly used for SSL connections. In order to enable the use of the https protocol, we need to add the following code to our application:

package require tls
http::register https 443 tls::socket

The tls package provides SSL-enabled sockets to the Tcl language; its tls::socket command is an equivalent of the socket command, except that it enables SSL for the connection. SSL and security are described in more detail in Chapter 12.

More information about the http package as well as remaining configuration options can be found in its documentation at: http://www.tcl.tk/man/tcl8.5/TclCmd/http.htm

Retrieving RSS information

Really Simple Syndication (RSS) is a format for publishing frequently updated information, such as blog entries, news headlines, audio, and video, in a standard form. An RSS document (often also called a feed or channel) provides a list of recently published items along with metadata about these items. RSS is provided by the majority of content providers, such as portals, blog engines, and so on. Even Packt Publishing has its own RSS feed that we'll use later on in an example.

RSS itself is an XML document published over HTTP. This means that using the http and tdom packages, we can easily retrieve and parse an RSS feed and find out about recent documents. The RSS standard describes the structure of the XML document, which we'll explore below. All we need to know to start is the URL of the RSS feed. The address of the RSS feed is usually stored in the website's metadata. This too is standardized and usually looks like this:

<link rel="alternate" type="application/rss+xml" href="http:///rss.xml" title="Packt Publishing News" >

The previous example is from Packt Publishing's website. Your browser probably also supports this; a small icon at the bottom or near the address bar indicates that an RSS feed is present. Clicking on it will take you to the RSS feed and allow you to subscribe to it from your browser and to get the address of the actual feed.

Packt Publishing website's RSS feed address is http://www.packtpub.com/rss.xml. Tcler's Wiki is available at http://wiki.tcl.tk/, and it also has its feed available at: http://wiki.tcl.tk/rss.xml

We'll start with Tcler's Wiki and its feeds. The feed looks as follows:

<?xml version='1.0'?>
<rss version='0.91'>
  <channel>
    <title>The Tcler's Wiki - Recent Changes</title>
    <link>http://wiki.tcl.tk/</link>
    <description>Recent changes to The Tcler's Wiki</description>
    <item>
      <title>tDOM</title>
      <link>http://wiki.tcl.tk/1948</link>
      <pubDate>Wed, 14 Apr 2010 01:05:27 GMT</pubDate>
      <description>Modified by CMcC (898 characters)
      (actual description of the Wiki change goes here)
      </description>
    </item>
    <item>
      <title>WISH User Help</title>
      <link>http://wiki.tcl.tk/20914</link>
      <pubDate>Wed, 14 Apr 2010 00:59:10 GMT</pubDate>
      <description>Modified by pa_mcclamrock (194 characters)
      (actual description of the Wiki change goes here)
      </description>
    </item>
  </channel>
</rss>

In order to read the RSS, we need to find the <rss> tag and iterate over all <channel> tags inside it. Each <channel> instance includes information about the feed and can describe a different channel. It is possible for one RSS feed to describe multiple channels, although usually an RSS feed covers only one. Each channel has a title, a link, and a list of items.

In order to get all items in a channel, we need to iterate over the <item> tags inside the channel. Each item describes a single element in a feed, such as one entry on a blog, or in this case, one change in the wiki. Each item has a title, link, publication date, and description. Many RSS feeds provide additional information, which can be checked and handled properly if needed.

We can retrieve the RSS by simply doing:

set token [http::geturl "http://wiki.tcl.tk/rss.xml"]
if {[http::status $token] != "ok"} {
    puts "Error retrieving RSS file"
    exit 1
}
set data [http::data $token]
http::cleanup $token

We now have the RSS document in the data variable, and we can parse it using tdom:

set dom [dom parse $data]

The tdom package is described in more detail in Chapter 5.

Now we can iterate over each channel by doing:

foreach channel [$dom selectNodes "rss/channel"] {

This will use the selectNodes method to find all channel tags. We can then find the <title> tag in our channel and use the asText method on that node to get the title of the current channel and print it:

set nodes [$channel selectNodes "title"]
set title [[lindex $nodes 0] asText]
puts "Channel \"$title\":"

We can now iterate over all items of a channel in a similar way:

foreach item [$channel selectNodes "item"] {
    set nodes [$item selectNodes "link"]
    set link [[lindex $nodes 0] asText]
    set nodes [$item selectNodes "title"]
    set title [[lindex $nodes 0] asText]
    puts "- \[$link\] $title"
}

We first use the selectNodes method to find the <item> tags, iterate over them, and get the link and title by finding the proper nodes and using the asText method. We then print information about each element.

Finally we need to close the loop iterating over channels:

}

Note

The source code in this section is located in the rss-basic.tcl file in the 08rss directory in the source code examples for this chapter.

In many cases, our applications will need to check and retrieve RSS periodically. In such cases, it is a good idea to cache the RSS on disk or in memory. This is the best approach if our application offers a web interface that consolidates multiple RSS channels or filters them to only include specified items.

In order to do this, all we need to change is how our DOM tree is created. We'll start by setting the URL of the feed and the name of the file to store it in:

set url "http://www.packtpub.com/rss.xml"
set filename "packtpub-rss.xml"

Next, we can check whether a local copy exists and whether it was created within the last 30 minutes:

if {(![file exists $filename]) ||
    ([file mtime $filename] < [clock scan "-30 minutes"])} {

This checks whether the file does not exist or whether it was created more than 30 minutes ago. If either of these conditions is met, we download the RSS:

    set token [http::geturl $url -binary true]
    if {[http::status $token] != "ok"} {
        puts "Error retrieving RSS file"
        exit 1
    }
    set fh [open $filename w]
    fconfigure $fh -translation binary
    puts $fh [http::data $token]
    close $fh
    http::cleanup $token
}

This is similar to the previous example and to the HTTP examples shown earlier. The main difference is that we download the file in binary mode, which prevents the http package from converting the file's encoding.

We will use the tDOM::xmlReadFile command to read the RSS. This command is part of the tdom package and handles encoding issues when reading files, such as detecting the encoding. It also handles the Byte Order Mark (BOM) that many RSS feeds have; this is a set of bytes at the beginning of an XML file that specifies the file's encoding and is described in more detail at: http://en.wikipedia.org/wiki/Byte_order_mark

In order to read and parse the file, all we need to do is:

set dom [dom parse [tDOM::xmlReadFile $filename]]

After that, we can use the same set of iterations as previously to list all entries in the RSS feed:

foreach channel [$dom selectNodes "rss/channel"] {
    set nodes [$channel selectNodes "title"]
    set title [[lindex $nodes 0] asText]
    puts "Channel \"$title\":"
    foreach item [$channel selectNodes "item"] {
        set nodes [$item selectNodes "link"]
        set link [[lindex $nodes 0] asText]
        set nodes [$item selectNodes "title"]
        set title [[lindex $nodes 0] asText]
        puts "- \[$link\] $title"
    }
}
}

Note

The source code in this section is located in the rss-file.tcl file in the 08rss directory in the source code examples for this chapter.

Using FTP

The File Transfer Protocol (FTP) is a stateful protocol for transferring files. It requires logging in, keeps the connection alive across transfers, and is not a lightweight protocol. It is mainly used for retrieving or transferring multiple files.

Tcl has a package called ftp, which is part of Tcllib and can be used to download and upload files over FTP. It offers functionality for connecting, getting file information, and uploading and downloading files.

Establishing connections

The command ftp::Open can be used to set up a connection to an FTP server. It accepts the server name, username, and password followed by any additional options we might want to provide. It returns a token that we can later use for all other operations.

Note

The ftp package differs from the majority of Tcl packages in that its command names start with an uppercase letter, such as Open instead of open. This is not common in the Tcl world, but is the case for the ftp package for historical reasons.

Anonymous FTP connections require specifying anonymous as the username and an e-mail address as the password. For example, in order to open an anonymous connection to ftp.tcl.tk, we can do:

set token [ftp::Open ftp.tcl.tk anonymous [email protected]]

An FTP session keeps a dedicated control connection to the server. For each FTP transfer, such as listing files, downloading, or uploading, an additional connection is made with the server for the purpose of that transfer. The FTP protocol uses two modes for these connections: active and passive. With active connections, the FTP server connects back to its client to send data; passive connections work the opposite way, with the client connecting to the FTP server.

While the default for the ftp package is to use active mode, it might be necessary to use passive mode if our computer does not have a public IP address. Passive mode is also the default for the majority of clients, as it works regardless of whether the client has a public IP address, so it is a good idea to use it whenever possible. The mode can be specified using the -mode flag appended to the ftp::Open command; acceptable values are active and passive. For example:

set token [ftp::Open ftp.tcl.tk \
    anonymous [email protected] -mode passive]

Retrieving files

Another important aspect of FTP we should be aware of is the transfer type. Due to how different operating systems store text, FTP differentiates between text (ASCII) and binary files. We can set the type using the ftp::Type command. It accepts a token as the first argument and the transfer type, either ascii or binary, as the second. To set our transfer type to binary, we can do:

ftp::Type $token binary

We can now retrieve files over FTP using the ftp::Get command. It works in different modes depending on the arguments supplied. It first accepts the token of the connection, followed by the path to the remote file. If we run the command without any further arguments, the file will be downloaded under the same name as the remote path. If we specify a local filename as the next argument, it will be saved under that name. Alternatively, we can specify the -variable or -channel option, followed by a value. This will cause the file data to be stored in a variable or written to the specified channel. In the case of a channel, it will not be closed after the file is retrieved.

For example, we can retrieve the remote file tcl8.5.7-src.tar.gz from the remote directory pub/tcl/tcl8_5 as a file of the same name in the local filesystem by doing:

ftp::Get $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" \
    "tcl8.5.7-src.tar.gz"

Similarly, we can download a file into a variable by doing:

ftp::Get $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" \
    -variable fileContents

Please note that, due to how this is implemented, the variable name is resolved in the global namespace, not locally in the code invoking the ftp::Get command. It is best to use namespace-based variables or object variables for this.
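As a minimal sketch of this approach (assuming $token is a session previously opened with ftp::Open, and the remote path follows the earlier examples), we can download into a fully qualified namespace variable:

```tcl
package require ftp

namespace eval download {
    # The variable the downloaded data will be stored in
    variable fileContents ""
}

# A fully qualified name avoids the global-namespace pitfall
ftp::Get $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" \
    -variable ::download::fileContents
```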

We can also resume an interrupted transfer by using the ftp::Reget command. It requires that we specify a token and a remote filename and, optionally, a local filename; if the local name is not specified, it is assumed to be the same as the remote name. We can also specify the offsets at which to begin and end the download, but by default, the remaining part of the file is downloaded.

For example, in order to complete the transfer of the tcl8.5.7-src.tar.gz file, we can simply invoke:

ftp::Reget $token tcl8.5.7-src.tar.gz

Uploading files

Similarly, there are commands for putting and appending to remote files. The command ftp::Put can be used to upload a file, while ftp::Append appends data to an already existing file, which can be used to continue an interrupted upload. In both cases, the syntax is the same—the first argument is the token of the FTP session to use, followed by either a local filename or the -data or -channel option. In the first case, only the filename is needed; for the options, the actual data or the channel to use needs to be specified as the following argument. The last argument is the remote filename to use; if it is missing, it is assumed to be the same as the local one.

For example, to upload a file, we can do:

ftp::Put $token my-logs.tar.gz

In order to append data to a file, we can do:

ftp::Append $token -data "Some text\n" remote-logs.txt

When downloading or uploading data this way, it is treated as binary data—so, if we are transferring text, we can use the encoding command to convert it from/to the proper encoding.
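For example (a sketch assuming $token is an open session and the remote filename is hypothetical), text downloaded into a variable can be decoded like this:

```tcl
# Download raw bytes into a global variable, then decode them as UTF-8
ftp::Get $token "notes.txt" -variable ::rawNotes
set text [encoding convertfrom utf-8 $::rawNotes]
```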

Listing files and directories

FTP also introduces the concept of a current directory for each FTP session. We can change the directory by invoking the ftp::Cd command and retrieve the current directory by invoking ftp::Pwd. The first command expects the FTP session token and the path to the directory, which can be relative or absolute. The second command always returns an absolute path, which can be used when comparing and/or analyzing the current location.

For example:

puts "Changing directory"
ftp::Cd $token "pub/tcl/tcl8_5"
puts "Changed to [ftp::Pwd $token]"

We can also retrieve information about remote files. The command ftp::FileSize returns the size of a file in bytes. The command ftp::ModTime returns the time when a file was last modified, as Unix time. Both commands require the token of the FTP session and a filename. For example:

set size [ftp::FileSize $token tcl8.5.7-src.tar.gz]
puts "tcl8.5.7-src.tar.gz is $size bytes"
set mtime [ftp::ModTime $token tcl8.5.7-src.tar.gz]
set mtext [clock format $mtime]
puts "tcl8.5.7-src.tar.gz last modified on $mtext"
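Building on these commands, a sketch of downloading a file only when the remote copy is newer than the local one (the filename is taken from the earlier examples) could look like this:

```tcl
set fileName tcl8.5.7-src.tar.gz

# Re-download only when the local copy is missing or older;
# both ftp::ModTime and file mtime return Unix time
set remoteTime [ftp::ModTime $token $fileName]
if {![file exists $fileName] || $remoteTime > [file mtime $fileName]} {
    ftp::Get $token $fileName
}
```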

We can also list the contents of a directory. The command ftp::NList can be used to list all files and directories in the current or a specified directory. It accepts the session token and, optionally, the directory to list; if the directory is not specified, the current directory is listed. This command returns a list of all items found in the directory, each element being the name of a file or directory.

For example:

foreach file [ftp::NList $token] {
    puts $file
}

The command ftp::List returns a long listing of a directory. This returns a list of items, where each item is represented by a line similar to the output of the Unix ls -l command. For example:

foreach line [ftp::List $token] {
    puts $line
}

The preceding code would print out the following line, among others:

-rw-r--r-- 1 ftp ftp 4421720 Apr 15 2009 tcl8.5.7-src.tar.gz

While this provides much more information, we need additional code to parse such lines. Let's start with creating a command for this:

proc parseListLine {line} {

First, we try to match the filename (which may contain spaces) and strip symbolic link targets (links are listed in the form filename -> file_it_points_to):

    if {[regexp {([^ ]|[^0-9] )+$} $line name]} {
        # Check for links
        if {[set idx [string first " -> " $name]] != -1} {
            incr idx -1
            set name [string range $name 0 $idx]
        }
    }

Following that, we collapse any runs of whitespace and create a list of items by splitting the resulting string on spaces:

    regsub -all {[ \t]+} $line " " line
    set items [split $line " "]

If we did not match the name with the previous attempt, we assume that the filename is the last element:

    if {![info exists name]} {set name [lindex $items end]}

We then try to get the permissions and the file size, defaulting the size to an empty string when it cannot be parsed:

    set perm [lindex $items 0]
    if {[string is integer [lindex $items 4]]} {
        set size [lindex $items 4]
    } else {
        set size ""
    }

We then take the first character of the permissions we've extracted and derive the actual file type from it:

    switch -- [string index $perm 0] {
        d {
            set type "directory"
        }
        c - b {
            set type "device"
        }
        l {
            set type "symlink"
        }
        default {
            set type "file"
        }
    }

We then return a list that consists of the filename, type, size, and permissions:

    return [list $name $type $size $perm]
}

This code is based on ftp.tcl from the tclvfs package, which is licensed under the BSD license. The package is available at: http://sourceforge.net/projects/tclvfs/

We can then test it in the following way:

foreach line [ftp::List $token] {
    puts "\nOriginal line: $line"
    lassign [parseListLine $line] \
        name type size perm
    puts "Filename '$name' ($type), size $size, $perm"
}

In addition to this, we can also modify remote filesystem contents. The command ftp::MkDir can be used to create a directory. It expects a token to the session as the first argument and the name of the directory to create as the second argument.

The command ftp::Rename can be used to rename a file or directory. It requires a token of the FTP session, and the old and new names.

The commands ftp::RmDir and ftp::Delete can be used to delete a directory or a file, respectively. Both accept the token of the FTP session and the name of the directory or file to delete.

Closing a connection to an FTP server can be done using the ftp::Close command, specifying the token of the FTP session. For example:

ftp::Close $token
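Putting these pieces together, a complete session might look like the following sketch; the host and paths follow the earlier examples, the password is a placeholder, and all values should be adjusted for a real server:

```tcl
package require ftp

# Open a passive-mode session as the anonymous user;
# ftp::Open returns -1 if the connection fails
set token [ftp::Open ftp.tcl.tk anonymous "anonymous" -mode passive]
if {$token < 0} {
    error "Unable to connect to FTP server"
}

ftp::Type $token binary
ftp::Cd $token "pub/tcl/tcl8_5"
ftp::Get $token "tcl8.5.7-src.tar.gz"
ftp::Close $token
```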

Note

The source code in this section is located in the 09ftp directory in the source code examples for this chapter.

More information about the ftp package, as well as the remaining configuration options, can be found in its documentation on the SourceForge project page at: http://tcllib.sourceforge.net/doc/ftp.html
