Transferring files over the Internet is one of the most common things that networked applications do. In this section, we'll focus on two very popular protocols: HyperText Transfer Protocol (HTTP) and File Transfer Protocol (FTP). The first one is used to download websites, images, or other files from the Web, and the second one is used to download or upload files. There are subtle differences between the two and we'll need to go into a bit more detail to understand them.
HTTP is a lightweight protocol for retrieving individual items over a network. It can handle large numbers of requests and is the protocol used for serving websites to web browsers. HTTP is used for both static files (such as HTML pages, images, downloads, and so on) and dynamic content (such as PHP or Tcl scripts building pages when users access them).
The FTP protocol, on the other hand, is designed for transferring files over a network. It features authentication and offers a lot of functionality specific to file management: creating directories, renaming and deleting items, listing the contents of a directory, and the notion of a working directory. FTP is a more heavyweight protocol and is less commonly used for offering downloads to a wide audience.
Specifying the location of a resource on the Internet is done with a URL (Uniform Resource Locator). It consists of a protocol, an optional username and password, a hostname, and an optional port, followed by the path to the resource. For example, a URL could look like http://wiki.tcl.tk/tcllib
Tcl offers the uri package, which is part of Tcllib and can be used to split URLs into parts and to join parts back into URLs. These parts include scheme, user, pwd, host, port, path, query, and fragment. Not all of them are present in every type of URL. The only part that is always present is scheme, which defines the protocol that is used, for example, http or ftp. Credentials are optional and are specified as user and pwd. The host and port parts specify the hostname and port to connect to; port can be empty, which means the default port for the specified protocol is used. The location of the resource is specified as path, and query is an optional part that defines a query sent via the URL (mainly for http requests); fragment points to a fragment of a page and is also used only for the HTTP protocol.
Currently, the ftp, http, https, file, mailto, and news protocols are supported.
We can split a URL into its elements using the uri::split command. It returns one or more name-value pairs representing each part. For example, we can do the following:
set uridata [uri::split "http://wiki.tcl.tk/tcllib"]
foreach name {scheme user pwd host port path query fragment} {
    if {[dict exists $uridata $name]} {
        puts "$name = [dict get $uridata $name]"
    }
}
This will print the following result:
scheme = http
user =
pwd =
host = wiki.tcl.tk
port =
path = tcllib
query =
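To see every part at once, we can split a URL that uses all of the elements. The address below is made up purely for illustration:

```tcl
package require uri

# Hypothetical URL containing credentials, a port, a query, and a fragment
set uridata [uri::split \
    "http://john:secret@www.example.com:8080/docs/index.html?id=1#top"]

foreach name {scheme user pwd host port path query fragment} {
    if {[dict exists $uridata $name]} {
        puts "$name = [dict get $uridata $name]"
    }
}
```

This prints a non-empty value for each part, including user, pwd, port, query, and fragment, which were empty in the previous example.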
We can also create a URL from its parts using the uri::join command. It takes all of its arguments as name-value pairs specifying the parts of the address it should generate. For example:
puts [uri::join scheme http host www.packtpub.com port 80 path books]
The preceding code will print out the address http://www.packtpub.com/books. Please note that the port part was skipped, because 80 is the default port for the HTTP protocol.
We can also take the result of splitting a URL and join it back by running:
puts [uri::join {*}[uri::split "http://www.google.com"]]
This will split the address into parts and pass them to the uri::join command; {*} causes all elements of the list to be passed as separate arguments, which is what the command expects.
The sample code shown in this section is located in the 06uri directory in the source code examples for this chapter.
More information about the uri package can be found in its documentation, available at: http://tcllib.sourceforge.net/doc/uri.html
HTTP is a stateless protocol that uses a simple request-response message exchange pattern. This means that whenever a client, such as our application, wants to access a particular resource, it sends an HTTP request. The server then processes it and sends back a response, usually the requested resource, or information that it could not be found or that an error occurred.
HTTP works by sending a request to the server. A request describes whether we are getting information or sending data to the server, the path to the resource, and the version of the protocol we're using. A request also contains several headers, which are name-value pairs and can be either standard or custom ones. Finally, a request can also carry data that we are uploading to the server.
After receiving and parsing the request, the web server returns the response. The response consists of a status line, one or more headers, and the body of the response. After the response is sent, the current connection is either closed or reused for the next request. However, from the HTTP perspective, each of these requests is treated independently.
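To illustrate the exchange, a simplified request and response might look like the following; the host and content here are made up:

```
GET /index.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: text/html

<html>...</html>
```

The first two lines are the client's request (the request line and one header), and the rest is the server's response: a status line, headers, an empty line, and the body.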
Tcl comes with a built-in http package. This package offers a basic but complete HTTP client that can be used to perform both simple and more advanced operations. The command http::geturl is used to initiate or perform a request and is the starting point for performing HTTP operations. It accepts a URL followed by one or more options.
A commonly used option is -binary, which specifies whether the transfer should be done in binary mode and defaults to false. By default, Tcl performs newline and encoding conversions for text documents; for example, if a server sends HTML in UTF-8, Tcl converts it to a proper string. If the -binary option is enabled, no conversion is performed and all documents are retrieved as raw bytes, regardless of whether they are text documents or not.
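For example, to retrieve an image without any conversions, we could do the following; the URL is only illustrative:

```tcl
package require http

# Hypothetical URL; -binary true disables newline and encoding conversions,
# so http::data returns the raw bytes of the image
set token [http::geturl "http://www.example.com/logo.png" -binary true]
set bytes [http::data $token]
http::cleanup $token
```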
The http::geturl command always returns a token that can be used to get information and data related to this query. For example, in order to get the contents of Google's main page, we can simply run:
package require http
set token [http::geturl "http://www.google.com/"]
puts [http::data $token]
The second line performs the request and returns a token that we can use later on. The command http::data returns the body of the server's response, which we then print to standard output.
The command http::cleanup should be used after we are done working with a request. It will clean up all resources used for that request. For example:
http::cleanup $token
We can also save the contents of the response directly to any open channel. For example:
set fh [open "google-index.html" w]
set token [http::geturl "http://www.google.com/" -channel $fh]
http::cleanup $token
close $fh
This will cause the response to be written to the specified channel instead of being stored in memory. This is useful for downloading large files and/or when you plan to save the contents of the response to a file.
There are important things that our previous example is missing, such as checking for errors. Although the http package will throw an error if the web server is unreachable, there are cases when a web server sends a response stating that a resource is unavailable or that an error has occurred. These are not translated into Tcl errors, as such a response might be the desired one from our application's perspective.
We can check the status of handling the request by using the http::status command. It will return one of the following values:
Status | Description
---|---
ok | Indicates that the HTTP request was completed successfully
eof | Indicates that the server closed the connection without replying
error | Indicates an error
If the status is error, we can also retrieve the actual error message using the http::error command.
The HTTP server sends status codes that specify the outcome of processing the request. We can retrieve the status code using the http::ncode command. Usually, it is sufficient to check whether the code equals 200, which means that the request has been processed correctly.
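As a minimal sketch, assuming $token holds an already completed request, the check could look like:

```tcl
if {[http::ncode $token] == 200} {
    puts "Request processed correctly"
} else {
    puts stderr "Server returned code [http::ncode $token]"
}
```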
The most frequently used status codes are:
Code | Description
---|---
200 | The request has been successfully executed
206 | The request has been successfully executed and the result is partial content; used to download parts of a file over HTTP
301 | Moved permanently; indicates that the resource has been permanently moved to a new location, given in the Location response header
302 | Moved temporarily; indicates that the resource has been temporarily moved to a new location, given in the Location response header
401 | Unauthorized; indicates that the server has requested the client to authenticate
403 | Forbidden; indicates that access to this resource is forbidden
404 | Not found; indicates that the specified resource cannot be found
500 | Internal error; indicates that there was a problem serving the request, for example, because the HTTP server configuration is broken or a module/script failed
For example, we can print out the status of our request by running:
switch -- [http::status $token] {
    error {
        puts "ERROR: [http::error $token]"
    }
    eof {
        puts "EOF reading response"
    }
    ok {
        puts "OK; code: [http::ncode $token]"
        puts "Data:"
        puts [http::data $token]
    }
}
We can also get all headers from an HTTP response by using the http::meta command. It returns a list of name-value pairs that can be used as a dictionary or an array. For example, to get the contents of the Location header, we can do the following:
set code [http::ncode $token]
if {($code == 301) || ($code == 302)} {
    set newURL [dict get [http::meta $token] Location]
    # go to new location
}
We can also use the http package to submit information to a web server, for automating things such as filling in forms. Data from a form can be formatted using the http::formatQuery command. It can be sent in two ways: either as part of the path in the URL or as separate data. The first case is done using a GET request; an example is searching using Google, like http://www.google.com/search?q=tcl, where the query is passed after the ? character. The other approach is sending a POST request, where the data is sent after the actual request.
POST is used for sending larger amounts of data and is usually used when the request modifies or submits data. GET is usually used for reading information, as it can only send a smaller amount of data. POST requests can carry much larger amounts of data and are not cached by proxy servers.
For both GET and POST, data is sent as name-value pairs: q=tcl means the value for the field q is tcl. Multiple values are separated using the & character. Tcl offers a command for generating such data, http::formatQuery. It accepts zero or more name-value pairs as arguments and outputs a properly formatted query.
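The command also takes care of encoding characters that are not allowed in URLs. For example:

```tcl
package require http

# Plain alphanumeric values pass through unchanged
puts [http::formatQuery q tcl hl en]
# → q=tcl&hl=en

# Spaces and special characters are percent-encoded automatically
puts [http::formatQuery q "tcl programming"]
```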
Sending data using GET requires that we append the query to the actual URL, for example:
set query [http::formatQuery q "Tcl Programming"]
set url "http://www.google.com/search?$query"
Sending POST data requires passing the data via the -query option to http::geturl. For example, we can do the following:
set query [http::formatQuery search "tcl programming"]
set url "http://www.packtpub.com/search"
set token [http::geturl $url -query $query]
This will cause the request to be sent as POST, along with the data supplied via -query.
By default, data is sent as encoded form data, but it is also possible to send different query data. Usually, this is accompanied by sending the appropriate query type to the server. We can do this by adding the -type flag when sending the query. If a type is not specified, it defaults to application/x-www-form-urlencoded, which is the default MIME type for encoded form data. Many applications expecting XML or JavaScript Object Notation (JSON) data require that it is sent with the appropriate MIME type in the headers of the request.
For example, we can send XML with accompanying data by doing:
http::geturl $url -command onCompleteXMLPost -type "text/xml" -query [$dom asXML]
This will cause the appropriate value for the Content-Type header to be sent with the query. Details on XML and how to read and write documents can be found in Chapter 5.
Examples related to basic HTTP functionality are placed in the basic.tcl file in the 07httprequest directory in the source code examples for this chapter.
By default, http queries are done in a synchronous way, meaning that the http::geturl command returns after the request has been completed. In many cases, it is better to use an asynchronous approach, where the command returns instantly, uses events to process the request, and then invokes our callback, which is a Tcl command that will be run when the operation is completed.
Asynchronous requests are performed by providing the -timeout and -command options to the http::geturl command. In this case, the command returns immediately with a token that will be used later. Accessing the data should be done from the command passed to the -command option. In asynchronous requests, http::geturl might still throw an error, for cases such as a non-existent hostname, so it is still recommended to catch such exceptions and handle them appropriately.
For example, in order to download the Google page asynchronously, we can do the following:
if {[catch {
    set token [http::geturl "http://www.google.com/" \
        -timeout 300000 -command doneGet]
} error]} {
    puts stderr "Error while getting URL: $error"
}
Next we can create the command that will be invoked as the callback. It will be run with an additional parameter: the token of the request. For example, our command, based on previous examples, can look as follows:
proc doneGet {token} {
    switch -- [http::status $token] {
        error {
            puts "ERROR: [http::error $token]"
        }
        eof {
            puts "EOF reading response"
        }
        ok {
            puts "OK; code: [http::ncode $token]"
            puts " Size: [http::size $token]"
            puts " Data:"
            puts [http::data $token]
        }
    }
    http::cleanup $token
}
The token can be used in the same way as we used it with synchronous requests. We are also responsible for cleaning up the token, which is done in the last line of the example.
The http package can also be used for more advanced features such as partial content downloading, sending cookies, and HTTP-level authorization.
The majority of these functions can be carried out using the -headers option passed to the http::geturl command. This option accepts a list of one or more name-value pairs. These can be any headers and values, but they should be headers that the server can understand. For example, we can use it to send cookie values to a site or to authorize over HTTP for sites that use it.
There are two common ways that users are authorized on the Web: at the HTTP level, and using HTML forms and cookies. The first one provides the username and password information as an HTTP header. The latter sends form data and uses cookies to track users, and is mainly a matter of handling cookies properly at the HTTP level.
For now, we'll focus on HTTP-level authorization. A lot of web-based applications and data are protected using this mechanism. Let's assume we want to retrieve data from a specified URL. We first connect to it without providing any credentials, and at this point, the server should respond with HTTP status 401. The following code would be a good start for checking if authorization is needed:
set token [http::geturl $url]
if {[http::status $token] != "ok"} {
    puts stderr "Error while retrieving URL"
    http::cleanup $token
    exit 1
}
if {[http::ncode $token] == 401} {
If this condition is true, we should resend our request. The server provides the WWW-Authenticate header in the response, which indicates the type of authentication and the realm, a descriptive name of the resource we are currently trying to authenticate to. We can print it out by running:
set realm [dict get [http::meta $token] "WWW-Authenticate"]
puts "Authenticate information: $realm"
Next we need to clean up the previous request and send a new one with proper authentication information. Except for a few cases, the Basic authentication type is used by HTTP servers. It requires sending a <username>:<password> string encoded as base64, preceded by the word Basic. We'll use the base64 package for this, along with the base64::encode command:
package require base64
set authinfo [base64::encode ${username}:${password}]
set headers [list Authorization "Basic $authinfo"]
The last line builds the Authorization header to be sent to the server, containing the word Basic and the credentials encoded as base64. Next we're sending a new request by doing:
set token [http::geturl $url -headers $headers]
if {[http::status $token] != "ok"} {
    http::cleanup $token
    puts stderr "Error while retrieving URL"
    exit 1
}
We can then check if our current username and password were correct. If not, then the status for a new request will also be 401:
if {[http::ncode $token] == 401} {
    puts stderr "Invalid username and/or password"
    http::cleanup $token
    exit 1
}
An additional feature that headers are useful for is supporting cookies. While the http package itself does not provide this functionality, it is easy to support in the majority of cases. The standards for setting and getting cookies define expiration dates, as well as the paths and domains that cookies should be valid for. However, in the majority of code that we write, it is enough to assume that a cookie we're getting is needed for all subsequent requests.
Cookies work in such a way that HTTP responses from servers may include one or more Set-Cookie headers. These headers need to be parsed, and all cookies should be passed back in the Cookie header. The server might send a response similar to this one:
Set-Cookie: mycookie=TEST0123; path=/
Set-Cookie: i=1; expires=Thu, 27-Oct-2011 11:07:24 GMT; path=/
This causes the cookie mycookie to be set to TEST0123 and i to be set to 1. Each subsequent request to this server should include the following header:
Cookie: mycookie=TEST0123; i=1
All changes to existing cookies overwrite them, and new cookies cause a new value to be set, which is similar to the behavior of arrays and dictionaries in Tcl. Writing code that handles cookies without taking the cookie parameters into account is relatively easy.
Let's start by writing a command that processes an HTTP response for cookies. We define the namespace for our code, reference the variable specified by the user, and iterate over the HTTP headers from the provided token:
namespace eval cookies {}

proc cookies::processCookies {varname token} {
    upvar 1 $varname d
    foreach {name value} [http::meta $token] {
        if {[string equal -nocase $name "Set-Cookie"]} {
If the header is Set-Cookie, we process its value by taking only the part up to the first occurrence of a semicolon, and separating it into name and value using a regular expression:
            set value [lindex [split $value ";"] 0]
            if {[regexp "^(.*?)=(.*)$" $value - cname cvalue]} {
                dict set d $cname $cvalue
            }
        }
    }
}
This will cause the dictionary stored in the varname variable to be updated. Next, in all requests, we need to pass all cookies. A small function to generate the appropriate value for the Cookie header would look like:
proc cookies::prepareCookies {var} {
    set rc [list]
    dict for {name value} $var {
        lappend rc "$name=$value"
    }
    return [join $rc "; "]
}
Here we take each cookie, format it as a name=value pair, and join all cookies using a semicolon followed by a space. In order to use this to query the Tcler's Wiki, we can do the following:
set c [dict create]
set h [http::geturl http://wiki.tcl.tk/]
cookies::processCookies c $h
http::cleanup $h

set query [http::formatQuery _charset_ utf-8 S cookie]
set h [http::geturl http://wiki.tcl.tk/_/search?$query \
    -headers [list Cookie [cookies::prepareCookies $c]]]
The first request gets the main page of the wiki, which causes a cookie to be set. We need to pass this cookie to the second request in order to be able to perform a search; in this case, we're searching for the string cookie. Without passing the cookie from the previous request, the site will not allow us to perform the search.
HTTP can handle both encrypted and unencrypted communication. The default is not to encrypt the connection, which corresponds to the http protocol in URLs. It is also possible to use HTTP over an SSL-encrypted connection, which is usually called https.
The Tcl http package allows registering additional protocols to run HTTP on, with the http::register command. It requires that we specify the name of the protocol, the default port, and the command that should be invoked to create a socket. This is mainly used for SSL connections. In order to enable the use of the https protocol, we need to add the following code to our application:
package require tls
http::register https 443 tls::socket
The tls package provides SSL-enabled sockets to the Tcl language via the tls::socket command, which is an equivalent of the socket command, except that it enables SSL for the connection. SSL and security are described in more detail in Chapter 12.
More information about the http package, as well as its remaining configuration options, can be found in its documentation at: http://www.tcl.tk/man/tcl8.5/TclCmd/http.htm
Really Simple Syndication (RSS) is a format for publishing frequently updated information, such as blog entries, news headlines, audio, and video, in a standard format. An RSS document (often also called a feed or channel) provides a list of recently published items along with metadata about them. RSS is provided by the majority of content providers, such as portals, blog engines, and so on. Even Packt Publishing has its own RSS feed, which we'll use later on in an example.
RSS itself is an XML document published over HTTP. This means that using the http and tdom packages, we can easily retrieve and parse an RSS feed and find out about recent documents. The RSS standard describes the structure of the XML document, which we'll cover later. All we need to know to start with is the URL of the RSS feed. Information about the address of the RSS feed is usually stored in the website's metadata. This is also standardized and usually looks like this:
<link rel="alternate" type="application/rss+xml" href="http:///rss.xml" title="Packt Publishing News" >
The previous example is from Packt Publishing's website. Your browser probably supports this as well; a small icon at the bottom of the window or near the address of the page indicates that an RSS feed is present. Clicking on it will take you to the RSS feed, allow you to subscribe to it from your browser, and reveal the address of the actual feed.
The Packt Publishing website's RSS feed address is http://www.packtpub.com/rss.xml. Tcler's Wiki is available at http://wiki.tcl.tk/ and also has its feed available at: http://wiki.tcl.tk/rss.xml
We'll start with Tcler's Wiki and its feeds. The feed looks as follows:
<?xml version='1.0'?>
<rss version='0.91'>
  <channel>
    <title>The Tcler's Wiki - Recent Changes</title>
    <link>http://wiki.tcl.tk/</link>
    <description>Recent changes to The Tcler's Wiki</description>
    <item>
      <title>tDOM</title>
      <link>http://wiki.tcl.tk/1948</link>
      <pubDate>Wed, 14 Apr 2010 01:05:27 GMT</pubDate>
      <description>Modified by CMcC (898 characters)
        (actual description of the Wiki change goes here)
      </description>
    </item>
    <item>
      <title>WISH User Help</title>
      <link>http://wiki.tcl.tk/20914</link>
      <pubDate>Wed, 14 Apr 2010 00:59:10 GMT</pubDate>
      <description>Modified by pa_mcclamrock (194 characters)
        (actual description of the Wiki change goes here)
      </description>
    </item>
  </channel>
</rss>
In order to read RSS, we need to find the <rss> tag and iterate over all <channel> tags. The first one includes information about the RSS feed, and each <channel> instance can describe a different channel. It is possible for one RSS feed to describe multiple channels, although usually a feed covers only one. Each channel has a title, a link, and a list of items.
In order to get all items in a channel, we need to iterate over the <item> tags inside the channel. Each item describes a single element in a feed, such as one entry on a blog or, in this case, one change on the wiki. Each item has a title, link, publication date, and description. Many RSS feeds provide additional information, which can be checked and handled if needed.
We can retrieve the RSS by simply doing:
set token [http::geturl "http://wiki.tcl.tk/rss.xml"]
if {[http::status $token] != "ok"} {
    puts "Error retrieving RSS file"
    exit 1
}
set data [http::data $token]
http::cleanup $token
We now have the RSS document in the data variable, and we can parse it using tdom:
set dom [dom parse $data]
The tdom package is described in more detail in Chapter 5.
Now we can iterate over each channel by doing:
foreach channel [$dom selectNodes "rss/channel"] {
This uses the selectNodes method to find all channel tags. We can then find the <title> tag in our channel and use the asText method on that node to get the title of the current channel and print it:
set nodes [$channel selectNodes "title"]
set title [[lindex $nodes 0] asText]
puts "Channel \"$title\":"
We can now iterate over all items in a channel in a similar way:
foreach item [$channel selectNodes "item"] {
    set nodes [$item selectNodes "link"]
    set link [[lindex $nodes 0] asText]
    set nodes [$item selectNodes "title"]
    set title [[lindex $nodes 0] asText]
    puts "- \[$link\] $title"
}
We first use the selectNodes method to find <item> tags, iterate over them, and get the link and title by finding the proper nodes and using the asText method. We then print information on each element.
Finally we need to close the loop iterating over channels:
}
The source code in this section is located in the rss-basic.tcl file in the 08rss directory in the source code examples for this chapter.
In many cases, our applications will need to check and retrieve RSS periodically. In such cases, it is a good idea to cache the RSS on disk or in memory. If our application offers a web interface that consolidates multiple RSS channels or filters them to only include specified items, this would be the best approach.
In order to do this, all we need to do is change how our DOM tree is created. We'll start by setting the URL of the feed and the name of the file to store it in:
set url "http://www.packtpub.com/rss.xml"
set filename "packtpub-rss.xml"
Next, we can check whether the local copy exists and whether it was created within the last 30 minutes by doing:
if {(![file exists $filename]) || ([file mtime $filename] < [clock scan "-30 minutes"])} {
This checks whether the file does not exist or whether it was created more than 30 minutes ago. If either condition is met, we download the RSS by doing:
    set token [http::geturl $url -binary true]
    if {[http::status $token] != "ok"} {
        puts "Error retrieving RSS file"
        exit 1
    }
    set fh [open $filename w]
    fconfigure $fh -translation binary
    puts $fh [http::data $token]
    close $fh
    http::cleanup $token
}
This is similar to the previous example and to the HTTP examples shown earlier. The main difference is that we're downloading the file in binary mode, which prevents the http package from converting the file's encoding.
We will use the tDOM::xmlReadFile command to read the RSS. This command is part of the tdom package and handles encoding issues, such as detecting the encoding when reading files. It also handles the Byte Order Mark (BOM) markers that many RSS feeds have; this is a set of bytes at the beginning of an XML file that specifies its encoding and is described in more detail at: http://en.wikipedia.org/wiki/Byte_order_mark
In order to read and parse the file, all we need to do is:
set dom [dom parse [tDOM::xmlReadFile $filename]]
After that, we can use the same set of iterations as previously to list all entries in the RSS feed:
foreach channel [$dom selectNodes "rss/channel"] {
    set nodes [$channel selectNodes "title"]
    set title [[lindex $nodes 0] asText]
    puts "Channel \"$title\":"
    foreach item [$channel selectNodes "item"] {
        set nodes [$item selectNodes "link"]
        set link [[lindex $nodes 0] asText]
        set nodes [$item selectNodes "title"]
        set title [[lindex $nodes 0] asText]
        puts "- \[$link\] $title"
    }
}
The File Transfer Protocol (FTP) is a stateful protocol for transferring files. It requires logging in, keeps the connection alive across transfers, and is not a lightweight protocol. It is mainly used for retrieving or transferring multiple files.
Tcl has a package called ftp, which is part of Tcllib and can be used to download and upload files over FTP. It offers functionality for connecting, getting file information, and uploading and downloading files.
The command ftp::Open can be used to set up a connection to an FTP server. It accepts the server name, username, and password, followed by any additional options we might want to provide. It returns a token that we can later use for all other operations.
The ftp package differs from the majority of Tcl packages in that its commands start with an uppercase letter, such as Open instead of open. This is not common in the Tcl world, but is the case for the ftp package for historical reasons.
Anonymous FTP connections require specifying anonymous as the username and an e-mail address as the password. For example, in order to open an anonymous connection to ftp.tcl.tk, we can do:
set token [ftp::Open ftp.tcl.tk anonymous [email protected]]
An FTP session has a dedicated control connection to the server. For each transfer, such as listing files, downloading, or uploading, an additional connection is made for the purpose of that transfer. The FTP protocol uses two modes for communication: active and passive. With active connections, the FTP server connects back to its client to send data; passive connections work the opposite way, with the client connecting to the FTP server.
While the default for the ftp package is to use active mode, it might be necessary to use passive mode if our computer does not have a public IP address. Passive mode is also the default for the majority of clients, as it works regardless of whether we have a public IP address, so it is a good idea to use it whenever possible. The mode can be specified using the -mode flag appended to the ftp::Open command; acceptable values are active and passive. For example:
set token [ftp::Open ftp.tcl.tk anonymous [email protected] -mode passive]
Another important aspect of FTP we should be aware of is the transfer type. Due to how different operating systems store information, FTP differentiates between text (ASCII) and binary files. We can set the type using the ftp::Type command. It accepts the token as the first argument and the transfer type, either ascii or binary, as an additional argument. To set our transfer type to binary, we can do:
ftp::Type $token binary
We can now retrieve files over FTP using the ftp::Get command. It works in different modes depending on the arguments supplied. It first accepts the token of the connection, followed by the path to the remote file. If we run the command without any further arguments, the file will be downloaded under the same name as the remote path. If we specify a local filename as the next argument, it will be downloaded under that name. Instead, we can also specify the -variable or -channel option, followed by a value. This will cause the file data to be stored in a variable or written to a specified channel; in the case of a channel, it will not be closed after the file is retrieved.
For example, we can retrieve the remote file tcl8.5.7-src.tar.gz from the pub/tcl/tcl8_5 remote directory to the same filename in the local filesystem by doing:
ftp::Get $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" "tcl8.5.7-src.tar.gz"
Similarly, we can download a file to a variable by doing:
ftp::Get $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" -variable fileContents
Please note that due to how this is implemented, the variable name is global, not local to the code invoking the ftp::Get command. It is best to use namespace-based variables or object variables for this.
We can also resume an interrupted transfer by using the ftp::Reget command. It requires that we specify the token, a remote filename, and, optionally, a local filename. If the local name is not specified, it is assumed to be the same as the remote name. We can also specify the offsets at which to begin and end the download, but by default, Tcl will download the remaining part of the file.
For example, in order to complete the transfer of the tcl8.5.7-src.tar.gz file, we can simply invoke:
ftp::Reget $token tcl8.5.7-src.tar.gz
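The offsets mentioned above are not shown in that example; a hedged sketch (assuming the Tcllib signature ftp::Reget token remote ?local? ?from? ?to? and the remote path from earlier examples) that resumes from however much of the file already exists locally:

```tcl
package require ftp

# Resume from the size of the partial local file, if any.
set local "tcl8.5.7-src.tar.gz"
set from [expr {[file exists $local] ? [file size $local] : 0}]
ftp::Reget $token "pub/tcl/tcl8_5/tcl8.5.7-src.tar.gz" $local $from
```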
Similarly, there are commands for putting data into remote files and appending to them. The command ftp::Put can be used to upload a file, while ftp::Append appends data to an already existing file, which can be used to continue an interrupted transfer. In both cases the syntax is the same: the first argument is the token of the FTP session to use, followed by either a local filename or the -data or -channel option. In the first case only the filename is needed; with the options, the actual data or the channel to use must be specified as the following argument. The last argument is the remote filename to use; if it is missing, it is assumed to be the same as the local one.
For example, to upload a file, we can do:
ftp::Put $token my-logs.tar.gz
In order to append data to a file, we can do:
ftp::Append $token -data "Some text " remote-logs.txt
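The -channel variant is not shown above; a sketch (the filenames are our own) that uploads from an already open channel could look like this:

```tcl
package require ftp

# Stream an upload from an open channel; ftp::Put reads the
# channel until end of file. We open it in binary mode and
# close it ourselves afterwards.
set chan [open my-logs.tar.gz rb]
ftp::Put $token -channel $chan my-logs.tar.gz
close $chan
```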
When downloading or uploading, the data is treated as binary data, so if we are transferring text, we can use the encoding command to convert it from or to the proper encoding.
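For instance, if a remote text file is known to be UTF-8 encoded, the conversion could be sketched as follows (the filename is hypothetical):

```tcl
package require ftp

# Download raw bytes into a variable, then decode them into a
# proper Tcl string using the encoding command.
ftp::Type $token binary
ftp::Get $token "notes.txt" -variable ::rawNotes
set text [encoding convertfrom utf-8 $::rawNotes]
```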
FTP also introduces the concept of the current directory for a specified FTP session. We can change the directory by invoking the ftp::Cd
command and retrieve the current directory by invoking ftp::Pwd
. The first command expects the FTP session token and the path to the directory, which can be relative or absolute. The second command always returns an absolute path, which can be used when comparing and/or analyzing the current location.
For example:
puts "Changing directory"
ftp::Cd $token "pub/tcl/tcl8_5"
puts "Changed to [ftp::Pwd $token]"
We can also retrieve information about remote files. The command ftp::FileSize returns the size of a file in bytes. The command ftp::ModTime returns the time when a file was last modified, as Unix time. Both commands require the token of the FTP session and a filename. For example:
set size [ftp::FileSize $token tcl8.5.7-src.tar.gz]
puts "tcl8.5.7-src.tar.gz is $size bytes"
set mtime [ftp::ModTime $token tcl8.5.7-src.tar.gz]
set mtext [clock format $mtime]
puts "tcl8.5.7-src.tar.gz last modified on $mtext"
We can also list the contents of a directory. The command ftp::NList lists all files and directories in the current or a specified directory. It accepts the token of the session and, optionally, the directory to list; if none is specified, the current directory is listed. The command returns a list of all items found in the directory, each element being the name of a file or directory.
For example:
foreach file [ftp::NList $token] {
    puts $file
}
The command ftp::List returns a long listing of a directory. It returns a list of items, where each item is represented by a line similar to the output of the ls -l command on Unix. For example:
foreach line [ftp::List $token] {
    puts $line
}
The preceding code would print out the following line, among others:
-rw-r--r-- 1 ftp ftp 4421720 Apr 15 2009 tcl8.5.7-src.tar.gz
While this provides much more information, we need additional code to parse such lines. Let's start by creating a command for this:
proc parseListLine {line} {
First we try to match filenames containing spaces and strip symbolic link definitions (which are in the form filename -> actual_file):
    if {[regexp {([^ ]|[^0-9] )+$} $line name]} {
        # Check for links
        if {[set idx [string first " -> " $name]] != -1} {
            incr idx -1
            set name [string range $name 0 $idx]
        }
    }
Following that, we collapse any runs of multiple spaces and create a list of items by splitting the resulting string on spaces:
    regsub -all {[ ]+} $line " " line
    set items [split $line " "]
If we did not match the name with previous attempt, we assume that filename is the last element:
if {![info exists name]} {set name [lindex $items end]}
We then try to get the permissions and file size information, if possible:
    set perm [lindex $items 0]
    if {[string is integer [lindex $items 4]]} {
        set size [lindex $items 4]
    } else {
        set size ""
    }
Based on the permissions we've extracted, we take their first character and determine the actual file type from it:
    switch -- [string index $perm 0] {
        d { set type "directory" }
        c -
        b { set type "device" }
        l { set type "symlink" }
        default { set type "file" }
    }
We then return a list that consists of the filename, type, size, and permissions:
    return [list $name $type $size $perm]
}
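Assembled, the fragments above form the following complete procedure. Since it performs only string processing, we can try it offline on a listing line of the same shape as the sample shown earlier:

```tcl
proc parseListLine {line} {
    # Find filenames with spaces and strip symbolic link targets
    if {[regexp {([^ ]|[^0-9] )+$} $line name]} {
        if {[set idx [string first " -> " $name]] != -1} {
            incr idx -1
            set name [string range $name 0 $idx]
        }
    }
    # Collapse runs of spaces and split into fields
    regsub -all {[ ]+} $line " " line
    set items [split $line " "]
    if {![info exists name]} {set name [lindex $items end]}
    # Permissions and, when present, size
    set perm [lindex $items 0]
    if {[string is integer [lindex $items 4]]} {
        set size [lindex $items 4]
    } else {
        set size ""
    }
    # Map the first permission character to a file type
    switch -- [string index $perm 0] {
        d { set type "directory" }
        c -
        b { set type "device" }
        l { set type "symlink" }
        default { set type "file" }
    }
    return [list $name $type $size $perm]
}

set line {-rw-r--r-- 1 ftp ftp 4421720 Apr 15 2009 tcl8.5.7-src.tar.gz}
puts [parseListLine $line]
# → tcl8.5.7-src.tar.gz file 4421720 -rw-r--r--
```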
This code is based on ftp.tcl
from the tclvfs
package, which is licensed under the BSD license. The package is available at: http://sourceforge.net/projects/tclvfs/
We can then test it in the following way:
foreach line [ftp::List $token] {
    puts " Original line: $line"
    lassign [parseListLine $line] name type size perm
    puts "Filename '$name' ($type), size $size, $perm"
}
In addition to this, we can also modify the remote filesystem's contents. The command ftp::MkDir can be used to create a directory. It expects the token of the session as the first argument and the name of the directory to create as the second argument.
The command ftp::Rename can be used to rename a file or directory. It requires the token of the FTP session, followed by the old and new names.
The commands ftp::RmDir and ftp::Delete can be used to delete a directory or a file, respectively. Both accept the token of the FTP session and the name of the directory or file to delete.
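A short sketch combining these management commands (the directory and file names are hypothetical):

```tcl
package require ftp

# Hypothetical housekeeping on the remote server.
ftp::MkDir  $token "backup"
ftp::Rename $token "old-logs.txt" "backup/old-logs.txt"
ftp::Delete $token "temporary.txt"
ftp::RmDir  $token "empty-dir"
```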
Closing a connection to an FTP server can be done using the ftp::Close command, specifying the token of the FTP session. For example:
ftp::Close $token
The source code in this section is located in the 09ftp directory in the source code examples for this chapter.
More information about the ftp package, as well as its remaining configuration options, can be found in its documentation on the Tcllib SourceForge project at: http://tcllib.sourceforge.net/doc/ftp.html