Simply put, CGI scripts implement much of the interaction you typically experience on the Web. They are a standard and widely used mechanism for programming web site interaction. There are other ways to add interactive behavior to web sites with Python, including client-side solutions (e.g., JPython applets and Active Scripting), as well as server-side technologies, which build upon the basic CGI model (e.g., Active Server Pages and Zope), and we will discuss these briefly at the end of Chapter 15, too. But by and large, CGI server-side scripts are used to program much of the activity on the Web.
Formally speaking, CGI scripts are programs that run on a server machine and adhere to the Common Gateway Interface -- a model for browser/server communications, from which CGI scripts take their name. Perhaps a more useful way to understand CGI, though, is in terms of the interaction it implies.
Most people take this interaction for granted when browsing the Web and pressing buttons in web pages, but there is a lot going on behind the scenes of every transaction on the Web. From the perspective of a user, it’s a fairly familiar and simple process:
Submission. When you visit a web site to purchase a product or submit information online, you generally fill in a form in your web browser, press a button to submit your information, and begin waiting for a reply.
Response. Assuming all is well with both your Internet connection and the computer you are contacting, you eventually get a reply in the form of a new web page. It may be a simple acknowledgement (e.g., “Thanks for your order”) or a new form that must be filled out and submitted again.
And, believe it or not, that simple model is what makes most of the Web hum. But internally, it’s a bit more complex. In fact, there is a subtle client/server socket-based architecture at work -- your web browser running on your computer is the client, and the computer you contact over the Web is the server. Let’s examine the interaction scenario again, with all the gory details that users usually never see.
When you fill out a form page in a web browser and press a submission button, behind the scenes your web browser sends your information across the Internet to the server machine specified as its receiver. The server machine is usually a remote computer that lives somewhere else in both cyberspace and reality. It is named in the URL you access (the Internet address string that appears at the top of your browser). The target server and file can be named in a URL you type explicitly, but more typically they are specified in the HTML that defines the submission page itself -- either in a hyperlink, or in the “action” attribute of a form’s <form> tag. However the server is specified, the browser running on your computer ultimately sends your information to the server as bytes over a socket, using techniques we saw in the last two chapters. On the server machine, a program called an HTTP server runs perpetually, listening on a socket for incoming data from browsers, usually on port number 80.
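To make “bytes over a socket” a little more concrete, here is a minimal sketch of the request a browser might send when a form is submitted with the POST method. The host name, script path, field names, and the build_post_request helper are all hypothetical, and real browsers send many more headers than this.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Return the raw bytes of a minimal HTTP/1.0 form POST request.

    host, path, and fields are illustrative placeholders; a real browser
    also sends headers such as User-Agent, Accept, and cookies.
    """
    body = urlencode(fields)  # form inputs, URL-encoded (always ASCII)
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # a blank line ends the header section
    )
    return (head + body).encode("ascii")


if __name__ == "__main__":
    # Hypothetical server and script names, for illustration only.
    request = build_post_request("www.example.com", "/cgi-bin/order.cgi",
                                 {"item": "book", "qty": "2"})
    print(request.decode("ascii"))
    # A browser would write these bytes to a socket connected to port 80,
    # e.g. one opened with socket.create_connection(("www.example.com", 80)).
```

The HTTP server on the other end reads these bytes, sees that the path names a CGI program, and hands the body to that program as described next.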
When your information shows up at the server machine, the HTTP server program notices it first and decides how to handle the request. If the requested URL names a simple web page (e.g., a URL ending in .html), the HTTP server opens the named HTML file on the server machine and sends its text back to the browser over a socket. On the client, the browser reads the HTML and uses it to construct the next page you see. But if the URL requested by the browser names an executable program instead (e.g., a URL ending in .cgi), the HTTP server starts the named program on the server machine to process the request and redirects the incoming browser data to the spawned program’s stdin input stream and environment variables. That program is usually a CGI script -- a program run on the remote server machine somewhere in cyberspace, not on your computer. The CGI script is responsible for handling the request from this point on; it may store your information in a database, charge your credit card, and so on.
Ultimately, the CGI script prints HTML to generate a new response page in your browser. When a CGI script is started, the HTTP server takes care to connect the script’s stdout standard output stream to a socket that the browser is listening to. Because of this, HTML code printed by the CGI script is sent over the Internet, back to your browser, to produce a new page. The HTML printed back by the CGI script works just as if it had been stored and read in from an HTML file; it can define a simple response page or a brand new form coded to collect additional information.
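A minimal sketch of the output side of a Python CGI script follows. One detail the prose above glosses over: before the HTML itself, the CGI convention requires at least one header line (here, Content-type) followed by a blank line, which the HTTP server uses when building its reply. The make_page function and its arguments are illustrative names, not part of any library.

```python
import sys


def make_page(title, message):
    """Build a complete CGI response: header line, blank line, then HTML.

    title and message are inserted verbatim here; untrusted text should be
    HTML-escaped first (for example, with html.escape).
    """
    return (
        "Content-type: text/html\n"  # tells the browser what follows
        "\n"                         # blank line ends the CGI headers
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1><p>{1}</p></body></html>\n"
    ).format(title, message)


if __name__ == "__main__":
    # Whatever a CGI script writes to stdout is relayed to the browser.
    sys.stdout.write(make_page("Thanks", "Thanks for your order"))
```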
In other words, CGI scripts are something like callback handlers for requests generated by web browsers that require a program to be run dynamically; they are automatically run on the server machine in response to actions in a browser. Although CGI scripts ultimately receive and send standard structured messages over sockets, CGI is more like a higher-level procedural convention for sending and receiving information between a browser and a server.
If all of the above sounds complicated, relax -- Python, as well as the resident HTTP server, automates most of the tricky bits. CGI scripts are written as fairly autonomous programs, and they assume that startup tasks have already been accomplished. The HTTP web server program, not the CGI script, implements the server side of the HTTP protocol itself. Moreover, Python’s library modules automatically dissect information sent up from the browser and give it to the CGI script in an easily digested form. The upshot is that CGI scripts may focus on application details like processing input data and producing a result page.
As mentioned earlier, in the context of CGI scripts, the stdin and stdout streams are automatically tied to sockets connected to the browser. In addition, the HTTP server passes some browser information to the CGI script in the form of shell environment variables. To CGI programmers, that means:
Input data sent from the browser to the server shows up as a stream of bytes in the stdin input stream, along with shell environment variables.
Output is sent back from the server to the client by simply printing properly formatted HTML to the stdout output stream.
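The input side can be sketched with nothing more than os.environ and sys.stdin. Under the CGI conventions, a GET request’s encoded inputs arrive in the QUERY_STRING environment variable, while a POST request’s inputs arrive on stdin, with their size given in CONTENT_LENGTH. The read_raw_input helper below is hypothetical; it takes the environment and input stream as parameters so it can be tried without a web server.

```python
import io
import os
import sys


def read_raw_input(environ=None, stdin=None):
    """Return the raw, still-encoded input string sent by the browser.

    GET data comes from the QUERY_STRING environment variable; POST data
    is read from stdin, using CONTENT_LENGTH to know how much to read.
    URL-encoded form data is ASCII, so characters and bytes coincide.
    """
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # Simulate what an HTTP server would hand a CGI script for a POST.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "15"}
    fake_stdin = io.StringIO("item=book&qty=2")
    print(read_raw_input(fake_env, fake_stdin))
```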
The most complex parts of this scheme include parsing all the input information sent up from the browser and formatting information in the reply sent back. Happily, Python’s standard library largely automates both tasks:
With the Python cgi module, inputs typed into a web browser form or appended to a URL string show up as values in a dictionary-like object in Python CGI scripts. Python parses the data itself and gives us an object with one key:value pair per input sent by the browser; that object looks the same whether the inputs came from a form or were appended to a URL.
The cgi module also has tools for automatically escaping strings so that they are legal to use in HTML (e.g., replacing embedded <, >, and & characters with HTML escape codes).
Module urllib provides other tools for formatting text inserted into generated URL strings (e.g., adding %XX and + escapes).
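Python 3.13 removed the cgi module, including its FieldStorage class, from the standard library, so the sketch below gets a comparable result with urllib.parse, which is where Python 3 keeps the URL-escaping tools that lived directly in urllib in Python 2. It shows both directions: escaping text into a query string, and parsing a query string back into a dictionary. The parse_inputs helper is hypothetical and keeps only the first value for each name, a simplification of what cgi.FieldStorage provided.

```python
from urllib.parse import parse_qs, quote_plus, urlencode


def parse_inputs(raw):
    """Map each input name to its first value, like a simple form dict.

    parse_qs returns {name: [value, ...]} because a name may repeat; by
    default it also drops fields with empty values (keep_blank_values=False).
    """
    return {name: values[0] for name, values in parse_qs(raw).items()}


if __name__ == "__main__":
    # Escaping text for a URL: spaces become '+', and reserved
    # characters such as '&' become %XX escapes.
    print(quote_plus("Bob & Sue"))  # Bob+%26+Sue
    query = urlencode({"name": "Bob & Sue", "lang": "Python"})
    print(query)                    # name=Bob+%26+Sue&lang=Python

    # The same string may arrive in QUERY_STRING (GET) or on stdin
    # (POST); either way, parsing yields the same dictionary.
    print(parse_inputs(query))
```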
We’ll study both of these interfaces in detail later in this chapter. For now, keep in mind that although any language can be used to write CGI scripts, Python’s standard modules and language features make it a snap.
Less happily, CGI scripts are also intimately tied to the syntax of HTML, since they must generate it to create a reply page. In fact, it can be said that Python CGI scripts embed HTML, which is an entirely distinct language in its own right. As we’ll also see, the fact that CGI scripts create a user interface by printing HTML syntax means that we have to take special care with the text we insert into a web page’s code (e.g., escaping HTML operators). Worse, CGI scripts require at least a cursory knowledge of HTML forms, since that is where the inputs and the target script’s address are typically specified. This book won’t teach HTML in depth; if you find yourself puzzled by some of the arcane syntax of the HTML generated by scripts here, you should glance at an HTML introduction, such as O’Reilly’s HTML and XHTML: The Definitive Guide.
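One way to take that special care in current Python is html.escape, which replaced the cgi module’s escape function (removed in Python 3.8). Here is a minimal sketch, with echo_page as a hypothetical helper that echoes user input back into a reply page.

```python
import html


def echo_page(user_text):
    """Insert untrusted text into HTML safely by escaping it first.

    html.escape converts &, <, and > to entity codes, and by default
    (quote=True) also escapes double and single quotes.
    """
    safe = html.escape(user_text)
    return "<p>You typed: {}</p>".format(safe)


if __name__ == "__main__":
    print(echo_page("<b>5 > 3 & 2 < 4</b>"))
    # <p>You typed: &lt;b&gt;5 &gt; 3 &amp; 2 &lt; 4&lt;/b&gt;</p>
```

Without the escape call, the browser would treat the typed <b> as real markup rather than displaying it.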
Like GUIs, web-based systems are highly interactive, and the best way to get a feel for some of these examples is to test-drive them live. Before we get into some code, it’s worth noting that all you need to run the examples in the next few chapters is a web browser. That is, all the Web examples we will see here can be run from any web browser on any machine, whether you’ve installed Python on that machine or not. Simply type this URL at the top:[92]
http://starship.python.net/~lutz/PyInternetDemos.html
That address loads a launcher page with links to all the example files installed on a server machine whose domain name is starship.python.net (a machine dedicated to Python developers). The launcher page itself appears as shown in Figure 12-1, running under Internet Explorer. It looks similar in other browsers. Each major example has a link on this page, which runs when clicked.
The launcher page, and all the HTML files in this chapter, can also be loaded locally, from the book’s example distribution directory on your machine. They can even be opened directly off the book’s CD (view CD-ROM content online at http://examples.oreilly.com/python2) or from buttons on the top-level book demo launchers. However, the CGI scripts ultimately invoked by some of the example links must be run on a server, and thus require a live Internet connection. If you browse root pages locally on your machine, your browser will either display the scripts’ source code or tell you when you need to connect to the Web to run a CGI script. On Windows, a connection dialog will likely pop up automatically, if needed.
Of course, running scripts in your browser isn’t quite the same as writing scripts on your own. If you do decide to change these CGI programs or write new ones from scratch, you must be able to access web server machines:
To change server-side scripts, you need an account on a web server machine with an installed version of Python. A basic account on such a server is often enough. Then edit scripts on your machine and upload them to the server by FTP.
To type explicit command lines on a server machine or edit scripts on the server directly, you will also need shell access on the web server. Such access lets you telnet to that machine to get a command-line prompt.
Unlike the last chapter’s examples, Python server-side scripts require both Python and a server. That is, you’ll need access to a web server machine that supports CGI scripts in general and that either already has an installed Python interpreter or lets you install one of your own. Some Internet Service Providers (ISPs) are more supportive than others on this front, but there are many options here, both commercial and free (more on this later).
Once you’ve located a server to host your scripts, you may modify and upload the CGI source code files from this book’s CD to your own server and site by FTP. If you do, you may also want to run two Python command-line scripts on your server after uploading: fixcgi.py and fixsitename.py, both presented later in this chapter. The former sets CGI script permissions, and the latter replaces any starship server name references in example links and forms with your own server’s name.
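As a rough idea of what such fix-up tools involve, here is an independent sketch; it is not the book’s fixcgi.py or fixsitename.py. The function names, the extension lists, and the 755 mode are all assumptions: many Unix HTTP servers require CGI scripts to be world-readable and executable, but check your own server’s requirements.

```python
import os

# File extensions treated as CGI scripts: an assumption for this sketch.
CGI_EXTENSIONS = (".cgi", ".py")


def set_cgi_permissions(root):
    """Set mode 755 (rwxr-xr-x) on CGI scripts under root.

    Returns the list of paths whose permissions were changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(CGI_EXTENSIONS):
                path = os.path.join(dirpath, name)
                os.chmod(path, 0o755)
                changed.append(path)
    return changed


def replace_site_name(root, old, new, extensions=(".html", ".cgi", ".py")):
    """Replace every occurrence of an old server name in matching files.

    Returns the number of files that were rewritten.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            if old in text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(text.replace(old, new))
                count += 1
    return count
```

After uploading, one might call set_cgi_permissions(".") and then replace_site_name(".", "starship.python.net", "www.example.com") from the site’s root directory, substituting your own host name for the hypothetical www.example.com.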
We’ll study additional installation details later in this chapter, and explore a few custom server options at the end of Chapter 15.
The source code of examples in this part of the book is listed in the text and included on the book’s CD (see http://examples.oreilly.com/python2). In all cases, if you wish to view the source code of an HTML file, or the HTML generated by a Python CGI script, you can also simply select your browser’s View Source menu option while the corresponding web page is displayed.
Keep in mind, though, that your browser’s View Source option lets you see the output of a server-side script after it has run, but not the source code of the script itself. There is no automatic way to view the Python source code of the CGI scripts themselves, short of finding them in this book or its CD.
To address this issue, later in this chapter we’ll also write a CGI-based program called getfile, which allows the source code of any file on this book’s web site (HTML, CGI script, etc.) to be downloaded and viewed. Simply type the desired file’s name into a web page form referenced by the getfile.html link on the Internet demos launcher page, or add it to the end of an explicitly typed URL as a parameter like this:
http://.../getfile.cgi?filename=somefile.cgi
In response, the server will ship back the text of the named file to your browser. This process requires explicit interface steps, though, and much more knowledge than we’ve gained thus far, so see ahead for details.
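Ahead of those details, a bare-bones sketch of the core idea may help: read the filename parameter from the query string, load that file, and return its text HTML-escaped inside a <pre> block so the browser shows the code rather than interpreting it. This is not the getfile script developed later in the chapter; render_file and base_dir are hypothetical, and stripping directory components with os.path.basename is an added safety assumption that keeps requests inside a single directory.

```python
import html
import os
from urllib.parse import parse_qs


def render_file(query_string, base_dir):
    """Return a CGI response showing the file named by 'filename'.

    For example, a QUERY_STRING of 'filename=somefile.cgi' displays
    base_dir/somefile.cgi.
    """
    params = parse_qs(query_string)
    # Keep only the last path component so a request such as
    # 'filename=../../etc/passwd' cannot reach outside base_dir.
    name = os.path.basename(params.get("filename", [""])[0])
    path = os.path.join(base_dir, name)
    if not name or not os.path.isfile(path):
        body = "<p>No such file: {}</p>".format(html.escape(name))
    else:
        with open(path, encoding="utf-8", errors="replace") as f:
            # Escape the source so '<' in the code is shown, not parsed.
            body = "<pre>{}</pre>".format(html.escape(f.read()))
    return "Content-type: text/html\n\n<html><body>{}</body></html>\n".format(body)


if __name__ == "__main__":
    print(render_file(os.environ.get("QUERY_STRING", ""), "."))
```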
[92] Given that this edition may not be updated for many years, it’s not impossible that the server name in this address, starship.python.net, might change over time. If this address fails, check the book updates at http://rmi.net/~lutz/about-pp.html to see if a new examples site address has been posted. The rest of the main page’s URL will likely be unchanged. Note, though, that some examples hardcode the starship host server name in URLs; these will be fixed on the new server if moved, but not on your book CD. Run the fixsitename.py script presented later in this chapter to change site names automatically.