Python is widely used when building websites and serves several different roles in this capacity. First, Python scripts are often a useful way to simply generate a set of static HTML pages to be delivered by a web server. For example, a script can be used to take raw content and decorate it with additional features that you typically see on a website (navigation bars, sidebars, advertisements, stylesheets, etc.). This is mainly just a matter of file handling and text processing—topics that have been covered in other sections of the book.
Second, Python scripts are used to generate dynamic content. For example, a website might operate using a standard webserver such as Apache but would use Python scripts to dynamically handle certain kinds of requests. This use of Python is primarily associated with form processing. For example, an HTML page might include a form like this:
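As a rough illustration, a form of this kind might look like the one below, held here in a Python string so that it can be inspected. The field names, the /cgi-bin/ path, and the FormActionFinder helper are assumptions made for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical subscription form; field names and path are illustrative only.
SUBSCRIBE_FORM = """\
<form method="POST" action="/cgi-bin/subscribe.py">
  Your name: <input type="text" name="name" size="30">
  Your email address: <input type="text" name="email" size="30">
  <input type="submit" name="submit-button" value="Subscribe">
</form>
"""

class FormActionFinder(HTMLParser):
    """Collect the action attribute of every <form> tag."""
    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "form":
            self.actions.append(dict(attrs).get("action"))

def form_actions(html_text):
    finder = FormActionFinder()
    finder.feed(html_text)
    return finder.actions
```

Calling form_actions(SUBSCRIBE_FORM) reports which server-side script the form would submit to.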
Within the form, the ACTION attribute names a Python script 'subscribe.py' that will execute on the server when the form is submitted.
Another common scenario involving dynamic content generation is with AJAX (Asynchronous JavaScript and XML). With AJAX, JavaScript event handlers are associated with certain HTML elements on a page. For example, when the mouse hovers over a specific document element, a JavaScript function might execute and send an HTTP request to the webserver that gets processed (possibly by a Python script). When the associated response is received, another JavaScript function executes to process the response data and display the result. There are many ways in which results might be returned. For example, a server might return data as plaintext, XML, JSON, or any number of other formats. Here is an example HTML document that illustrates one way to implement a hover popup where moving the mouse over selected elements causes a popup window to appear.
In this example, the JavaScript function ShowPopup() initiates a request to a Python script popupdata.py on the server. The result of this script is just a fragment of HTML, which is then displayed in a popup window. Figure 23.1 shows what this might look like in the browser.
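The server side of such a request can be quite small. The following is a minimal sketch of what a script like popupdata.py might do, assuming the JavaScript passes an "id" query parameter; the parameter name, the POPUP_TEXT table, and the popup_response() function are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

# Hypothetical popup text keyed by an "id" query parameter.
POPUP_TEXT = {
    "python": "Python is a general-purpose programming language.",
    "wsgi": "WSGI is an interface between web servers and Python applications.",
}

def popup_response(query_string):
    """Return complete CGI output: header, blank line, then an HTML fragment."""
    params = parse_qs(query_string)
    key = params.get("id", [""])[0]
    text = POPUP_TEXT.get(key, "No information available.")
    fragment = '<div class="popup">%s</div>' % escape(text)
    return "Content-type: text/html\r\n\r\n" + fragment

if __name__ == "__main__":
    # Under CGI the server supplies the query string in the environment.
    sys.stdout.write(popup_response(os.environ.get("QUERY_STRING", "")))
```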
Finally, the entire website might run under the control of Python within the context of a framework written in Python. It has been humorously noted that Python has “more web programming frameworks than language keywords.” The topic of web frameworks is far beyond the scope of this book, but http://wiki.python.org/moin/WebFrameworks is a good starting point for finding more information.
The rest of this chapter describes built-in modules related to the low-level interfaces through which Python connects with webservers and frameworks. Topics include CGI scripting, a technique used to access Python from third-party web servers, and WSGI, a middleware layer used for writing components that integrate with Python’s various web frameworks.
cgi
The cgi module is used to implement CGI scripts, which are programs typically executed by a webserver when it wants to process user input from a form or generate dynamic content of some kind.
When a request corresponding to a CGI script is submitted, the webserver executes the CGI program as a subprocess. CGI programs receive input from two sources: sys.stdin and environment variables set by the server. The following list details common environment variables set by webservers:
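To make the two input sources concrete, here is a minimal sketch that gathers raw form data by hand. It assumes the server sets the conventional REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH variables; the read_form_data() helper is hypothetical and uses urllib.parse.parse_qs for decoding.

```python
import os
import sys
from urllib.parse import parse_qs

def read_form_data(environ=os.environ, stdin=None):
    """Collect form data the way a CGI program receives it."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST data arrives on standard input; CONTENT_LENGTH gives its size.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        raw = stream.read(length).decode("utf-8")
    else:
        # GET data is passed in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)   # maps each field name to a list of values
```

The environ and stdin parameters exist only so that the function can be exercised without a web server.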
As output, a CGI program writes to standard output, sys.stdout. The gory details of CGI programming can be found in a book such as CGI Programming with Perl, 2nd Edition, by Shishir Gundavaram (O’Reilly & Associates, 2000). For our purposes, there are really only two things to know. First, the contents of an HTML form are passed to a CGI program in a sequence of text known as a query string. In Python, the contents of the query string are accessed using the FieldStorage class. For example:
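A minimal sketch of such a script follows. The field names "name" and "email" and the subscribe_message() helper are assumptions. Note also that the cgi module is deprecated as of Python 3.11 and removed in 3.13, so the import is placed inside main() and only runs under a web server.

```python
import os

def subscribe_message(name, email):
    """Build the plain-text reply for a subscription request."""
    if not name or not email:
        return "Missing name or email address."
    return "Thank you, %s. Updates will be sent to %s." % (name, email)

def main():
    import cgi                          # unavailable in Python 3.13 and later
    form = cgi.FieldStorage()           # parses QUERY_STRING or standard input
    name = form.getvalue("name")        # "name" and "email" are assumed field names
    email = form.getvalue("email")
    print("Content-type: text/plain\r")
    print("\r")
    print(subscribe_message(name, email))

# Run only when a web server has set up the CGI environment.
if "GATEWAY_INTERFACE" in os.environ:
    main()
```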
Second, the output of a CGI program consists of two parts: an HTTP header and the raw data (which is typically HTML). A blank line always separates these two components. A simple HTTP header looks like this:
The rest of the output is the raw output. For example:
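The two parts can be sketched as follows. The cgi_output() helper and the page content are illustrative; only the header line, the blank line, and the body that follows are the point.

```python
import sys

def cgi_output(body, content_type="text/html"):
    """Return the header, the separating blank line, and the raw data."""
    header = "Content-type: %s\r\n" % content_type
    return header + "\r\n" + body

if __name__ == "__main__":
    page = "<html><body><h1>Hello World</h1></body></html>\n"
    sys.stdout.write(cgi_output(page))
```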
It is standard practice that HTTP headers are terminated using the Windows line-ending convention of '\r\n'. That is why the '\r\n' appears in the example. If you need to signal an error, include a special 'Status:' header in the output. For example:
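A small sketch of an error response, assuming a plain-text body; the cgi_error() name is hypothetical.

```python
def cgi_error(status, message):
    """Return CGI output that reports an error status to the client."""
    header = "Status: %s\r\nContent-type: text/plain\r\n\r\n" % status
    return header + message + "\n"
```

For instance, cgi_error("404 Not Found", "No such page") produces output whose first line is the Status: header.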
If you need to redirect the client to a different page, create output like this:
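A redirect needs only a status and a Location: header, followed by the blank line. In this sketch the status text and the cgi_redirect() name are assumptions.

```python
def cgi_redirect(url):
    """Return CGI output that sends the client to a different page."""
    return ("Status: 302 Found\r\n"
            "Location: %s\r\n"
            "\r\n") % url
```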
Most of the work in the cgi module is performed by creating an instance of the FieldStorage class.
FieldStorage([input [, headers [, outerboundary [, environ [, keep_blank_values [, strict_parsing]]]]]])
Reads the contents of a form by reading and parsing the query string passed in an environment variable or standard input. input specifies a file-like object from which form data will be read in a POST request. By default, sys.stdin is used. headers and outerboundary are used internally and should not be given. environ is a dictionary from which CGI environment variables are read. keep_blank_values is a Boolean flag that controls whether blank values are retained or not. By default, it is False. strict_parsing is a Boolean flag that causes an exception to be raised if there is any kind of parsing problem. By default, it is False.
A FieldStorage instance form works similarly to a dictionary. For example, f = form[key] will extract an entry for a given parameter key. An instance f extracted in this manner is either another instance of FieldStorage or an instance of MiniFieldStorage. The following attributes are defined on f:
Values from a form can be extracted using the following methods:
form.getvalue(fieldname [, default])
Returns the value of a given field with the name fieldname. If default is supplied, it specifies the value to return if the field is not present. One caution with this method is that if the same form field name is included more than once in the request, the returned value will be a list containing all of the values. To simplify programming, you can use form.getfirst(), which simply returns the first value found.
form.getfirst(fieldname [, default])
Returns the first value defined for a field with the name fieldname. If default is supplied, it specifies the value to return if the field is not present.
form.getlist(fieldname)
Returns a list of all values defined for fieldname. It always returns a list, even if only one value is defined, and returns an empty list if no values exist.
In addition, the cgi module defines a class, MiniFieldStorage, that contains only the attributes name and value. This class is used to represent individual fields of a form passed in the query string, whereas FieldStorage is used to contain multiple fields and multipart data.
Instances of FieldStorage are accessed like a Python dictionary, where the keys are the field names on the form. When accessed in this manner, the objects returned are themselves an instance of FieldStorage for multipart data (content type is 'multipart/form-data') or file uploads, an instance of MiniFieldStorage for simple fields (content type is 'application/x-www-form-urlencoded'), or a list of such instances in cases where a form contains multiple fields with the same name. For example:
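Because a lookup can yield either a single field object or a list of them, code that indexes the form directly usually has to check for both cases. The helper below is a sketch of that check. field_values() is a hypothetical name, and form is assumed to be a FieldStorage instance (anything that supports "in", indexing, and a value attribute on its items will do).

```python
def field_values(form, key):
    """Return every value submitted for key as a list of values."""
    if key not in form:
        return []
    item = form[key]
    if isinstance(item, list):
        # Several fields shared the same name: one field object per value.
        return [field.value for field in item]
    # A single FieldStorage or MiniFieldStorage instance.
    return [item.value]
```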
If a field represents an uploaded file, accessing the value attribute reads the entire file into memory as a byte string. Because this may consume a large amount of memory on the server, it may be preferable to read uploaded data in smaller pieces by reading from the file attribute directly. For instance, the following example reads uploaded data line by line:
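A sketch of that approach, assuming fileitem is the field object for the upload and destination is any writable binary file; the save_upload() name is hypothetical.

```python
def save_upload(fileitem, destination):
    """Copy an upload line by line; return the number of bytes written."""
    total = 0
    while True:
        line = fileitem.file.readline()   # read from the file attribute directly
        if not line:
            break                         # an empty result means end of data
        destination.write(line)
        total += len(line)
    return total
```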
The following utility functions are often used in CGI scripts:
escape(s [, quote])
Converts the characters '&', '<', and '>' in string s to HTML-safe sequences such as '&amp;', '&lt;', and '&gt;'. If the optional flag quote is true, the double-quote character (") is also translated to '&quot;'.
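In Python 3.8 and later, cgi.escape() is no longer available. The sketch below therefore uses html.escape() from the standard library, which performs the same translations (and, when quote is true, also translates the single-quote character). The safe_text() and safe_attr() wrappers are hypothetical names.

```python
from html import escape

def safe_text(s):
    """Escape &, <, and > for use in element content."""
    return escape(s, quote=False)

def safe_attr(s):
    """Escape quotes as well, for use inside an attribute value."""
    return escape(s, quote=True)
```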
parse_header(string)
Parses the data supplied after an HTTP header field such as 'content-type'. The data is split into a primary value and a dictionary of secondary parameters that are returned in a tuple. For example, the command
parse_header('text/html; a=hello; b="world"')
returns this result:
('text/html', {'a': 'hello', 'b': 'world'})
parse_multipart(fp, pdict)
Parses input of type 'multipart/form-data' as is commonly used with file uploads. fp is the input file, and pdict is a dictionary containing parameters of the content-type header. It returns a dictionary mapping field names to lists of values. This function doesn’t work with nested multipart data. The FieldStorage class should be used instead.
print_directory()
Formats the name of the current working directory in HTML and prints it out. The resulting output will be sent back to the browser, which can be useful for debugging.
print_environ()
Creates a list of all environment variables formatted in HTML and is used for debugging.
print_environ_usage()
Prints a more selected list of useful environment variables in HTML and is used for debugging.
print_form(form)
Formats the data supplied on a form in HTML. form must be an instance of FieldStorage. Used for debugging.
test()
Writes a minimal HTTP header and prints all the information provided to the script in HTML format. Primarily used for debugging to make sure your CGI environment is set up correctly.
In the current age of web frameworks, CGI scripting seems to have fallen out of fashion. However, if you are going to use it, there are a couple of programming tips that can simplify your life.
First, don’t write CGI scripts where you are using a huge number of print statements to produce hard-coded HTML output. The resulting program will be a horrible tangled mess of Python and HTML that is not only impossible to read, but also impossible to maintain. A better approach is to rely on templates. Minimally, the string.Template object can be used for this. Here is an example that outlines the concept:
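The following sketch shows one way the idea might be organized. The render_template() and subscribe_page() helpers, the template_dir parameter, and the variable names $name, $email, and $error are all assumptions; form values are escaped before substitution because they come from the client.

```python
import os
from html import escape
from string import Template

def render_template(path, **values):
    """Load a template file and fill in its $variable substitutions."""
    with open(path, encoding="utf-8") as f:
        temp = Template(f.read())
    return temp.substitute(values)

def subscribe_page(name, email, template_dir="."):
    """Pick 'error.html' or 'success.html' and fill in its variables."""
    if not name or not email:
        return render_template(os.path.join(template_dir, "error.html"),
                               error="Missing name or email address")
    return render_template(os.path.join(template_dir, "success.html"),
                           name=escape(name), email=escape(email))
```

A CGI script would write an HTTP header and then the string returned by subscribe_page().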
In this example, the files 'error.html' and 'success.html' are HTML pages that have all of the output but include $variable substitutions corresponding to dynamically generated values used in the CGI script. For example, the 'success.html' file might look like this:
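As an illustration, the contents of such a template are shown here as a Python string, together with the substitution that fills it in. The wording of the page and the variable names $name and $email are assumptions.

```python
from string import Template

# Hypothetical contents of 'success.html'.
SUCCESS_HTML = """\
<html>
  <head><title>Success</title></head>
  <body>
    <p>Welcome $name. You have successfully subscribed to our newsletter.</p>
    <p>Updates will be sent to $email.</p>
  </body>
</html>
"""

# Each $variable is replaced by the matching keyword argument.
page = Template(SUCCESS_HTML).substitute(name="Dave", email="dave@example.com")
```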
The temp.substitute() operation in the script is simply filling in the variables in this file. An obvious benefit of this approach is that if you want to change the appearance of the output, you just modify the template files, not the CGI script. There are many third-party template engines available for Python, maybe in even greater numbers than web frameworks. These take the templating concept and build upon it in substantial ways. See http://wiki.python.org/moin/Templating for more details.
Second, if you need to save data from a CGI script, try to use a database. Although it is easy enough to write data directly to files, webservers operate concurrently, and unless you’ve taken steps to properly lock and synchronize resources, it is possible that files will get corrupted. Database servers and their associated Python interface usually don’t have this problem. So if you need to save data, try to use a module such as sqlite3 or a third-party module for something like MySQL.
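A minimal sketch using sqlite3 follows. The subscribers table, its columns, and the save_subscriber() function are hypothetical; the point is that the database, not the script, takes care of locking between concurrent requests.

```python
import sqlite3

def save_subscriber(db_path, name, email):
    """Store one subscriber and return the number of rows now in the table."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:   # commits on success, rolls back if an exception occurs
            conn.execute("CREATE TABLE IF NOT EXISTS subscribers "
                         "(name TEXT, email TEXT)")
            # Placeholders keep client-supplied text out of the SQL itself.
            conn.execute("INSERT INTO subscribers (name, email) VALUES (?, ?)",
                         (name, email))
        count = conn.execute("SELECT COUNT(*) FROM subscribers").fetchone()[0]
    finally:
        conn.close()
    return count
```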
Finally, if you find yourself writing dozens of CGI scripts and code that has to deal with low-level details of HTTP such as cookies, authentication, encoding, and so forth, you may want to consider a web framework instead. The whole point of using a framework is so that you don’t have to worry about those details—well, at least not as much. So, don’t reinvent the wheel.
• The process of installing a CGI program varies widely according to the type of webserver being used. Typically programs are placed in a special cgi-bin directory. A server may also require additional configuration. You should consult the documentation for the server or the server’s administrator for more details.
• On UNIX, Python CGI programs may require appropriate execute permissions to be set and a line such as #!/usr/bin/env python to appear as the first line of the program (the exact interpreter path depends on the system).
• To simplify debugging, import the cgitb module (for example, import cgitb; cgitb.enable()). This modifies exception handling so that errors are displayed in the web browser.
• If you invoke an external program (for example, via the os.system() or os.popen() function), be careful not to pass arbitrary strings received from the client to the shell. This is a well-known security hole that hackers can use to execute arbitrary shell commands on the server (because the command passed to these functions is first interpreted by the UNIX shell as opposed to being executed directly). In particular, never pass any part of a URL or form data to a shell command unless it has first been thoroughly checked by making sure that the string contains only alphanumeric characters, dashes, underscores, and periods.
• On UNIX, don’t give a CGI program setuid mode. This is a security liability and not supported on all machines.
• Don’t use 'from cgi import *' with this module. The cgi module defines a wide variety of names and symbols that you probably don’t want in your namespace.
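The caution above about passing client data to external programs can be sketched as follows. The check mirrors the rule in the text (alphanumeric characters, dashes, underscores, and periods only), and the program is started from an argument list so that no shell is involved. The names is_safe_token() and run_with_argument() are hypothetical.

```python
import re
import subprocess

# Alphanumeric characters, dashes, underscores, and periods only.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9._-]+")

def is_safe_token(s):
    """Return True if s consists solely of the permitted characters."""
    return SAFE_TOKEN.fullmatch(s) is not None

def run_with_argument(program_args, user_value):
    """Run a program with one checked, client-supplied argument appended."""
    if not is_safe_token(user_value):
        raise ValueError("unsafe value: %r" % user_value)
    # An argument list with the default shell=False is executed directly,
    # so the value is never interpreted by a shell.
    result = subprocess.run(list(program_args) + [user_value],
                            capture_output=True, text=True, check=True)
    return result.stdout
```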
cgitb
This module provides an alternative exception handler that displays a detailed report whenever an uncaught exception occurs. The report contains source code, values of parameters, and local variables. Originally, this module was developed to help debug CGI scripts, but it can be used in any application.
enable([display [, logdir [, context [, format]]]])
Enables special exception handling. display is a flag that determines whether any information is displayed when an error occurs. The default value is 1. logdir specifies a directory in which error reports will be written to files instead of printed to standard output. When logdir is given, each error report is written to a unique file created by the tempfile.mkstemp() function. context is an integer specifying the number of lines of source code to display around lines upon which the exception occurred. format is a string that specifies the output format. A format of 'html' specifies HTML (the default). Any other value results in plain-text format.
handle([info])
Handles an exception using the default settings of the enable() function. info is a tuple (exctype, excvalue, tb) where exctype is an exception type, excvalue is an exception value, and tb is a traceback object. This tuple is normally obtained using sys.exc_info(). If info is omitted, the current exception is used.
To enable special exception handling in CGI scripts, include the line import cgitb; cgitb.enable() at the beginning of the script.
wsgiref
WSGI (Python Web Server Gateway Interface) is a standardized interface between webservers and web applications that is designed to promote portability of applications across different webservers and frameworks. An official description of the standard is found in PEP 333 (http://www.python.org/dev/peps/pep-0333). More information about the standard and its use can also be found at http://www.wsgi.org. The wsgiref package is a reference implementation that can be used for testing, validation, and simple deployments.
With WSGI, a web application is implemented as a function or callable object webapp(environ, start_response) that accepts two arguments. environ is a dictionary of environment settings that is minimally required to have the following values, which have the same meaning and names as those used in CGI scripting:
In addition, the environ dictionary is required to contain the following WSGI-specific values:
The start_response parameter is a callable object of the form start_response(status, headers) that is used by the application to start a response. status is a string such as '200 OK' or '404 Not Found'. headers is a list of tuples, each of the form (name, value) corresponding to an HTTP header to be included in the response, for example ('Content-type','text/html').
The data or body of a response is returned by the web application function as an iterable object that produces a sequence of byte strings or text strings that only contain characters which can be encoded as a single byte (e.g., compatible with the ISO-8859-1 or Latin-1 character set). Examples include a list of byte strings or a generator function producing byte strings. If an application needs to do any kind of character encoding such as UTF-8, it must do this itself.
Here is an example of a simple WSGI application that reads form fields and produces some output, similar to what was shown in the cgi module section:
There are a few critical details in this example. First, WSGI application components are not tied to a specific framework, webserver, or set of library modules. In the example, we’re only using one library module, cgi, simply because it has some convenience functions for parsing query variables. The example shows how the start_response() function is used to initiate a response and supply headers. The response itself is constructed as a list of strings. The final statement in this application is a generator expression that turns all strings into byte strings. If you’re using Python 3, this is a critical step because all WSGI applications are expected to return encoded bytes, not unencoded Unicode data.
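The same points (calling start_response(), building the response as a list of strings, and encoding to bytes in the final statement) can be sketched in a comparable application. This version is an assumption-laden variant: it parses the query with urllib.parse.parse_qs in place of the cgi module, which is unavailable in Python 3.13 and later, and the field names and subscribe_app() name are hypothetical.

```python
from urllib.parse import parse_qs

def subscribe_app(environ, start_response):
    """A small WSGI application that reads two query fields."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", [""])[0]
    email = fields.get("email", [""])[0]
    if not name or not email:
        status = "400 Bad Request"
        resp = ["Missing name or email address\n"]
    else:
        status = "200 OK"
        resp = ["Thank you, %s.\n" % name,
                "Updates will be sent to %s.\n" % email]
    start_response(status, [("Content-type", "text/plain; charset=utf-8")])
    # WSGI requires bytes, so every string is encoded before it is returned.
    return (line.encode("utf-8") for line in resp)
```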
To deploy a WSGI application, it has to be registered with the web programming framework you happen to be using. For this, you’ll have to read the manual.
wsgiref Package
The wsgiref package provides a reference implementation of the WSGI standard that allows applications to be tested in stand-alone servers or executed as normal CGI scripts.
wsgiref.simple_server
The wsgiref.simple_server module implements a simple stand-alone HTTP server that runs a single WSGI application. There are just two functions of interest:
make_server(host, port, app)
Creates an HTTP server that accepts connections on the given host name host and port number port. app is a function or callable object that implements a WSGI application. To run the server, use s.serve_forever() where s is an instance of the server that is returned.
demo_app(environ, start_response)
A complete WSGI application that returns a page with a “Hello World” message on it. This can be used as the app argument to make_server() to verify that the server is working correctly.
Here is an example of running a simple WSGI server:
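A minimal sketch, assuming port 8000 on localhost is free. The serve() and call_in_process() helpers are hypothetical; the second one invokes an application once without opening a socket, which is convenient for a quick check.

```python
from wsgiref.simple_server import make_server, demo_app
from wsgiref.util import setup_testing_defaults

def serve(app=demo_app, host="localhost", port=8000):
    """Run app in a stand-alone HTTP server until interrupted."""
    httpd = make_server(host, port, app)
    print("Serving on http://%s:%d" % (host, port))
    httpd.serve_forever()

def call_in_process(app):
    """Invoke app once with a default test environment; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)      # fills in the required WSGI keys
    seen = {}
    def start_response(status, headers, exc_info=None):
        seen["status"] = status
    body = b"".join(app(environ, start_response))
    return seen["status"], body

# serve()   # uncomment to start the server; the call blocks
```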
wsgiref.handlers
The wsgiref.handlers module contains handler objects for setting up a WSGI execution environment so that applications can run within another webserver (e.g., CGI scripting under Apache). There are a few different objects.
CGIHandler()
Creates a WSGI handler object that runs inside a standard CGI environment. This handler collects information from the standard environment variables and I/O streams as described in the cgi library module.
BaseCGIHandler(stdin, stdout, stderr, environ [, multithread [, multiprocess]])
Creates a WSGI handler that operates within a CGI environment, but where the standard I/O streams and environment variables might be set up in a different way. stdin, stdout, and stderr specify file-like objects for the standard I/O streams. environ is a dictionary of environment variables that is expected to already contain the standard CGI environment variables. multithread and multiprocess are Boolean flags that are used to set the wsgi.multithread and wsgi.multiprocess environment variables. By default, multithread is True and multiprocess is False.
SimpleHandler(stdin, stdout, stderr, environ [, multithread [, multiprocess]])
Creates a WSGI handler that is similar to BaseCGIHandler, but which gives the underlying application direct access to stdin, stdout, stderr, and environ. This is slightly different from BaseCGIHandler, which provides extra logic to process certain features correctly (e.g., in BaseCGIHandler, response codes are translated into Status: headers).
All of these handlers have a method run(app) that is used to run a WSGI application within the handler. Here is an example of a WSGI application running as a traditional CGI script:
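A sketch under stated assumptions: hello_app is a hypothetical application, run_as_cgi() is what a real CGI script would call, and run_captured() uses BaseCGIHandler with in-memory streams so that the generated output can be examined without a web server.

```python
import io
from wsgiref.handlers import BaseCGIHandler, CGIHandler
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return [b"Hello from WSGI\n"]

def run_as_cgi(app):
    """Run app using the real CGI environment and standard I/O streams."""
    CGIHandler().run(app)

def run_captured(app):
    """Run app through BaseCGIHandler and return the bytes it wrote."""
    environ = {}
    setup_testing_defaults(environ)      # stand-in for server-supplied variables
    stdout = io.BytesIO()
    handler = BaseCGIHandler(io.BytesIO(), stdout, io.StringIO(), environ)
    handler.run(app)
    return stdout.getvalue()

# A CGI script would end with: run_as_cgi(hello_app)
```

The captured output begins with a Status: header, which is the translation of response codes that BaseCGIHandler performs.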
wsgiref.validate
The wsgiref.validate module has a function that wraps a WSGI application with a validation wrapper to ensure that both it and the server are operating according to the standard.
validator(app)
Creates a new WSGI application that wraps the WSGI application app. The new application transparently works in the same way as app except that extensive error-checking is added to make sure the application and the server are following the WSGI standard. Any violation results in an AssertionError exception.
Here is an example of using the validator:
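In the sketch below, good_app and bad_app are hypothetical applications, and call_once() is a hypothetical stand-in for a server that invokes an application a single time. The wrapped version of bad_app fails because it yields text instead of bytes.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

def good_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return [b"Hello World\n"]

def bad_app(environ, start_response):
    start_response("200 OK", [("Content-type", "text/plain")])
    return ["Hello World\n"]             # violation: text, not bytes

def call_once(app):
    """Call app the way a server would; return (status, body)."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)
    seen = {}
    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        return lambda data: None         # the legacy write() callable
    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()               # the validator expects close() to be called
    return seen["status"], body

checked_good = validator(good_app)
checked_bad = validator(bad_app)
```

Calling call_once(checked_bad) raises AssertionError, whereas call_once(checked_good) behaves exactly like the unwrapped application.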
The material in this section is primarily aimed at users of WSGI who want to create application objects. If, on the other hand, you are implementing yet another web framework for Python, you should consult PEP 333 for official details on precisely what is needed to make your framework support WSGI. If you are using a third-party web framework, you will need to consult the framework documentation for details concerning its support for WSGI objects. Given that WSGI is an officially blessed specification with a reference implementation in the standard library, it is increasingly common for frameworks to provide some level of support for it.
webbrowser
The webbrowser module provides utility functions for opening documents in a web browser in a platform-independent manner. The main use of this module is in development and testing situations. For example, if you wrote a script that generated HTML output, you could use the functions in this module to automatically direct your system’s browser to view the results.
open(url [, new [, autoraise]])
Displays url with the default browser on the system. If new is 0, the URL is opened in the same window as a running browser, if possible. If new is 1, a new browser window is created. If new is 2, the URL is opened within a new tab within the browser. If autoraise is True, the browser window is raised.
open_new(url)
Displays url in a new window of the default browser. The same as open(url, 1).
open_new_tab(url)
Displays url in a new tab of the default browser. The same as open(url, 2).
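As a sketch of the development use mentioned earlier, the following function writes generated HTML to a temporary file and hands its URL to the browser. The preview_html() name and the opener parameter are assumptions; opener defaults to webbrowser.open_new_tab and can be replaced when no browser should be launched.

```python
import os
import tempfile
import webbrowser
from pathlib import Path

def preview_html(html_text, opener=webbrowser.open_new_tab):
    """Write html_text to a temporary file and open it; return the file URL."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html_text)
    url = Path(path).as_uri()            # file:// URL for the temporary file
    opener(url)
    return url
```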
get([name])
Returns a controller object for manipulating a browser. name is the name of the browser type and is typically a string such as 'netscape', 'mozilla', 'kfm', 'grail', 'windows-default', 'internet-config', or 'command-line'. The returned controller object has methods open() and open_new() that accept the same arguments and perform the same operation as the two previous functions. If name is omitted, a controller object for the default browser is returned.
register(name, constructor[, controller])
Registers a new browser type for use with the get() function. name is the name of the browser. constructor is called without arguments to create a controller object for opening pages in the browser. controller is a controller instance to use instead. If supplied, constructor is ignored and may be None.
A controller instance, c, returned by the get() function has the following methods:
c.open(url[, new])
Same as the open() function.
c.open_new(url)
Same as the open_new() function.