23. Web Programming

Python is widely used when building websites and serves several different roles in this capacity. First, Python scripts are often a useful way to simply generate a set of static HTML pages to be delivered by a web server. For example, a script can be used to take raw content and decorate it with additional features that you typically see on a website (navigation bars, sidebars, advertisements, stylesheets, etc.). This is mainly just a matter of file handling and text processing—topics that have been covered in other sections of the book.
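Here is a rough sketch of that first role. It wraps raw page content in a shared layout with a navigation bar and a stylesheet link, then writes one static .html file per page. The layout text, the page-list format, and the names decorate() and build_site() are hypothetical choices made only for this illustration.

```python
import html
from pathlib import Path

# Hypothetical site-wide layout; {title}, {nav}, and {body} are filled per page.
LAYOUT = """<html>
<head><title>{title}</title><link rel="stylesheet" href="style.css"></head>
<body>
<div class="nav">{nav}</div>
<div class="content">{body}</div>
</body>
</html>
"""

def decorate(title, body_html, nav_links):
    """Wrap raw page content (trusted HTML) in the shared layout."""
    nav = " | ".join(
        '<a href="{}">{}</a>'.format(html.escape(href), html.escape(label))
        for label, href in nav_links
    )
    return LAYOUT.format(title=html.escape(title), nav=nav, body=body_html)

def build_site(pages, outdir):
    """Write one .html file per (name, title, body_html) entry in pages."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # Every page gets the same navigation bar linking to all pages.
    nav_links = [(title, name + ".html") for name, title, _ in pages]
    written = []
    for name, title, body_html in pages:
        path = outdir / (name + ".html")
        path.write_text(decorate(title, body_html, nav_links), encoding="utf-8")
        written.append(path)
    return written
```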

Second, Python scripts are used to generate dynamic content. For example, a website might operate using a standard webserver such as Apache but would use Python scripts to dynamically handle certain kinds of requests. This use of Python is primarily associated with form processing. For example, an HTML page might include a form like this:


Within the form, the ACTION attribute names a Python script 'subscribe.py' that will execute on the server when the form is submitted.

Another common scenario involving dynamic content generation is with AJAX (Asynchronous JavaScript and XML). With AJAX, JavaScript event handlers are associated with certain HTML elements on a page. For example, when the mouse hovers over a specific document element, a JavaScript function might execute and send an HTTP request to the webserver that gets processed (possibly by a Python script). When the associated response is received, another JavaScript function executes to process the response data and display the result. There are many ways in which results might be returned. For example, a server might return data as plain text, XML, JSON, or any number of other formats. Here is an example HTML document that illustrates one way to implement a hover popup, where moving the mouse over selected elements causes a popup window to appear.


In this example, the JavaScript function ShowPopup() initiates a request to a Python script popupdata.py on the server. The result of this script is just a fragment of HTML, which is then displayed in a popup window. Figure 23.1 shows what this might look like in the browser.

Figure 23.1 Possible browser display where the background text is just an ordinary HTML document and the pop-up window is dynamically generated by the popupdata.py script.

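On the server side, a script in the role of popupdata.py only has to print a header followed by an HTML fragment. The sketch below makes several assumptions. It reads a hypothetical query parameter named term from the QUERY_STRING environment variable and looks it up in an invented GLOSSARY table. It parses the query string with urllib.parse.parse_qs instead of the cgi module, because cgi was removed from the standard library in Python 3.13.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical data that the popup text is drawn from.
GLOSSARY = {
    "wsgi": "Web Server Gateway Interface, described in PEP 333.",
    "cgi": "Common Gateway Interface, a way for webservers to run programs.",
}

def popup_response(query_string):
    """Return the full CGI output: header, blank line, HTML fragment."""
    params = parse_qs(query_string)
    term = params.get("term", [""])[0]          # hypothetical parameter name
    text = GLOSSARY.get(term.lower(), "No entry for this term.")
    header = "Content-type: text/html\r\n\r\n"
    fragment = "<b>{}</b>: {}".format(html.escape(term), html.escape(text))
    return header + fragment

def main():
    # The webserver passes the query string in the environment.
    out = popup_response(os.environ.get("QUERY_STRING", ""))
    # Write bytes so the '\r\n' endings are not altered by newline translation.
    sys.stdout.buffer.write(out.encode("utf-8"))
```

A deployed script would call main() as its last statement.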

Finally, the entire website might run under the control of Python within the context of a framework written in Python. It has been humorously noted that Python has “more web programming frameworks than language keywords.” The topic of web frameworks is far beyond the scope of this book, but http://wiki.python.org/moin/WebFrameworks is a good starting point for finding more information.

The rest of this chapter describes built-in modules related to the low-level interface between Python and webservers and frameworks. Topics include CGI scripting, a technique used to access Python from third-party webservers, and WSGI, a middleware layer used for writing components that integrate with Python’s various web frameworks.

cgi

The cgi module is used to implement CGI scripts, which are programs typically executed by a webserver when it wants to process user input from a form or generate dynamic content of some kind.

When a request corresponding to a CGI script is submitted, the webserver executes the CGI program as a subprocess. CGI programs receive input from two sources: sys.stdin and environment variables set by the server. The following list details common environment variables set by webservers:


As output, a CGI program writes to standard output sys.stdout. The gory details of CGI programming can be found in a book such as CGI Programming with Perl, 2nd Edition, by Shishir Gundavaram (O’Reilly & Associates, 2000). For our purposes, there are really only two things to know. First, the contents of an HTML form are passed to a CGI program in a sequence of text known as a query string. In Python, the contents of the query string are accessed using the FieldStorage class. For example:

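Where the query string comes from depends on the request. For a GET request it is in the QUERY_STRING environment variable. For a URL-encoded POST it is read from standard input, and CONTENT_LENGTH gives its size. The following is a minimal sketch that makes this explicit using urllib.parse.parse_qs, the same kind of parsing FieldStorage performs for URL-encoded data. read_form() is a hypothetical helper, and it handles only 'application/x-www-form-urlencoded' input, not multipart uploads.

```python
import os
import sys
from urllib.parse import parse_qs

def read_form(environ=None, stdin=None):
    """Return a dict mapping field names to lists of values.

    Handles GET query strings and URL-encoded POST bodies only.
    """
    environ = os.environ if environ is None else environ
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        if stdin is None:
            stdin = sys.stdin.buffer        # raw bytes from the webserver
        length = int(environ.get("CONTENT_LENGTH") or 0)
        query = stdin.read(length).decode("latin-1")
    else:
        query = environ.get("QUERY_STRING", "")
    return parse_qs(query)
```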

Second, the output of a CGI program consists of two parts: an HTTP header and the raw data (which is typically HTML). A blank line always separates these two components. A simple HTTP header looks like this:


The rest of the output is the raw output. For example:

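A CGI script's output is the header lines, then a blank line, then the body. The sketch below builds that output as one string with '\r\n' line endings and writes it as bytes. The cgi_output() helper and the page content are hypothetical.

```python
import sys

def cgi_output(body, content_type="text/html", extra_headers=()):
    """Return header lines, a blank line, and the body as one string."""
    lines = ["Content-type: " + content_type]
    lines.extend("{}: {}".format(name, value) for name, value in extra_headers)
    # Each header ends with '\r\n'; an empty line ('\r\n') ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n" + body

page = cgi_output("<html><body><h1>Thanks for subscribing!</h1></body></html>")

def main():
    # Write bytes so the '\r\n' endings are not altered by newline translation.
    sys.stdout.buffer.write(page.encode("utf-8"))
```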

It is standard practice that HTTP headers are terminated using the Windows line-ending convention of '\r\n'. That is why carriage-return characters ('\r') appear in the example. If you need to signal an error, include a special 'Status:' header in the output. For example:


If you need to redirect the client to a different page, create output like this:

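Both cases are variations on the header block. An error response adds a Status: header. A redirect sends a Location: header, and under the CGI specification an absolute URL there typically makes the webserver send a redirect to the client. This is a minimal sketch. The helper names and the example URL are hypothetical.

```python
def error_response(status, message):
    """Build CGI output that reports an error with a Status: header."""
    return ("Status: {}\r\n"
            "Content-type: text/plain\r\n"
            "\r\n"
            "{}".format(status, message))

def redirect_response(url):
    """Build CGI output that sends the client to another URL."""
    return "Location: {}\r\n\r\n".format(url)
```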

Most of the work in the cgi module is performed by creating an instance of the FieldStorage class.

FieldStorage([input [, headers [, outerboundary [, environ [, keep_blank_values [, strict_parsing]]]]]])

Read the contents of a form by reading and parsing the query string passed in an environment variable or standard input. input specifies a file-like object from which form data will be read in a POST request. By default, sys.stdin is used. headers and outerboundary are used internally and should not be given. environ is a dictionary from which CGI environment variables are read. keep_blank_values is a Boolean flag that controls whether blank values are retained or not. By default, it is False. strict_parsing is a Boolean flag that causes an exception to be raised if there is any kind of parsing problem. By default, it is False.

A FieldStorage instance form works similarly to a dictionary. For example, f = form[key] will extract an entry for a given parameter key. An instance f extracted in this manner is either another instance of FieldStorage or an instance of MiniFieldStorage. The following attributes are defined on f:


Values from a form can be extracted using the following methods:

form.getvalue(fieldname [, default])

Returns the value of a given field with the name fieldname. If default is supplied, it specifies the value to return if the field is not present. One caution with this method is that if the same form field name is included more than once in the request, the returned value will be a list containing all of the values. To simplify programming, you can use form.getfirst(), which simply returns the first value found.

form.getfirst(fieldname [, default])

Returns the first value defined for a field with the name fieldname. If default is supplied, it specifies the value to return if the field is not present.

form.getlist(fieldname)

Returns a list of all values defined for fieldname. It always returns a list, even if only one value is defined, and returns an empty list if no values exist.
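The difference between the three methods is easiest to see side by side. The cgi module is not available in Python 3.13 and later, so this sketch imitates the documented behavior on top of urllib.parse.parse_qs. The getvalue(), getfirst(), and getlist() functions here are hypothetical stand-ins that take the parsed dictionary as an extra argument. They are not the FieldStorage methods themselves.

```python
from urllib.parse import parse_qs

def getvalue(fields, name, default=None):
    """Like form.getvalue(): a single string, or a list if repeated."""
    values = fields.get(name)
    if not values:
        return default
    return values[0] if len(values) == 1 else list(values)

def getfirst(fields, name, default=None):
    """Like form.getfirst(): always the first value (or default)."""
    values = fields.get(name)
    return values[0] if values else default

def getlist(fields, name):
    """Like form.getlist(): always a list, possibly empty."""
    return list(fields.get(name, []))

fields = parse_qs("name=Guido&lang=python&lang=c")
```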

In addition, the cgi module defines a class, MiniFieldStorage, that contains only the attributes name and value. This class is used to represent individual fields of a form passed in the query string, whereas FieldStorage is used to contain multiple fields and multipart data.

Instances of FieldStorage are accessed like a Python dictionary, where the keys are the field names on the form. When accessed in this manner, the object returned is an instance of FieldStorage for multipart data (content type is 'multipart/form-data') or file uploads, an instance of MiniFieldStorage for simple fields (content type is 'application/x-www-form-urlencoded'), or a list of such instances in cases where a form contains multiple fields with the same name. For example:


If a field represents an uploaded file, accessing the value attribute reads the entire file into memory as a byte string. Because this may consume a large amount of memory on the server, it may be preferable to read uploaded data in smaller pieces by reading from the file attribute directly. For instance, the following example reads uploaded data line by line:

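The uploaded data is available from the field's file attribute as a file-like object. The sketch below copies such an object to disk one line at a time, so the whole upload never sits in memory at once. save_upload() is a hypothetical helper, and the test uses an io.BytesIO object as a stand-in for a real upload.

```python
def save_upload(fileobj, dest_path):
    """Copy an uploaded file object to dest_path one line at a time.

    fileobj is assumed to be opened in binary mode, like the file
    attribute of an upload field. Returns (lines, bytes) written.
    """
    lines = nbytes = 0
    with open(dest_path, "wb") as dest:
        for line in fileobj:          # each step reads a single line
            dest.write(line)
            lines += 1
            nbytes += len(line)
    return lines, nbytes
```

Binary uploads such as images may contain few newlines, which makes a single "line" very large. For those, reading fixed-size chunks with fileobj.read(65536) in a loop keeps memory use bounded.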

The following utility functions are often used in CGI scripts:

escape(s [, quote])

Converts the characters '&', '<', and '>' in string s to HTML-safe sequences such as '&amp;', '&lt;', and '&gt;'. If the optional flag quote is true, the double-quote character (") is also translated to '&quot;'.
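cgi.escape() was removed in Python 3.8, and html.escape() is its standard-library replacement. One difference is that html.escape() escapes quote characters by default (quote=True) and converts the single quote as well. The sketch below shows both settings on an arbitrary sample string.

```python
import html

s = '<a href="x">Tom & Jerry</a>'

# Escape only &, <, and > (like the old escape(s) with quote omitted).
body_safe = html.escape(s, quote=False)

# Also escape " and ' so the result is safe inside an attribute value.
attr_safe = html.escape(s)
```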

parse_header(string)

Parses the data supplied after an HTTP header field such as 'content-type'. The data is split into a primary value and a dictionary of secondary parameters that are returned in a tuple. For example, the command

parse_header('text/html; a=hello; b="world"')

returns this result:

('text/html', {'a': 'hello', 'b': 'world'})
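cgi.parse_header() was deprecated in Python 3.11 along with the rest of the module. The Python documentation points to the email package as a replacement. If you assign the header to an email.message.EmailMessage, you can read the primary value and the parameters separately. This is a minimal sketch using the same header value as the parse_header() example. parse_content_type() is a hypothetical name.

```python
from email.message import EmailMessage

def parse_content_type(value):
    """Return (main value, parameter dict), similar to cgi.parse_header()."""
    msg = EmailMessage()
    msg["content-type"] = value
    return msg.get_content_type(), dict(msg["content-type"].params)

result = parse_content_type('text/html; a=hello; b="world"')
```

This only suits Content-Type-style headers. get_content_type() lowercases the main value and falls back to 'text/plain' when the value is not of the form type/subtype.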

parse_multipart(fp, pdict)

Parses input of type 'multipart/form-data' as is commonly used with file uploads. fp is the input file, and pdict is a dictionary containing parameters of the content-type header. It returns a dictionary mapping field names to lists of values. This function doesn’t work with nested multipart data; use the FieldStorage class instead.

print_directory()

Formats the name of the current working directory in HTML and prints it out. The resulting output will be sent back to the browser, which can be useful for debugging.

print_environ()

Creates a list of all environment variables formatted in HTML and is used for debugging.

print_environ_usage()

Prints a shorter list of the most useful environment variables in HTML and is used for debugging.

print_form(form)

Formats the data supplied on a form in HTML. form must be an instance of FieldStorage. Used for debugging.

test()

Writes a minimal HTTP header and prints all the information provided to the script in HTML format. Primarily used for debugging to make sure your CGI environment is set up correctly.

CGI Programming Advice

In the current age of web frameworks, CGI scripting seems to have fallen out of fashion. However, if you are going to use it, there are a few programming tips that can simplify your life.

First, don’t write CGI scripts where you are using a huge number of print statements to produce hard-coded HTML output. The resulting program will be a horrible tangled mess of Python and HTML that is not only impossible to read, but also impossible to maintain. A better approach is to rely on templates. Minimally, the string.Template object can be used for this. Here is an example that outlines the concept:


In this example, the files 'error.html' and 'success.html' are HTML pages that have all of the output but include $variable substitutions corresponding to dynamically generated values used in the CGI script. For example, the 'success.html' file might look like this:


The temp.substitute() operation in the script simply fills in the variables in this file. An obvious benefit of this approach is that if you want to change the appearance of the output, you just modify the template files, not the CGI script. There are many third-party template engines available for Python, maybe in even greater numbers than web frameworks. These take the templating concept and build upon it in substantial ways. See http://wiki.python.org/moin/Templating for more details.
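Here is a self-contained version of the template idea, with the template text inline instead of in a 'success.html' file. string.Template.substitute() fills in $name placeholders and raises KeyError if one is missing. safe_substitute() leaves unknown placeholders in place. The template text and field names are hypothetical. The values are passed through html.escape() because they come from the client.

```python
import html
from string import Template

# In a real script this text would be read from a file such as 'success.html'.
SUCCESS = Template("""<html>
<body>
<p>Thank you, $name. A confirmation was sent to $email.</p>
</body>
</html>
""")

def render_success(name, email):
    """Fill in the template, escaping client-supplied values first."""
    return SUCCESS.substitute(name=html.escape(name), email=html.escape(email))
```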

Second, if you need to save data from a CGI script, try to use a database. Although it is easy enough to write data directly to files, webservers operate concurrently, and unless you’ve taken steps to properly lock and synchronize resources, it is possible that files will get corrupted. Database servers and their associated Python interfaces usually don’t have this problem. So if you need to save data, try to use a module such as sqlite3 or a third-party module for something like MySQL.
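This is a minimal sketch of that advice using the standard sqlite3 module. Each request opens the database, inserts a row inside a transaction, and closes it, leaving locking to SQLite. A parameterized query keeps form data out of the SQL text. The table layout and function name are hypothetical. The test uses a temporary file, but a real script would use a fixed path that the webserver can write to.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS subscribers (name TEXT, email TEXT UNIQUE)"

def save_subscriber(db_path, name, email):
    """Insert one subscriber; return False if the email is already stored."""
    conn = sqlite3.connect(db_path, timeout=10)   # wait up to 10 s for a lock
    try:
        with conn:                                 # commit, or roll back on error
            conn.execute(SCHEMA)
            conn.execute("INSERT INTO subscribers VALUES (?, ?)", (name, email))
        return True
    except sqlite3.IntegrityError:                 # UNIQUE constraint violated
        return False
    finally:
        conn.close()
```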

Finally, if you find yourself writing dozens of CGI scripts and code that has to deal with low-level details of HTTP such as cookies, authentication, encoding, and so forth, you may want to consider a web framework instead. The whole point of using a framework is so that you don’t have to worry about those details—well, at least not as much. So, don’t reinvent the wheel.

Notes

• The process of installing a CGI program varies widely according to the type of webserver being used. Typically programs are placed in a special cgi-bin directory. A server may also require additional configuration. You should consult the documentation for the server or the server’s administrator for more details.

• On UNIX, Python CGI programs may require appropriate execute permissions to be set and a line such as the following to appear as the first line of the program:

#!/usr/bin/env python

• To simplify debugging, import the cgitb module—for example, import cgitb; cgitb.enable(). This modifies exception handling so that errors are displayed in the web browser.

• If you invoke an external program—for example, via the os.system() or os.popen() function—be careful not to pass arbitrary strings received from the client to the shell. This is a well-known security hole that hackers can use to execute arbitrary shell commands on the server (because the command passed to these functions is first interpreted by the UNIX shell as opposed to being executed directly). In particular, never pass any part of a URL or form data to a shell command unless it has first been thoroughly checked by making sure that the string contains only alphanumeric characters, dashes, underscores, and periods.

• On UNIX, don’t give a CGI program setuid mode. This is a security liability and not supported on all machines.

• Don’t use 'from cgi import *' with this module. The cgi module defines a wide variety of names and symbols that you probably don’t want in your namespace.
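The advice about external programs has two parts. First, check client-supplied strings against the allowed characters (letters, digits, dashes, underscores, and periods). Second, pass arguments to subprocess.run() as a list so that no shell interprets them. This is a sketch under those assumptions. safe_argument() and run_tool() are hypothetical names. Rejecting a leading dash is an extra precaution not mentioned above; it stops a value from being read as a command-line option.

```python
import re
import subprocess

# Letters, digits, underscores, periods, and dashes only.
_SAFE = re.compile(r"[A-Za-z0-9_.-]+")

def safe_argument(value):
    """Return value unchanged if it contains only the allowed characters."""
    if not _SAFE.fullmatch(value) or value.startswith("-"):
        raise ValueError("unsafe argument: %r" % (value,))
    return value

def run_tool(program, value):
    """Run program with one validated argument, without using a shell."""
    # With a list of arguments there is no shell parsing, so characters such
    # as ';', '|', and '$(...)' would have no special meaning anyway.
    return subprocess.run([program, safe_argument(value)],
                          capture_output=True, text=True)
```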

cgitb

This module provides an alternative exception handler that displays a detailed report whenever an uncaught exception occurs. The report contains source code, values of parameters, and local variables. Originally, this module was developed to help debug CGI scripts, but it can be used in any application.

enable([display [, logdir [, context [, format]]]])

Enables special exception handling. display is a flag that determines whether any information is displayed when an error occurs. The default value is 1. logdir specifies a directory in which error reports are also saved to files; if display is false, the file holds the only full copy of the report. When logdir is given, each error report is written to a unique file created by the tempfile.mkstemp() function. context is an integer specifying the number of lines of source code to display around lines upon which the exception occurred. format is a string that specifies the output format. A format of 'html' specifies HTML (the default). Any other value results in plain-text format.

handle([info])

Handles an exception using the default settings of the enable() function. info is a tuple (exctype, excvalue, tb) where exctype is an exception type, excvalue is an exception value, and tb is a traceback object. This tuple is normally obtained using sys.exc_info(). If info is omitted, the current exception is used.

Note

To enable special exception handling in CGI scripts, include the line import cgitb; cgitb.enable() at the beginning of the script.
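cgitb was removed from the standard library in Python 3.13. If you want a similar effect there, one option is to install your own sys.excepthook that writes an HTML page containing the escaped traceback. The sketch below is a much plainer stand-in built on the traceback module, not a reimplementation of cgitb, and it shows no local variables. html_error_page() and enable_html_errors() are hypothetical names. Like handle(), html_error_page() takes the (type, value, traceback) triple from sys.exc_info().

```python
import html
import sys
import traceback

def html_error_page(exc_type, exc_value, tb):
    """Return CGI output showing an HTML-escaped traceback."""
    text = "".join(traceback.format_exception(exc_type, exc_value, tb))
    return ("Content-type: text/html\r\n\r\n"
            "<h1>A problem occurred in a Python script.</h1>\n"
            "<pre>" + html.escape(text) + "</pre>\n")

def enable_html_errors(stream=None):
    """Install an excepthook that writes html_error_page() to stream."""
    def hook(exc_type, exc_value, tb):
        out = stream if stream is not None else sys.stdout
        out.write(html_error_page(exc_type, exc_value, tb))
        out.flush()
    sys.excepthook = hook
```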

wsgiref

WSGI (Python Web Server Gateway Interface) is a standardized interface between webservers and web applications that is designed to promote portability of applications across different webservers and frameworks. An official description of the standard is found in PEP 333 (http://www.python.org/dev/peps/pep-0333). More information about the standard and its use can also be found at http://www.wsgi.org. The wsgiref package is a reference implementation that can be used for testing, validation, and simple deployments.

The WSGI Specification

With WSGI, a web application is implemented as a function or callable object webapp(environ, start_response) that accepts two arguments. environ is a dictionary of environment settings that must contain at least the following values, which have the same names and meanings as in CGI scripting:


In addition, the environ dictionary is required to contain the following WSGI-specific values:

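When experimenting, you rarely need to type these keys by hand. wsgiref.util.setup_testing_defaults() adds dummy values for required CGI-style variables such as REQUEST_METHOD, SERVER_NAME, and SERVER_PORT, plus the wsgi.* entries. It works on a dictionary you supply and leaves keys that are already present alone. The sketch below prints the resulting environment. The QUERY_STRING value is arbitrary.

```python
from wsgiref.util import setup_testing_defaults

environ = {"QUERY_STRING": "name=Dave"}
setup_testing_defaults(environ)   # fills in the missing keys in place

for key in sorted(environ):
    print(key, "=", repr(environ[key]))
```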

The start_response parameter is a callable object of the form start_response(status, headers) that is used by the application to start a response. status is a string such as '200 OK' or '404 Not Found'. headers is a list of tuples, each of the form (name, value), corresponding to an HTTP header to be included in the response, for example, ('Content-type','text/html').

The data or body of a response is returned by the web application function as an iterable object that produces a sequence of byte strings or text strings that only contain characters which can be encoded as a single byte (e.g., compatible with the ISO-8859-1 or Latin-1 character set). Examples include a list of byte strings or a generator function producing byte strings. If an application needs to do any kind of character encoding such as UTF-8, it must do this itself.
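Here is a minimal sketch of the calling convention. The application calls start_response() with a status string and a list of (name, value) header tuples, then returns an iterable of byte strings. Here the body comes from a generator that encodes each piece as UTF-8 itself. The application name and content are hypothetical. The last lines call the application directly with a stand-in start_response, roughly as a server would.

```python
def hello_app(environ, start_response):
    """A WSGI application that returns a short plain-text page."""
    start_response("200 OK", [("Content-type", "text/plain; charset=utf-8")])

    def body():
        # Encoding is the application's job: yield bytes, not str.
        yield "Hello, World!\n".encode("utf-8")
        yield "Path: {}\n".format(environ.get("PATH_INFO", "/")).encode("utf-8")

    return body()

# Calling the application directly, the way a server would.
captured = {}

def fake_start_response(status, headers):
    # A real start_response also returns a write() callable.
    captured["status"] = status
    captured["headers"] = headers

chunks = list(hello_app({"PATH_INFO": "/docs"}, fake_start_response))
```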

Here is an example of a simple WSGI application that reads form fields and produces some output, similar to what was shown in the cgi module section:


There are a few critical details in this example. First, WSGI application components are not tied to a specific framework, webserver, or set of library modules. In the example, we’re only using one library module, cgi, simply because it has some convenience functions for parsing query variables. The example shows how the start_response() function is used to initiate a response and supply headers. The response itself is constructed as a list of strings. The final statement in this application is a generator expression that turns all strings into byte strings. If you’re using Python 3, this is a critical step: all WSGI applications are expected to return encoded bytes, not unencoded Unicode data.
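The following application follows the same pattern. It reads form fields from the query string (GET) or from wsgi.input (POST), builds the response as a list of strings, and encodes them in its final statement with a generator expression. It uses urllib.parse.parse_qs instead of cgi so that it also runs on Python 3.13 and later. The field names name and email are hypothetical.

```python
import html
from urllib.parse import parse_qs

def subscribe_app(environ, start_response):
    """Read 'name' and 'email' form fields and echo them back as HTML."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        query = environ["wsgi.input"].read(length).decode("latin-1")
    else:
        query = environ.get("QUERY_STRING", "")
    fields = parse_qs(query)
    name = fields.get("name", [""])[0]
    email = fields.get("email", [""])[0]

    start_response("200 OK", [("Content-type", "text/html; charset=utf-8")])
    resp = [
        "<html><body>\n",
        "<p>Name: {}</p>\n".format(html.escape(name)),
        "<p>Email: {}</p>\n".format(html.escape(email)),
        "</body></html>\n",
    ]
    # WSGI requires bytes, so encode every string on the way out.
    return (line.encode("utf-8") for line in resp)
```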

To deploy a WSGI application, it has to be registered with the web programming framework you happen to be using. For this, you’ll have to read the manual.

wsgiref Package

The wsgiref package provides a reference implementation of the WSGI standard that allows applications to be tested in stand-alone servers or executed as normal CGI scripts.

wsgiref.simple_server

The wsgiref.simple_server module implements a simple stand-alone HTTP server that runs a single WSGI application. There are just two functions of interest:

make_server(host, port, app)

Creates an HTTP server that accepts connections on the given host name host and port number port. app is a function or callable object that implements a WSGI application. To run the server, use s.serve_forever() where s is an instance of the server that is returned.

demo_app(environ, start_response)

A complete WSGI application that returns a page with a “Hello World” message on it. This can be used as the app argument to make_server() to verify that the server is working correctly.

Here is an example of running a simple WSGI server:

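This is a minimal sketch, assuming port 8000 is free. make_server() binds the port, and serve_forever() then handles requests until interrupted with Ctrl-C. The server object also works as a context manager, which closes its socket on exit. The serve() wrapper is a hypothetical convenience, and demo_app can be replaced by any WSGI application.

```python
from wsgiref.simple_server import make_server, demo_app

def serve(app=demo_app, host="", port=8000):
    """Run app on host:port until interrupted with Ctrl-C."""
    with make_server(host, port, app) as httpd:
        print("Serving on port", httpd.server_port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            pass
```

After calling serve(), visiting http://localhost:8000/ should show demo_app's "Hello world!" message followed by a listing of the request environment.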

wsgiref.handlers

The wsgiref.handlers module contains handler objects for setting up a WSGI execution environment so that applications can run within another webserver (e.g., CGI scripting under Apache). There are a few different objects.

CGIHandler()

Creates a WSGI handler object that runs inside a standard CGI environment. This handler collects information from the standard environment variables and I/O streams as described in the cgi library module.

BaseCGIHandler(stdin, stdout, stderr, environ [, multithread [, multiprocess]])

Creates a WSGI handler that operates within a CGI environment, but where the standard I/O streams and environment variables might be set up in a different way. stdin, stdout, and stderr specify file-like objects for the standard I/O streams. environ is a dictionary of environment variables that is expected to already contain the standard CGI environment variables. multithread and multiprocess are Boolean flags that are used to set the wsgi.multithread and wsgi.multiprocess environment variables. By default, multithread is True and multiprocess is False.

SimpleHandler(stdin, stdout, stderr, environ [, multithread [, multiprocess]])

Creates a WSGI handler that is similar to BaseCGIHandler, but which gives the underlying application direct access to stdin, stdout, stderr, and environ. This differs slightly from BaseCGIHandler, which provides extra logic to process certain features correctly (e.g., in BaseCGIHandler, response codes are translated into Status: headers).

All of these handlers have a method run(app) that is used to run a WSGI application within the handler. Here is an example of a WSGI application running as a traditional CGI script:

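A CGI script hands its application to CGIHandler().run(), which reads the request from os.environ and standard input and writes the response to standard output. That is awkward to try outside a webserver. So in this sketch, the last lines run the same application through BaseCGIHandler with in-memory streams and a hand-built environment, which also shows the Status: header that this handler emits. The application and the environment values are hypothetical.

```python
import io
from wsgiref.handlers import BaseCGIHandler, CGIHandler

def app(environ, start_response):
    """A tiny WSGI application used for the example."""
    start_response("200 OK", [("Content-type", "text/plain")])
    return [b"Hello from CGI\n"]

def run_as_cgi():
    # In a real CGI script, calling this would be the last line of the file.
    CGIHandler().run(app)

# Trying the same application without a webserver.
environ = {"REQUEST_METHOD": "GET", "SERVER_NAME": "localhost",
           "SERVER_PORT": "80", "SERVER_PROTOCOL": "HTTP/1.0",
           "SCRIPT_NAME": "/cgi-bin/hello.py", "PATH_INFO": ""}
stdout = io.BytesIO()
handler = BaseCGIHandler(io.BytesIO(), stdout, io.StringIO(), environ)
handler.run(app)
output = stdout.getvalue()
```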

wsgiref.validate

The wsgiref.validate module has a function that wraps a WSGI application with a validation wrapper to ensure that both it and the server are operating according to the standard.

validator(app)

Creates a new WSGI application that wraps the WSGI application app. The new application transparently works in the same way as app except that extensive error-checking is added to make sure the application and the server are following the WSGI standard. Any violation results in an AssertionError exception.

Here is an example of using the validator:

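In this sketch, validator() wraps an application, and you serve or call the wrapped version instead of the original. Here the wrapped application is called directly, with an environment from setup_testing_defaults() and a stand-in start_response, which is enough for the validator's checks to run. broken_app is a deliberately faulty, hypothetical application that returns a str instead of an iterable of bytes, so the validator raises AssertionError.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World\n"]

def broken_app(environ, start_response):
    # Bug on purpose: a str is not a valid WSGI response body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return "Hello World\n"

checked_app = validator(hello_app)   # serve this instead of hello_app

def call(app):
    """Call a WSGI app directly and return (status, body bytes)."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)
    status = []
    result = app(environ, lambda s, h, exc_info=None: status.append(s))
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()           # the validator checks that close() is called
    return status[0], body
```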

Note

The material in this section is primarily aimed at users of WSGI who want to create application objects. If, on the other hand, you are implementing yet another web framework for Python, you should consult PEP 333 for official details on precisely what is needed to make your framework support WSGI. If you are using a third-party web framework, you will need to consult the framework documentation for details concerning its support for WSGI objects. Given that WSGI is an officially blessed specification with a reference implementation in the standard library, it is increasingly common for frameworks to provide some level of support for it.

webbrowser

The webbrowser module provides utility functions for opening documents in a web browser in a platform-independent manner. The main use of this module is in development and testing situations. For example, if you wrote a script that generated HTML output, you could use the functions in this module to automatically direct your system’s browser to view the results.

open(url [, new [, autoraise]])

Displays url with the default browser on the system. If new is 0, the URL is opened in the same window as a running browser, if possible. If new is 1, a new browser window is created. If new is 2, the URL is opened within a new tab within the browser. If autoraise is True, the browser window is raised.

open_new(url)

Displays url in a new window of the default browser. The same as open(url, 1).

open_new_tab(url)

Displays url in a new tab of the default browser. The same as open(url, 2).
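This sketch follows the development use case described at the start of this section. It writes generated HTML to a temporary file and returns a file: URL for it, and preview() then asks webbrowser to show that URL in a new tab. The function names and page content are hypothetical. The test only builds the file, because actually opening a browser depends on the machine.

```python
import tempfile
import webbrowser
from pathlib import Path

def write_preview(html_text):
    """Save html_text to a temporary .html file and return its file: URL."""
    fd, name = tempfile.mkstemp(suffix=".html")
    with open(fd, "w", encoding="utf-8") as f:
        f.write(html_text)
    # The file is left in place so the browser can read it afterward.
    return Path(name).resolve().as_uri()

def preview(html_text):
    """Open the generated page in a new tab of the default browser."""
    url = write_preview(html_text)
    # open_new_tab(url) is the same as webbrowser.open(url, 2).
    return webbrowser.open_new_tab(url)
```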

get([name])

Returns a controller object for manipulating a browser. name is the name of the browser type and is typically a string such as 'netscape', 'mozilla', 'kfm', 'grail', 'windows-default', 'internet-config', or 'command-line'. The returned controller object has methods open() and open_new() that accept the same arguments and perform the same operation as the two previous functions. If name is omitted, a controller object for the default browser is returned.

register(name, constructor[, controller])

Registers a new browser type for use with the get() function. name is the name of the browser. constructor is called without arguments to create a controller object for opening pages in the browser. controller is a controller instance to use instead. If supplied, constructor is ignored and may be None.

A controller instance, c, returned by the get() function has the following methods:

c.open(url[, new])

Same as the open() function.

c.open_new(url)

Same as the open_new() function.
