Chapter 5. CGI.pm

The CGI.pm module has become the standard tool for creating CGI scripts in Perl. It provides a simple interface for most of the common CGI tasks. Not only does it easily parse input parameters, but it also provides a clean interface for outputting headers and a powerful yet elegant way to output HTML code from your scripts.

We will cover most of the basics here and will revisit CGI.pm later to look at some of its other features when we discuss other components of CGI programming. For example, CGI.pm provides a simple way to read and write to browser cookies, but we will wait to review that until we get to our discussion about maintaining state, in Chapter 11.

If after reading this chapter you are interested in more information, the author of CGI.pm has written an entire book devoted to it: The Official Guide to Programming with CGI.pm by Lincoln Stein (John Wiley & Sons).

Because CGI.pm offers so many methods, we’ll organize our discussion of CGI.pm into three parts: handling input, generating output, and handling errors. We will look at ways to generate output both with and without CGI.pm. Here is the structure of our chapter:

  • Handling Input with CGI.pm

    • Information about the environment. CGI.pm has methods that provide information that is similar to, but somewhat different from, the information available in %ENV.

    • Form input. CGI.pm automatically parses parameters passed to you via HTML forms and provides a simple method for accessing these parameters.

    • File uploads. CGI.pm allows your CGI script to handle HTTP file uploads easily and transparently.

  • Generating Output with CGI.pm

    • Generating headers. CGI.pm has methods to help you output HTTP headers from your CGI script.

    • Generating HTML. CGI.pm allows you to generate full HTML documents via corresponding method calls.

  • Alternatives for Generating Output

    • Quoted HTML and here documents. We will compare alternative strategies for outputting HTML.

  • Handling Errors

    • Trapping die. The standard way to handle errors with Perl, die, does not work cleanly with CGI.

    • CGI::Carp. The CGI::Carp module distributed with CGI.pm makes it easy to trap die and other error conditions that may kill your script.

    • Custom solutions. If you want more control when displaying errors to your users, you may want to create a custom subroutine or module.

Let’s start with a general overview of CGI.pm.

Overview

CGI.pm requires Perl 5.003_07 or higher and has been included with the standard Perl distribution since 5.004. You can check which version of Perl you are running with the -v option:

$ perl -v

This is perl, version 5.005

Copyright 1987-1997, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.

You can verify whether CGI.pm is installed and which version by doing this:

$ perl -MCGI -e 'print "CGI.pm version $CGI::VERSION\n";'
CGI.pm version 2.56

If you get something like the following, then you do not have CGI.pm installed, and you will have to download and install it. Appendix B explains how to do this.

Can't locate CGI.pm in @INC (@INC contains: /usr/lib/perl5/i386-linux/5.005 /usr/lib/perl5 /usr/lib/perl5/site_perl/i386-linux /usr/lib/perl5/site_perl .).
BEGIN failed--compilation aborted.

New versions of CGI.pm are released regularly, and most releases include bug fixes.[6] We therefore recommend that you install the latest version and monitor new releases (you can find a version history at the bottom of the cgi_docs.html file distributed with CGI.pm). This chapter discusses features introduced as late as 2.47.

Denial of Service Attacks

Before we get started, you should make a minor change to your copy of CGI.pm. CGI.pm handles HTTP file uploads and automatically saves the contents of these uploads to temporary files. This is a very convenient feature, and we’ll talk about this later. However, file uploads are enabled by default in CGI.pm, and it does not impose any limitations on the size of files it will accept. Thus, it is possible for someone to upload multiple large files to your web server and fill up your disk.

Clearly, the vast majority of your CGI scripts do not accept file uploads. Thus, you should disable this feature and enable it only in those scripts where you wish to use it. You may also wish to limit the size of POST requests, which include file uploads as well as standard forms submitted via the POST method.

To make these changes, locate CGI.pm in your Perl libraries and then search for text that looks like the following:

# Set this to a positive value to limit the size of a POSTing
# to a certain number of bytes:
$POST_MAX = -1;

# Change this to 1 to disable uploads entirely:
$DISABLE_UPLOADS = 0;

Set $DISABLE_UPLOADS to 1. You may wish to set $POST_MAX to a reasonable upper bound as well, such as 100 KB. POST requests that are not file uploads are processed in memory, so restricting the size of POST requests prevents someone from submitting multiple large POST requests that quickly use up available memory on your server. The result looks like this:

# Set this to a positive value to limit the size of a POSTing
# to a certain number of bytes:
$POST_MAX = 102_400;  # 100 KB

# Change this to 1 to disable uploads entirely:
$DISABLE_UPLOADS = 1;

If you then want to enable uploads and/or allow a greater size for POST requests, you can override these values in your script by setting $CGI::DISABLE_UPLOADS and $CGI::POST_MAX after you use the CGI.pm module, but before you create a CGI.pm object. We will look at how to receive file uploads later in this chapter.

You may need special permission to update your CGI.pm file. If your system administrator for some reason will not make these changes, then you must disable file uploads and limit POST requests on a script-by-script basis. Your scripts should begin like this:

#!/usr/bin/perl -wT

use strict;
use CGI;

$CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX        = 102_400; # 100 KB

my $q = new CGI;
.
.

Throughout our examples, we will assume that the module has been patched and omit these lines.

The Kitchen Sink

CGI.pm is a big module. It provides functions for accessing CGI environment variables and printing outgoing headers. It automatically interprets form data submitted via POST or GET, and it handles multipart-encoded file uploads. It provides many utility functions to do common CGI-related tasks, and it provides a simple interface for outputting HTML. This interface does not eliminate the need to understand HTML, but it makes including HTML inside a Perl script more natural and easier to validate.

Because CGI.pm is so large, some people consider it bloated and complain that it wastes memory. In fact, it uses many creative techniques to improve its efficiency, including a custom implementation of SelfLoader, so it loads only the code that you need. If you use CGI.pm only to parse input, but do not use it to produce HTML, then CGI.pm does not load the code for producing HTML.
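To make the lazy-loading idea concrete, here is a simplified sketch of on-demand compilation using Perl's AUTOLOAD. The package LazyDemo and its subroutines are hypothetical; this is not how SelfLoader or CGI.pm is actually written, only an illustration of the principle that a subroutine is compiled the first time it is called and never before.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical package illustrating on-demand compilation.
# Not CGI.pm's or SelfLoader's actual code.
package LazyDemo;

use vars qw( $AUTOLOAD %SUBS );

# Each subroutine's source is kept as a string and compiled only
# the first time that subroutine is called.
%SUBS = (
    greet => 'sub greet { my $name = shift; return "Hello, $name!" }',
    shout => 'sub shout { my $text = shift; return uc $text }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package prefix
    return if $name eq "DESTROY";       # never autoload destructors
    my $code = $SUBS{$name}
        or die "Undefined subroutine &LazyDemo::$name called\n";
    eval "package LazyDemo; $code; 1" or die $@;   # compile on demand
    no strict 'refs';
    goto &$AUTOLOAD;                    # re-dispatch to the new subroutine
}

# True once the named subroutine has actually been compiled
sub is_loaded {
    my $name = shift;
    no strict 'refs';
    return defined &{"LazyDemo::$name"};
}

package main;

my $greeting = LazyDemo::greet( "world" );   # compiles greet on first call
print "$greeting\n";
print "shout compiled yet? ", ( LazyDemo::is_loaded( "shout" ) ? "yes" : "no" ), "\n";
```

Code for shout is never compiled unless something calls it, which is the memory saving the text describes.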

Some alternative, lightweight CGI modules have also been written. One of them was begun by David James; he then joined forces with Lincoln Stein, and the result is a new and improved version of CGI.pm that is even smaller, faster, and more modular than the original. It should be available as CGI.pm 3.0 by the time you read this book.

Standard and Object-Oriented Syntax

CGI.pm, like Perl, is powerful yet flexible. It supports two styles of usage: a standard interface and an object-oriented interface. Internally, it is a fully object-oriented module. Not all Perl programmers are comfortable with object-oriented notation, however, so those developers can instead request that CGI.pm make its subroutines available for the developer to call directly.

Here is an example. The object-oriented syntax looks like this:

use strict;
use CGI;

my $q    = new CGI;
my $name = $q->param( "name" );

print $q->header( "text/html" ),
      $q->start_html( "Welcome" ),
      $q->p( "Hi $name!" ),
      $q->end_html;

The standard syntax looks like this:

use strict;
use CGI qw( :standard );

my $name = param( "name" );

print header( "text/html" ),
      start_html( "Welcome" ),
      p( "Hi $name!" ),
      end_html;

Don’t worry about the details of what the code does right now; we will cover all of it during this chapter. The important thing to notice is the different syntax. The first script creates a CGI.pm object and stores it in $q ($q is short for query and is a common convention for CGI.pm objects, although $cgi is used sometimes, too). Thereafter, all the CGI.pm functions are preceded by $q->. The second asks CGI.pm to export the standard functions and simply uses them directly. CGI.pm provides several predefined groups of functions, like :standard, that can be exported into your CGI script.

The standard CGI.pm syntax certainly has less noise. It doesn’t have all those $q-> prefixes. Aesthetics aside, however, there are good arguments for using the object-oriented syntax with CGI.pm.

Exporting functions has its costs. Perl maintains separate namespaces, called packages, for different chunks of code. Most modules, like CGI.pm, load themselves into their own package. Thus, the functions and variables that modules see are different from the functions and variables you see in your scripts. This is a good thing, because it prevents collisions between variables and functions in different packages that happen to have the same name. When a module exports symbols (whether they are variables or functions), Perl has to create and maintain an alias for each of these symbols in your program’s namespace, the main namespace. These aliases consume memory, and this memory usage becomes especially critical if you decide to use your CGI scripts with FastCGI or mod_perl.

The object-oriented syntax also helps you avoid any possible collisions that would occur if you create a subroutine with the same name as one of CGI.pm’s exported subroutines. Also, from a maintenance standpoint, it is clear from looking at the object-oriented script where the code for the header function is: it’s a method of a CGI.pm object, so it must be in the CGI.pm module (or one of its associated modules). Knowing where to look for the header function in the second example is much more difficult, especially if your CGI scripts grow large and complex.

Some people avoid the object-oriented syntax because they believe it is slower. In Perl, methods typically are slower than functions. However, CGI.pm is truly an object-oriented module at heart, and in order to provide the function syntax, it must do some fancy footwork to manage an object for you internally. Thus with CGI.pm, the object-oriented syntax is not any slower than the function syntax. In fact, it can be slightly faster.
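The following sketch shows, in simplified form, the kind of footwork a module needs in order to offer both styles. The package MiniQuery and its _self_or_default helper are hypothetical illustrations, not CGI.pm's source. The point is that every function-style call pays for an extra check plus a hidden default object, which is why the function syntax gains no speed over the method syntax.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical module offering both a method and a function interface.
package MiniQuery;

my $DEFAULT;    # object used behind the scenes for function-style calls

sub new {
    my( $class, %params ) = @_;
    return bless { params => {%params} }, $class;
}

# Work out whether we were called as a method. If not, supply the
# (lazily created) default object as the first argument.
sub _self_or_default {
    return @_ if ref $_[0] && UNIVERSAL::isa( $_[0], __PACKAGE__ );
    $DEFAULT ||= __PACKAGE__->new;
    return ( $DEFAULT, @_ );
}

# param( NAME ) reads a value; param( NAME, VALUE ) sets it
sub param {
    my( $self, $name, @value ) = _self_or_default( @_ );
    $self->{params}{$name} = $value[0] if @value;
    return $self->{params}{$name};
}

package main;

my $q = MiniQuery->new( name => "Ann" );
print "Method call:   ", $q->param( "name" ), "\n";

MiniQuery::param( name => "Bob" );           # function-style call
print "Function call: ", MiniQuery::param( "name" ), "\n";
```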

We will use CGI.pm’s object-oriented syntax in most of our examples.

Handling Input with CGI.pm

CGI.pm primarily handles two separate tasks: it reads and parses input from the user, and it provides a convenient way to return HTML output. Let’s first look at how it collects input.

Environment Information

CGI.pm provides many methods to get information about your environment. Of course, when you use CGI.pm, all of your standard CGI environment variables are still available in Perl’s %ENV hash, but CGI.pm also makes most of these available via method calls. It also provides some unique methods. Table 5.1 shows how CGI.pm’s functions correspond to the standard CGI environment variables.

Table 5-1. CGI.pm Environment Methods and CGI Environment Variables

CGI.pm Method             CGI Environment Variable
------------------------  ------------------------
auth_type                 AUTH_TYPE
Not available             CONTENT_LENGTH
content_type              CONTENT_TYPE
Not available             DOCUMENT_ROOT
Not available             GATEWAY_INTERFACE
path_info                 PATH_INFO
path_translated           PATH_TRANSLATED
query_string              QUERY_STRING
remote_addr               REMOTE_ADDR
remote_host               REMOTE_HOST
remote_ident              REMOTE_IDENT
remote_user               REMOTE_USER
request_method            REQUEST_METHOD
script_name               SCRIPT_NAME
self_url                  Not available
server_name               SERVER_NAME
server_port               SERVER_PORT
server_protocol           SERVER_PROTOCOL
server_software           SERVER_SOFTWARE
url                       Not available
Accept                    HTTP_ACCEPT
http("Accept-charset")    HTTP_ACCEPT_CHARSET
http("Accept-encoding")   HTTP_ACCEPT_ENCODING
http("Accept-language")   HTTP_ACCEPT_LANGUAGE
raw_cookie                HTTP_COOKIE
http("From")              HTTP_FROM
virtual_host              HTTP_HOST
referer                   HTTP_REFERER
user_agent                HTTP_USER_AGENT
https                     HTTPS
https("Cipher")           HTTPS_CIPHER
https("Keysize")          HTTPS_KEYSIZE
https("SecretKeySize")    HTTPS_SECRETKEYSIZE

Most of these CGI.pm methods take no arguments and return the same value as the corresponding environment variable. For example, to get the additional path information passed to your CGI script, you can use the following method:

my $path = $q->path_info;

This is the same information that you could also get this way:

my $path = $ENV{PATH_INFO};

However, a few methods differ or have features worth noting. Let’s take a look at these.

Accept

As a general rule, if a CGI.pm method has the same name as a built-in Perl function or keyword (e.g., accept or tr), then the CGI.pm method is capitalized. Although there would be no collision if CGI.pm were available only via an object-oriented syntax, the collision creates problems for people who use it via the standard syntax. accept was originally lowercase, but it was renamed to Accept in version 2.44 of CGI.pm, and the new name affects both syntaxes.

Unlike the other methods that take no arguments and simply return a value, Accept can also be given a content type, and it will evaluate to true or false depending on whether that content type is acceptable according to the Accept header (HTTP_ACCEPT):

if ( $q->Accept( "image/png" ) ) {
    .
    .
    .

Keep in mind that most browsers today send */* in their Accept header. This matches anything, so using the Accept method in this manner is not especially useful. For new file formats like image/png, it is best to get the list of types from the Accept header and perform the check yourself, ignoring wildcard matches (this is unfortunate, since it defeats the purpose of wildcards):

my @accept = $q->Accept;
if ( grep $_ eq "image/png", @accept ) {
    .
    .
    .

http

If the http method is called without arguments, it returns the names of the environment variables available that contain an HTTP_ prefix. If you call http with an argument, then it will return the value of the corresponding HTTP_ environment variable. When passing an argument to http, the HTTP_ prefix is optional, capitalization does not matter, and hyphens and underscores are interpreted the same. In other words, you can pass the actual HTTP header field name or the environment variable name or even some hybrid of the two, and http will generally figure it out. Here is how you can display all the HTTP_ environment variables your CGI script receives:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;
print $q->header( "text/plain" );

print "These are the HTTP environment variables I received:\n\n";

foreach ( $q->http ) {
    print "$_:\n";
    print "  ", $q->http( $_ ), "\n";
}
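The name-matching rules described for http are easy to mimic in plain Perl. The helpers http_env_name and http_value below are hypothetical sketches of those rules, not CGI.pm's implementation: they uppercase the name, turn hyphens into underscores, and add the HTTP_ prefix only when it is missing.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helpers following the lookup rules described above:
# the HTTP_ prefix is optional, case does not matter, and hyphens
# are treated like underscores.
sub http_env_name {
    my $name = uc shift;
    $name =~ tr/-/_/;                            # ACCEPT-LANGUAGE -> ACCEPT_LANGUAGE
    $name = "HTTP_$name" unless $name =~ /^HTTP_/;
    return $name;
}

sub http_value {
    my $name = shift;
    return $ENV{ http_env_name( $name ) };
}

# List every HTTP_ variable, much like calling http with no arguments
foreach my $var ( sort grep { /^HTTP_/ } keys %ENV ) {
    print "$var = $ENV{$var}\n";
}
```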

https

The https method functions similarly to the http method when it is passed a parameter. It returns the corresponding HTTPS_ environment variable. These variables are set by your web server only if you are receiving a secure request via SSL. When https is called without arguments, it returns the value of the HTTPS environment variable, which indicates whether the connection is secure (its values are server-dependent).

query_string

The query_string method does not do what you might think: it does not correspond one-to-one with $ENV{QUERY_STRING}. $ENV{QUERY_STRING} holds the query portion of the URL that called your CGI script. query_string, on the other hand, is dynamic, so if you modify any of the query parameters in your script (see Section 5.2.2.1 later in this chapter), then the value returned by query_string will include these new values. If you want to know what the original query string was, then you should refer to $ENV{QUERY_STRING} instead.

Also, if the request method is POST, then query_string returns the POST parameters that were submitted in the content of the request, and ignores any parameters passed to the CGI script via the query string. This means that if you create a form that submits its values via POST to a URL that also contains a query string, you will not be able to access the parameters on the query string via CGI.pm unless you make a slight modification to CGI.pm to tell it to include parameters from the original query string with POST requests. We’ll see how to do this in Section 5.2.2.2 later in this chapter.

self_url

This method does not correspond to a standard CGI environment variable, although you could manually construct it from other environment variables. It provides you with a URL that can call your CGI script with the same parameters. The path information is maintained and the query string is set to the value of the query_string method.

Note that this URL is not necessarily the same URL that was used to call your CGI script. Your CGI script may have been called because of an internal redirection by the web server. Also, because all of the parameters are moved to the query string, this new URL is built to be used with a GET request, even if the current request was a POST request.

url

The url method functions similarly to the self_url method, except that it returns a URL to the current CGI script without any parameters, i.e., no path information and an empty query string.
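As a rough sketch of where url and self_url get their information, the helpers below assemble similar URLs from standard CGI environment variables. All names here are hypothetical, and the sketch assumes plain HTTP, ignoring SSL and other cases the real methods may handle. Like self_url, it keeps the path information and moves every parameter into the query string.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical: base URL of the script, similar in spirit to url.
# Prefers HTTP_HOST over SERVER_NAME, as virtual_host does.
sub base_url {
    my $host = $ENV{HTTP_HOST} || $ENV{SERVER_NAME} || "localhost";
    my $port = $ENV{SERVER_PORT} || 80;
    $host .= ":$port" if $port != 80 && $host !~ /:\d+$/;   # nonstandard port
    return "http://$host" . ( $ENV{SCRIPT_NAME} || "" );
}

# URL-encode everything except unreserved characters
sub escape {
    my $text = shift;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf "%%%02X", ord $1/eg;
    return $text;
}

# Rebuild a query string from the current parameters (name => [values])
sub build_query {
    my %params = @_;
    my @pairs;
    foreach my $name ( sort keys %params ) {
        push @pairs, escape( $name ) . "=" . escape( $_ ) foreach @{ $params{$name} };
    }
    return join "&", @pairs;
}

# Like self_url: path info kept, all parameters moved to the query string
sub self_url_sketch {
    my %params = @_;
    my $url    = base_url() . ( $ENV{PATH_INFO} || "" );
    my $query  = build_query( %params );
    return length $query ? "$url?$query" : $url;
}
```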

virtual_host

The virtual_host method is handy because it returns the value of the HTTP_HOST environment variable, if set, and SERVER_NAME otherwise. Remember that HTTP_HOST is the name of the web server as the browser referred to it, which may differ if multiple domains share the same IP address. HTTP_HOST is available only if the browser supplied the Host HTTP header, added for HTTP/1.1.

Accessing Parameters

param is probably the most useful method CGI.pm provides. It allows you to access the parameters submitted to your CGI script, whether these parameters come to you via a GET request or a POST request. If you call param without arguments, it will return a list of all of the parameter names your script received. If you provide a single argument to it, it will return the value for the parameter with that name. If no parameter with that name was submitted to your script, it returns undef (or an empty list, if called in list context).

It is possible for your CGI script to receive multiple values for a parameter with the same name. This happens when you create two form elements with the same name or you have a select box that allows multiple selections. In this case, param returns a list of all of the values if it is called in a list context and just the first value if it is called in a scalar context. This may sound a little complicated, but in practice it works such that you should end up with what you expect. If you ask param for one value, you will get one value (even if other values were also submitted), and if you ask it for a list, you will always get a list (even if the list contains only one element).
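This context-sensitive behavior can be reproduced with Perl's wantarray. In the hypothetical sketch below, %PARAMS stands in for a parsed request (as if the query string were color=red&color=blue&shade=dark) and param_values stands in for the method:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical parameter store: each name maps to a list of values
my %PARAMS = ( color => [ "red", "blue" ], shade => [ "dark" ] );

# All values in list context, only the first in scalar context;
# a missing name yields undef (scalar) or an empty list (list).
sub param_values {
    my $name   = shift;
    my $values = $PARAMS{$name} or return;
    return wantarray ? @$values : $values->[0];
}

my @colors = param_values( "color" );   # list context: ("red", "blue")
my $color  = param_values( "color" );   # scalar context: "red"
print "All colors: @colors; first color: $color\n";
```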

Example 5.1 is a simple example that displays all the parameters your script receives.

Example 5-1. param_list.cgi

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;
print $q->header( "text/plain" );

print "These are the parameters I received:\n\n";

my( $name, $value );

foreach $name ( $q->param ) {
    print "$name:\n";
    foreach $value ( $q->param( $name ) ) {
        print "  $value\n";
    }
}

If you call this CGI script with multiple parameters, like this:

http://localhost/cgi/param_list.cgi?color=red&color=blue&shade=dark

you will get the following output:

These are the parameters I received:

color:
  red
  blue
shade:
  dark

Modifying parameters

CGI.pm also lets you add, modify, or delete the value of parameters within your script. To add or modify a parameter, just pass param more than one argument. Using Perl’s => operator instead of a comma makes the code easier to read and allows you to omit the quotes around the parameter name, so long as it’s a word (i.e., it contains only letters, numbers, and underscores) that does not conflict with a built-in function or keyword:

$q->param( title => "Web Developer" );

You can create a parameter with multiple values by passing additional arguments:

$q->param( hobbies => "Biking", "Windsurfing", "Music" );

To delete a parameter, use the delete method and provide the name of the parameter:

$q->delete( "age" );

You can clear all of the parameters with delete_all :

$q->delete_all;

It may seem odd that you would ever want to modify parameters yourself, since these will typically be coming from the user. Setting parameters is useful for many reasons, but especially when assigning default values to fields in forms. We will see how to do this later in this chapter.

POST and the query string

param automatically determines if the request method is POST or GET. If it is POST, it reads any parameters submitted to it from STDIN. If it is GET, it reads them from the query string. It is possible to POST information to a URL that already has a query string. In this case, you have two sources of input data, and because CGI.pm determines what to do by checking the request method, it will ignore the data in the query string.

You can change this behavior if you are willing to edit CGI.pm. In fact, CGI.pm includes comments to help you do this. You can find this block of code in the init subroutine (the line number will vary depending on the version of CGI.pm you have):

if ($meth eq 'POST') {
    $self->read_from_client(*STDIN,$query_string,$content_length,0)
        if $content_length > 0;
    # Some people want to have their cake and eat it too!
    # Uncomment this line to have the contents of the query string
    # APPENDED to the POST data.
    # $query_string .= (length($query_string) ? '&' : '') . $ENV{'QUERY_STRING'} if defined $ENV{'QUERY_STRING'};
    last METHOD;
}

By removing the pound sign from the beginning of the indicated line, you will be able to use POST and query string data together.
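If you cannot edit CGI.pm, the combined behavior can be approximated by parsing the input yourself. The sketch below is hypothetical (parse_pairs and read_all_params are not CGI.pm functions) and omits much of what CGI.pm handles, such as multipart/form-data bodies and size limits. It reads URL-encoded POST data first and then appends the query string parameters, which is the effect of uncommenting the line above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical: decode a URL-encoded string such as
# "name=Ann+Lee&color=red" and add its values to %$params.
sub parse_pairs {
    my( $data, $params ) = @_;
    foreach my $pair ( split /[&;]/, $data ) {
        next unless length $pair;
        my( $name, $value ) = split /=/, $pair, 2;
        $value = "" unless defined $value;
        foreach ( $name, $value ) {
            tr/+/ /;                               # plus signs encode spaces
            s/%([0-9A-Fa-f]{2})/chr hex $1/eg;     # decode %XX escapes
        }
        push @{ $params->{$name} }, $value;
    }
    return $params;
}

# POST data first, then the query string appended to it.
# A real script should also cap CONTENT_LENGTH, as $CGI::POST_MAX does.
sub read_all_params {
    my %params;
    if ( ( $ENV{REQUEST_METHOD} || "" ) eq "POST" ) {
        my $body = "";
        read( STDIN, $body, $ENV{CONTENT_LENGTH} || 0 );
        parse_pairs( $body, \%params );
    }
    parse_pairs( $ENV{QUERY_STRING}, \%params ) if defined $ENV{QUERY_STRING};
    return \%params;
}
```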

Index queries

You may receive a query string that contains words that do not form name-value pairs. The <ISINDEX> HTML tag, which is not used much anymore, creates a single text field along with a prompt to enter search keywords. When a user enters words into this field and presses Enter, it makes a new request for the same URL, adding the text the user entered as the query string with keywords separated by a plus sign (+), such as this:

http://www.localhost.com/cgi/lookup.cgi?cgi+perl

You can retrieve the list of keywords that the user entered by calling param with “keywords” as the name of the parameter or by calling the separate keywords method:

my @words = $q->keywords;            # these lines do the same thing
@words    = $q->param( "keywords" );

These methods return index keywords only if CGI.pm finds no name-value pair parameters, so you don’t have to worry about using “keywords” as the name of an element in your HTML forms; it will work correctly. On the other hand, if you want to POST form data to a URL with a keyword, CGI.pm cannot return that keyword to you. You must use $ENV{QUERY_STRING} to get it.
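The rule described here is simple: if the query string contains no equals sign, split it on plus signs and decode any escapes. Here is a hypothetical helper, index_keywords, that sketches that rule outside of CGI.pm:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the keywords from an <ISINDEX>-style
# query string, or an empty list if it contains name=value pairs.
sub index_keywords {
    my $query = shift;
    return if !defined $query || $query =~ /=/;
    my @words;
    foreach my $word ( split /\+/, $query ) {
        next unless length $word;
        $word =~ s/%([0-9A-Fa-f]{2})/chr hex $1/eg;   # decode %XX escapes
        push @words, $word;
    }
    return @words;
}

my @words = index_keywords( $ENV{QUERY_STRING} || "cgi+perl" );
print "Keywords: @words\n";
```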

Supporting image buttons as submit buttons

Whether you use <INPUT TYPE="IMAGE"> or <INPUT TYPE="SUBMIT">, the form is still sent to the CGI script. However, with the image button, the name is not transmitted by itself. Instead, the web browser splits an image button name into two separate variables: name.x and name.y.

If you want your program to support image and regular submit buttons interchangeably, it is useful to translate the image button names to normal submit button names. Thus, the main program code can use logic based upon which submit button was clicked even if image buttons later replace them.

To accomplish this, we can use the following code that will set a form variable without the coordinates in the name for each variable that ends in “.x”:

foreach ( $q->param ) {
    # "go.x" sets a "go" parameter, just as a submit button would
    $q->param( $1, 1 ) if /^(.*)\.x$/;
}

Exporting Parameters to a Namespace

One of the problems with using a method to retrieve the value of a parameter is that it is more work to embed the value in a string. If you wish to print the value of someone’s input, you can use an intermediate variable:

my $user = $q->param( 'user' );
print "Hi, $user!";

Another way to do this is via an odd Perl construct that evaluates the method call inside an anonymous array and then immediately dereferences that array within the string:

print "Hi, @{[ $q->param( 'user' ) ]}!";

The first solution is more work and the second can be hard to read. Fortunately, there is a better way. If you know that you are going to need to refer to many output values in a string, you can import all the parameters as variables to a specified namespace:

$q->import_names( "Q" );
print "Hi, $Q::user!";

Parameters with multiple values become arrays in the new namespace, and any characters in a parameter name other than a letter or number become underscores. You must provide a namespace and cannot pass “main”, the default namespace, because that might create security risks.

The price you pay for this convenience is increased memory usage because Perl must create an alias for each parameter.
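To see where that memory goes, here is a rough sketch of what exporting parameters into a namespace involves. The export_params helper is hypothetical and only approximates import_names: in this sketch each parameter becomes both a scalar holding its first value and an array holding all of its values, and every one of those is a new symbol table entry.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical approximation of import_names, not CGI.pm's code
sub export_params {
    my( $package, %params ) = @_;            # name => [ values ]
    die "Refusing to export into main\n" if $package eq "main";
    no strict 'refs';
    foreach my $name ( keys %params ) {
        ( my $var = $name ) =~ s/\W/_/g;     # e.g. "e-mail" becomes "e_mail"
        my @values = @{ $params{$name} };
        ${"${package}::$var"} = $values[0];  # scalar: first value
        @{"${package}::$var"} = @values;     # array: all values
    }
}

export_params( "Q", user => [ "ann" ], hobby => [ "biking", "music" ] );
{
    no warnings 'once';
    print "Hi, ", $Q::user, "! Hobbies: ", join( ", ", @Q::hobby ), "\n";
}
```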

File Uploads with CGI.pm

As we mentioned in the last chapter, it is possible to create a form with a multipart/form-data media type that permits users to upload files via HTTP. We avoided discussing how to handle this type of input then because handling file uploads properly can be quite complex. Fortunately, there’s no need for us to do this because, like other form input, CGI.pm provides a very simple interface for handling file uploads.

You can access the name of an uploaded file with the param method, just like the value of any other form element. For example, if your CGI script were receiving input from the following HTML form:

<FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data">
  <P>Please choose a file to upload:
  <INPUT TYPE="FILE" NAME="file">
  <INPUT TYPE="SUBMIT">
</FORM>

then you could get the name of the uploaded file this way, by referring to the name of the file input element, in this case “file”:

my $file = $q->param( "file" );

The name you receive from this parameter is the name of the file as it appeared on the user’s machine when they uploaded it. CGI.pm stores the file as a temporary file on your system, but the name of this temporary file does not correspond to the name you get from this parameter. We will see how to access the temporary file in a moment.

The name supplied by this parameter varies according to platform and browser. Some systems supply just the name of the uploaded file; others supply the entire path of the file on the user’s machine. Because path delimiters also vary between systems, it can be a challenge determining the name of the file. The following regular expression appears to work for Windows, Macintosh, and Unix-compatible systems:

my( $file ) = $q->param( "file" ) =~ m|([^/:\\]+)$|;

However, it may strip parts of filenames, since “report 11/3/99” is a valid filename on Macintosh systems and the above command would in this case set $file to “99”. Another solution is to replace any characters other than letters, digits, underscores, dashes, and periods with underscores and prevent any files from beginning with periods or dashes:

my $file = $q->param( "file" );
$file =~ s/([^\w.-])/_/g;
$file =~ s/^[-.]+//;

The problem with this is that Netscape’s browsers on Windows send the full path to the file as the filename. Thus, $file may be set to something long and ugly like “C__Windows_Favorites_report.doc”.

You could try to sort out the behaviors of the different operating systems and browsers, check for the user’s browser and operating system, and then treat the filename appropriately, but that would be a very poor solution. You are bound to miss some combinations, you would constantly need to update it, and one of the greatest advantages of the Web is that it works across platforms; you should not build any limitations into your solutions.

So the simple, obvious solution is actually nontechnical. If you do need to know the name of the uploaded file, just add another text field to the form allowing the user to enter the name of the file they are uploading. This has the added advantage of allowing a user to provide a different name than the file has, if appropriate. The HTML form looks like this:

<FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data">
  <P>Please choose a file to upload:
  <INPUT TYPE="FILE" NAME="file">
  <P>Please enter the name of this file:
  <INPUT TYPE="TEXT" NAME="filename">
  <INPUT TYPE="SUBMIT">
</FORM>

You can then get the name from the text field, remembering to strip out any odd characters:

my $filename = $q->param( "filename" );
$filename =~ s/([^\w.-])/_/g;
$filename =~ s/^[-.]+//;
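The two filename strategies discussed above can be compared side by side. base_name and clean_filename are hypothetical helper names wrapping the regular expressions from the text; the sample inputs reproduce the Windows and Macintosh cases described earlier.

```perl
#!/usr/bin/perl -w
use strict;

# Keep only the last path component, treating /, :, and \ as separators
sub base_name {
    my $path = shift;
    my( $name ) = $path =~ m|([^/:\\]+)$|;
    return $name;
}

# Replace unsafe characters with underscores; strip leading periods and dashes
sub clean_filename {
    my $name = shift;
    $name =~ s/[^\w.-]/_/g;
    $name =~ s/^[-.]+//;
    return $name;
}

foreach my $input ( 'C:\Windows\Favorites\report.doc', 'report 11/3/99' ) {
    printf "%-35s base_name: %-12s clean_filename: %s\n",
        $input, base_name( $input ), clean_filename( $input );
}
```

Note how base_name loses most of the Macintosh name, while clean_filename keeps the whole Windows path; neither is reliable, which motivates asking the user for the name.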

So now that we know how to get the name of the uploaded file, let’s look at how we get at the content. CGI.pm creates a temporary file to store the contents of the upload; you can get a file handle for this file by passing the name of the file input element (here, “file”) to the upload method as follows:

my $file = $q->param( "file" );     # the filename as sent by the browser
my $fh   = $q->upload( "file" );    # a file handle for the uploaded data

The upload method was added to CGI.pm in Version 2.47. Prior to this you could use the value returned by param (in this case $file) as a file handle in order to read from the file; if you use it as a string it returns the name of the file. This actually still works, but there are conflicts with strict mode and other problems, so upload is the preferred way to get a file handle now. Be sure that you pass upload the name of the form field (here, “file”) and not a filename (e.g., the name returned by param, the name the user supplied, the name with nonalphanumeric characters replaced with underscores, etc.).

Note that transfer errors are much more common with file uploads than with other forms of input. If the user presses the Stop button in the browser as the file is uploading, for example, CGI.pm will receive only a portion of the uploaded file. Because of the format of multipart/form-data requests, CGI.pm will recognize that the transfer is incomplete. You can check for errors such as this by using the cgi_error method after creating a CGI.pm object. It returns the HTTP status code and message corresponding to the error, if applicable, or an empty string if no error has occurred. For instance, if the Content-Length of a POST request exceeds $CGI::POST_MAX, then cgi_error will return “413 Request entity too large”. As a general rule, you should always check for an error when you are recording input on the server. This includes file uploads and other POST requests. It doesn’t hurt to check for an error with GET requests either.

Example 5.2 provides the complete code, with error checking, to receive a file upload via our previous HTML form.

Example 5-2. upload.cgi

#!/usr/bin/perl -wT

use strict;
use CGI;
use Fcntl qw( :DEFAULT :flock );

use constant UPLOAD_DIR     => "/usr/local/apache/data/uploads";
use constant BUFFER_SIZE    => 16_384;
use constant MAX_FILE_SIZE  => 1_048_576;       # Limit each upload to 1 MB
use constant MAX_DIR_SIZE   => 100 * 1_048_576; # Limit total uploads to 100 MB
use constant MAX_OPEN_TRIES => 100;

$CGI::DISABLE_UPLOADS   = 0;
$CGI::POST_MAX          = MAX_FILE_SIZE;

my $q = new CGI;
$q->cgi_error and error( $q, "Error transferring file: " . $q->cgi_error );

my $file      = $q->param( "file" )     || error( $q, "No file received." );
my $filename  = $q->param( "filename" ) || error( $q, "No filename entered." );
my $fh        = $q->upload( "file" )    || error( $q, "Unable to read the uploaded file." );
my $buffer    = "";

if ( dir_size( UPLOAD_DIR ) + $ENV{CONTENT_LENGTH} > MAX_DIR_SIZE ) {
    error( $q, "Upload directory is full." );
}

# Allow letters, digits, periods, underscores, dashes
# Convert anything else to an underscore
$filename =~ s/[^\w.-]/_/g;
if ( $filename =~ /^(\w[\w.-]*)/ ) {
    $filename = $1;
}
else {
    error( $q, "Invalid file name; files must start with a letter or number." );
}

# Open output file, making sure the name is unique
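my $tries = 0;
until ( sysopen OUTPUT, UPLOAD_DIR . "/$filename", O_WRONLY | O_CREAT | O_EXCL ) {
    ++$tries >= MAX_OPEN_TRIES and error( $q, "Unable to save your file." );
    # Add or increment a number before the extension (report.txt -> report1.txt)
    $filename =~ s/(\d*)(\.\w+)?$/($1 || 0) + 1 . ( defined $2 ? $2 : "" )/e;
}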

# This is necessary for non-Unix systems; does nothing on Unix
binmode $fh;
binmode OUTPUT;

# Write contents to output file
while ( read( $fh, $buffer, BUFFER_SIZE ) ) {
    print OUTPUT $buffer;
}

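close OUTPUT;

# Every CGI script must return a response; report success to the user
print $q->header( "text/plain" ), "File received and saved as $filename.\n";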


sub dir_size {
    my $dir = shift;
    my $dir_size = 0;
    
    # Loop through files and sum the sizes; doesn't descend down subdirs
    opendir DIR, $dir or die "Unable to open $dir: $!";
    while ( defined( my $file = readdir DIR ) ) {
        next unless -f "$dir/$file";    # skip ".", "..", and subdirectories
        $dir_size += -s _;
    }
    closedir DIR;
    return $dir_size;
}


sub error {
    my( $q, $reason ) = @_;
    
    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Your upload was not processed because the following error ",
                 "occurred: " ),
          $q->p( $q->i( $reason ) ),
          $q->end_html;
    exit;
}

We start by creating several constants to configure this script. UPLOAD_DIR is the path to the directory where we will store uploaded files. BUFFER_SIZE is the amount of data to read into memory while transferring from the temporary file to the output file. MAX_FILE_SIZE is the maximum file size we will accept; this is important because we want to prevent users from uploading gigabyte-sized files and filling up all of the server’s disk space. MAX_DIR_SIZE is the maximum size that we will allow our upload directory to grow to. This restriction is as important as the last because users can fill up our disks by posting lots of small files just as easily as posting large files. Finally, MAX_OPEN_TRIES is the number of times we try to generate a unique filename and open that file before we give up; we’ll see why this step is necessary in a moment.

First, we enable file uploads; then we set $CGI::POST_MAX to MAX_FILE_SIZE. Note that $CGI::POST_MAX limits the size of the entire request body, which includes the data for other form fields as well as the overhead of the multipart/form-data encoding, so the largest file the script will accept is a little smaller than MAX_FILE_SIZE. For this form, the difference is minor, but if you add a file upload field to a complex form with multiple text fields, then you should keep this distinction in mind.

We then create a CGI object and check for errors. As we said earlier, errors with file uploads are much more common than with other forms of CGI input. Next we get the file’s upload name and the filename the user provided, reporting errors if either of these is missing. Note that a user may be rather upset to get a message saying that the filename is missing after uploading a large file via a modem. There is no way to interrupt that transfer, but in a production application, it might be more user-friendly to save the unnamed file temporarily, prompt the user for a filename, and then rename the file. Of course, you would then need to periodically clean up temporary files that were abandoned.

We get a file handle, $fh, to the temporary file where CGI.pm has stored the input. We check whether our upload directory is full and report an error if this is the case. Again, this message is likely to create some unhappy users. In a production application you should add code to notify an administrator who can see why the upload directory is full and resolve the problem. See Chapter 9.

Next, we replace any characters in the user-supplied filename that may cause problems with underscores and make sure the name doesn’t start with a period or a dash. The odd construct that assigns the captured match back to $filename untaints that variable. We’ll discuss tainting and why this is important in Chapter 8. If the name does not start with a letter, digit, or underscore (for example, if it consisted of nothing but periods and/or dashes), we generate an error.
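The filename checks above can be isolated in a small helper, which makes them easy to test. This is a sketch that applies the same rules as upload.cgi; the name sanitize_filename is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the checks in upload.cgi: returns an
# untainted, filesystem-safe name, or undef if the name is unusable.
sub sanitize_filename {
    my $name = shift;
    return unless defined $name;

    # Allow letters, digits, underscores, periods, and dashes;
    # convert anything else to an underscore
    $name =~ s/[^\w.-]/_/g;

    # Require a leading word character so names such as ".htaccess"
    # or "-rf" are rejected; capturing through $1 also untaints the value
    return $name =~ /^(\w[\w.-]*)$/ ? $1 : undef;
}

print sanitize_filename( "my report!.txt" ), "\n";   # my_report_.txt
```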

We try to open a file with this name in our upload directory. If we fail, then we add a number to $filename (or increment the one already there) and try again. The regular expression allows us to keep the file extension the same: if there is already a report.txt file, then the next upload with that name will be named report1.txt, the next one report2.txt, etc. This continues until we reach MAX_OPEN_TRIES. It is important to limit this loop because there may be a reason other than a non-unique name that prevents us from saving the file. If the disk is full or the system has too many open files, for example, we do not want to start looping endlessly. This error should also notify an administrator that something is wrong.
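A standalone sketch of the naming loop, assuming a hypothetical helper open_unique that returns the open handle and the name it settled on. Unlike the inline loop, it gives up immediately when sysopen fails for a reason other than "file exists" (checked with the Errno module), since trying more names cannot help in that case:

```perl
use strict;
use warnings;
use Errno qw( EEXIST );
use Fcntl qw( O_WRONLY O_CREAT O_EXCL );

# Hypothetical helper: atomically create a new file in $dir, adding or
# incrementing a number before the extension on each collision
# (report.txt, report1.txt, report2.txt, ...). Returns the handle and
# the name used, or an empty list if no file could be created.
sub open_unique {
    my( $dir, $name, $max_tries ) = @_;

    for ( 1 .. $max_tries ) {
        # O_EXCL makes sysopen fail if the file already exists, so two
        # simultaneous uploads can never claim the same name
        if ( sysopen my $fh, "$dir/$name", O_WRONLY | O_CREAT | O_EXCL ) {
            return ( $fh, $name );
        }
        # Any failure other than "file exists" will not be fixed by renaming
        last unless $! == EEXIST;

        # Increment trailing digits before the extension (or add "1")
        $name =~ s/(\d*)(\.\w+)?$/($1 || 0) + 1 . ( defined $2 ? $2 : "" )/e;
    }
    return;
}
```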

This script is written to handle any type of file upload, including binary files such as images or audio. By default, whenever Perl accesses a file handle on non-Unix systems (more specifically, systems that do not use \n as their end-of-line character), Perl translates the native operating system’s end-of-line characters, such as \r\n for Windows or \r for MacOS, to \n on input and back to the native characters on output. This works great for text files, but it can corrupt binary files. Thus, we enable binary mode with the binmode function in order to disable this translation. On systems like Unix, where no end-of-line translation occurs, binmode has no effect.

Finally, we read from our temporary file handle, write to our output file, and exit. We use the read function to read and write a chunk of data at a time. The size of this chunk is defined by our BUFFER_SIZE constant. In case you are wondering, CGI.pm will remove its temporary file automatically when our script exits (technically, when $q goes out of scope).
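The copy loop in upload.cgi stops on either end of file or a read error, because read returns 0 at end of file and undef on failure. A sketch that tells the two apart, assuming a hypothetical copy_binary helper that works on any pair of file handles:

```perl
use strict;
use warnings;

use constant BUFFER_SIZE => 16_384;

# Copy everything from $in to $out in BUFFER_SIZE chunks, so memory use
# stays constant no matter how large the upload is. Returns the number
# of bytes copied; dies on a read or write error.
sub copy_binary {
    my( $in, $out ) = @_;
    binmode $in;     # disable end-of-line translation on non-Unix systems
    binmode $out;

    my( $buffer, $total ) = ( "", 0 );
    while ( 1 ) {
        my $read = read( $in, $buffer, BUFFER_SIZE );
        die "Read failed: $!" unless defined $read;
        last if $read == 0;                      # end of file
        print {$out} $buffer or die "Write failed: $!";
        $total += $read;
    }
    return $total;
}
```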

There is another way we could have moved the file to our uploads directory. We could use CGI.pm’s undocumented tmpFileName method to get the name of the temporary file containing the upload and then use Perl’s rename function to move the file. However, relying on undocumented code is dangerous, because it may not be compatible with future versions of CGI.pm. Thus, in our example we stick to the public API instead.

The dir_size subroutine calculates the size of a directory by summing the size of each of its files. The error subroutine prints a message telling the user why the transfer failed. In a production application, you probably want to provide links for the user to get help or to notify someone about problems.

Generating Output with CGI.pm

CGI.pm provides a very elegant solution for outputting both headers and HTML with Perl. It allows you to embed HTML in your code, but it makes this more natural by turning the HTML into code. Every HTML element can be generated via a corresponding method in CGI.pm. We have already seen some examples of this, but here’s another:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;
my $timestamp = localtime;

print $q->header( "text/html" ),
      $q->start_html( -title => "The Time", -bgcolor => "#ffffff" ),
      $q->h2( "Current Time" ),
      $q->hr,
      $q->p( "The current time according to this system is: ",
             $q->b( $timestamp ) ),
      $q->end_html;

The resulting output looks like this (the indentation is added to make it easier to read):

Content-type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
  <HEAD><TITLE>The Time</TITLE></HEAD>
  <BODY BGCOLOR="#ffffff">
    <H2>Current Time</H2>
    <HR>
    <P>The current time according to this system is:
      <B>Mon May 29 16:48:14 2000</B></P>
  </BODY>
</HTML>

As you can see, the code looks a lot like Perl and a lot less like HTML. It is also shorter than the corresponding HTML because CGI.pm manages some common tags for us. Another benefit is that it is impossible to forget to close a tag because the methods automatically generate closing tags (except for those elements that CGI.pm knows do not need them, like <HR>).

We’ll look at all of these output methods in this section, starting with the first method, header.

Controlling HTTP Headers with CGI.pm

CGI.pm has two methods for returning HTTP headers: header and redirect. They correspond to the two ways you can return data from CGI scripts: you can return a document, or you can redirect to another document.

Media type

The header method handles multiple HTTP headers for you. If you pass it one argument, it returns the Content-type header with that value. If you do not supply a media type, it defaults to “text/html”. Although CGI.pm makes outputting HTML much easier, you can of course print any content type with it. Simply use the header method to specify the media type and then print your content, whether it be text, XML, Adobe PDF, etc.:

print $q->header( "text/plain" );
print "This is just some boring text.\n";

If you want to set other headers, then you need to pass name-value pairs for each header. Use the -type argument to specify the media type (see the example under Section 5.3.1.2 later in this chapter).

Status

You can specify a status other than “200 OK” by using the -status argument:

print $q->header( -type => "text/html", -status => "404 Not Found" );

Caching

Browsers can’t always tell if content is being dynamically generated by CGI or if it is coming from a static source, and they may try to cache the output of your script. You can disable this, or request caching if you want it, by using the -expires argument. You can supply either a full time stamp with this argument or a relative time. Relative times are created by supplying a plus or minus sign for forward or backward, an integer number, and a one-letter abbreviation for second, minute, hour, day, month, or year (each of these abbreviations is lowercase except for month, which is an uppercase M). You can also use “now” to indicate that a document should expire immediately. Specifying a negative value also has this effect.

This example tells the browser that this document is good for the next 30 minutes:

print $q->header( -type => "text/html", -expires => "+30m" );
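A sketch of both a relative and an immediate expiration, assuming CGI.pm is installed; the header text is kept in variables rather than printed right away so it can be inspected. CGI.pm turns either form into an Expires header containing an HTTP date in GMT:

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new( "" );    # empty query: no form input needed here

# Cache for 30 minutes (relative time, as in the example above)
my $cached = $q->header( -type => "text/html", -expires => "+30m" );

# Expire immediately; a negative offset such as "-1d" has the same effect
my $uncached = $q->header( -type => "text/html", -expires => "now" );

print $cached;
```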

Specifying an alternative target

If you are using frames or have multiple windows, you may want links in one document to update another document. You can use the -target argument along with the name of the other frame or window (as set by the NAME attribute of a <FRAME> tag or by JavaScript) to specify that clicking on a link in this document should cause the new resource to load in the other frame (or window):

print $q->header( -type => "text/html", -target => "main_frame" );

This argument is only meaningful for HTML documents.

Redirection

If you need to redirect to another URL, you can use the redirect method instead of printing the Location HTTP header:

print $q->redirect( "http://localhost/survey/thanks.html" );

Although the term “redirect” is an action, this method does not perform a redirect for you; it simply returns the corresponding header. So don’t forget you still need to print the result!

Other headers

If you need to generate other HTTP headers, you can simply pass the name-value pair to header and it will return the header with the appropriate formatting. Underscores are converted to hyphens for you.

Thus, the following statement:

print $q->header( -content_encoding  => "gzip" );

produces output that includes the following header line (along with the default Content-type header):

Content-encoding: gzip

Starting and Ending Documents

Now let’s look at the methods that you can use to generate HTML. We’ll start by looking at the methods for starting and ending documents.

start_html

The start_html method returns the HTML DTD, the <HTML> tag, the <HEAD> section including <TITLE>, and the <BODY> tag. In the previous example, it generates HTML like the following:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML><HEAD><TITLE>The Time</TITLE>
</HEAD><BODY BGCOLOR="#ffffff">

The most common arguments start_html recognizes are as follows:

  • Setting the -base argument to a true value tells CGI.pm to include a <BASE HREF="url"> tag in the head of your document that points to the URL of your script.

  • The -meta argument accepts a reference to a hash containing the name and content of meta tags that appear in the head of your document.

  • The -script argument allows you to add JavaScript to the head of your document. You can either provide a string containing the JavaScript code or a reference to a hash containing -language, -src, and -code as possible keys. This allows you to specify the language and source attributes of the <SCRIPT> tag too. CGI.pm automatically provides comment tags around the code to protect it from browsers that do not recognize JavaScript.

  • The -noscript argument allows you to specify HTML to display if the browser does not support JavaScript. It is inserted into the head of your document.

  • The -style argument allows you to define a style sheet for the document. Like -script, you may either specify a string or a reference to a hash. The keys that -style accepts in the hash are -code and -src. The value of -code will be inserted into the document as style sheet information. The value of -src will be a URL to a .css file. CGI.pm automatically provides comment tags around the code to protect cascading style sheets from browsers that do not recognize them.

  • The -title argument sets the title of the HTML document.

  • The -xbase argument lets you specify a URL to use in the <BASE HREF="url"> tag. This is different from the -base argument that also generates this tag but sets it to the URL of the current CGI script.

Any other arguments, like -bgcolor, are passed as attributes to the <BODY> tag.
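Several of these arguments combined in one call, as a sketch assuming CGI.pm is installed; the stylesheet and script paths and the meta values are placeholders, not files this chapter defines:

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new( "" );

my $html = $q->start_html(
    -title   => "The Time",
    -meta    => { keywords => "time, clock", description => "Current server time" },
    -style   => { -src => "/styles/site.css" },                          # placeholder path
    -script  => { -language => "JavaScript", -src => "/scripts/clock.js" }, # placeholder path
    -bgcolor => "#ffffff",    # not a start_html option, so it goes on <BODY>
);

print $html, $q->end_html;
```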

end_html

The end_html method returns the </BODY> and </HTML> tags.

Standard HTML Elements

HTML elements can be generated by using the lowercase name of the element as a method, with the following exceptions: Accept, Delete, Link, Param, Select, Sub, and Tr. These methods have an initial cap to avoid conflicting with built-in Perl functions and other CGI.pm methods.

The following rules apply to basic HTML tags:

  • CGI.pm recognizes that some elements, like <HR> and <BR>, do not have closing tags. These methods take no arguments and return the single tag:

    print $q->hr;

    This outputs:

    <HR>
  • If you provide one argument, it creates an opening and closing tag to enclose the text of your argument. Tags are capitalized:

    print $q->p( "This is a paragraph." );

    This prints the text:

    <P>This is a paragraph.</P>
  • If you provide multiple arguments, these are simply joined with the tags at the beginning and the end:

    print $q->p( "The server name is:", $q->server_name );

    This prints the text:

    <P>The server name is: localhost</P>

    This usage makes it easy to nest elements:

    print $q->p( "The server name is:", $q->em( $q->server_name ) );

    This prints the text:

    <P>The server name is: <EM>localhost</EM></P>

    Note that a space is automatically added between each list element. It appears after the colon in these examples. If you wish to print multiple items in a list without intervening spaces, then you must set Perl’s list separator variable, $", to an empty string:

    { 
      local $" = "";
      print $q->p( "Server=", $q->server_name );
    }

    This prints the text:

    <P>Server=localhost</P>

    Note that whenever you change global variables like $", you should localize them by enclosing them in blocks and using Perl’s local function.

  • If the first argument is a reference to a hash, then the hash elements are interpreted as attributes for the HTML element:

    print $q->a( { -href => "/downloads" }, "Download Area" );

    This prints the text:

    <A HREF="/downloads" >Download Area</A>

    You can specify as many attributes as you want. The leading hyphen as part of the attribute name is not required, but it is the standard convention.

    Some attributes do not take values and simply appear as a word. For these, pass undef as the value of the attribute. Prior to version 2.41 of CGI.pm, passing an empty string would accomplish the same thing, but that was changed so that people could explicitly request an attribute set to an empty string (e.g., <IMG SRC="spacer.gif" ALT="">).

  • If you provide a reference to an array as an argument, the tag is distributed across each item in the array:

    print $q->ol( $q->li( [ "First", "Second", "Third" ] ) );

    This corresponds to:

    <OL>
      <LI>First</LI>
      <LI>Second</LI>
      <LI>Third</LI>
    </OL>

    This still works fine when the first argument is a reference to a hash of attributes. Here is a table:

    print $q->table(
                     { -border => 1,
                       -width  => "100%" },
                     $q->Tr( [
                               $q->th( { -bgcolor => "#cccccc" },
                                       [ "Name", "Age" ] ),
                               $q->td( [ "Mary", 29 ] ),
                               $q->td( [ "Bill", 27 ] ),
                               $q->td( [ "Sue",  26 ] )
                           ] )
                   );

    This corresponds to:

    <TABLE BORDER="1" WIDTH="100%">
      <TR>
        <TH BGCOLOR="#cccccc">Name</TH>
        <TH BGCOLOR="#cccccc">Age</TH>
      </TR>
      <TR>
        <TD>Mary</TD>
        <TD>29</TD>
      </TR>
      <TR>
        <TD>Bill</TD>
        <TD>27</TD>
      </TR>
      <TR>
        <TD>Sue</TD>
        <TD>26</TD>
      </TR>
    </TABLE>
  • Aside from the spaces we mentioned above that are introduced between array elements, CGI.pm does not insert any whitespace between HTML elements. It creates no indentation and inserts no new lines. Although this makes it harder for a human to read, it also makes the output smaller and downloads faster. If you wish to generate neatly formatted HTML code, you can use the CGI::Pretty module distributed with CGI.pm. It provides all of the features of CGI.pm (because it is an object-oriented module that extends CGI.pm), but the HTML it produces is neatly indented.
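A short sketch of two of the rules above, the bare-attribute form and the distributed array form combined with an attribute hash, assuming CGI.pm is installed. Recent versions emit lowercase tags rather than the uppercase shown in this chapter's output listings:

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new( "" );

# An attribute whose value is undef is emitted as a bare word (nowrap);
# an empty string would produce nowrap="" instead
my $cell = $q->td( { -nowrap => undef }, "Long text that should not wrap" );

# A hash reference followed by an array reference: the attributes are
# applied to every element generated from the array
my $items = $q->li( { -class => "step" }, [ "Upload", "Verify", "Save" ] );

print $cell, "\n", $q->ol( $items ), "\n";
```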

Form Elements

The syntax for generating form elements differs from other elements. These methods only take name-value pairs that correspond to the attributes. See Table 5.2.

Table 5-2. CGI.pm Methods for HTML Form Elements

CGI.pm Method               HTML Tag
start_form                  <FORM>
end_form                    </FORM>
textfield                   <INPUT TYPE="TEXT">
password_field              <INPUT TYPE="PASSWORD">
filefield                   <INPUT TYPE="FILE">
button                      <INPUT TYPE="BUTTON">
submit                      <INPUT TYPE="SUBMIT">
reset                       <INPUT TYPE="RESET">
checkbox, checkbox_group    <INPUT TYPE="CHECKBOX">
radio_group                 <INPUT TYPE="RADIO">
popup_menu                  <SELECT SIZE="1">
scrolling_list              <SELECT SIZE="n"> where n > 1
textarea                    <TEXTAREA>
hidden                      <INPUT TYPE="HIDDEN">

The start_form and end_form methods generate the opening and closing form tags. start_form takes arguments for each of its attributes:

print $q->start_form( method => "get", action => "/cgi/myscript.cgi" );

Note that CGI.pm sets the request method to POST by default, the reverse of the default for HTML forms (GET). If you want to allow file uploads, use the start_multipart_form method instead of start_form; it sets enctype to “multipart/form-data”.

All of the remaining methods create form elements. They all take the -name and -default arguments. The -default value for an element is replaced by the corresponding value from param if that value exists. You can disable this and force the default to override a user’s parameters by passing the -override argument with a true value.

The -default option specifies the default value of the element for elements with single values:

print $q->textfield(
        -name    => "username",
        -default => "Anonymous"
      );

This yields:

<INPUT TYPE="text" NAME="username" VALUE="Anonymous">

By supplying an array with the -values argument, the checkbox_group and radio_group methods generate multiple checkboxes that share the same name. Likewise, passing an array reference with the -values argument to the scrolling_list and popup_menu functions generates both the <SELECT> and <OPTION> elements. For these elements, -default indicates the values that are checked or selected; you can pass -default a reference to an array for checkbox_group and scrolling_list for multiple defaults.

Each method accepts a -labels argument that takes a reference to a hash; this hash associates the value of each element to the label the browser displays to the user.

Here is how you can generate a group of radio buttons:

print $q->radio_group(
        -name    => "curtain",
        -values  => [ "A", "B", "C" ],
        -default => "B",
        -labels  => { A => "Curtain A", B => "Curtain B", C => "Curtain C" }
      );

This yields:

<INPUT TYPE="radio" NAME="curtain" VALUE="A">Curtain A
<INPUT TYPE="radio" NAME="curtain" VALUE="B" CHECKED>Curtain B
<INPUT TYPE="radio" NAME="curtain" VALUE="C">Curtain C

For specifying any other attributes for form elements, like SIZE=4, pass them as additional arguments (e.g., size => 4).
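A sketch tying these form methods together, assuming CGI.pm is installed. The query string passed to new simulates a submission in which the user chose blue, so the sticky behavior described above replaces the -default of red; in a real script you would simply call CGI->new. The action path /cgi/colors.cgi is a placeholder:

```perl
use strict;
use warnings;
use CGI;

# Simulated submission: the user already picked "blue"
my $q = CGI->new( "color=blue" );

my $form = join "\n",
    $q->start_form( -method => "get", -action => "/cgi/colors.cgi" ),
    $q->textfield( -name => "username", -default => "Anonymous", -size => 20 ),
    $q->popup_menu(
        -name    => "color",
        -values  => [ "red", "green", "blue" ],
        -default => "red",                       # overridden by the submitted value
        -labels  => { red => "Red", green => "Green", blue => "Blue" }
    ),
    $q->submit( -value => "Choose" ),
    $q->end_form;

print $form, "\n";
```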

Alternatives for Generating Output

There are many different ways that people output HTML from their CGI scripts. We have just looked at how you do this from CGI.pm, and in the next chapter we will look at how we can use HTML templates to keep the HTML separate from the code. However, let’s look here at a couple of other techniques developers use to output HTML from their scripts.

One thing to keep in mind as we look at these techniques is how difficult the HTML is to maintain. Over the lifetime of a CGI application, it is often the HTML that changes the most. Thus much of the maintenance of the application will involve making changes to the design or wording found in the HTML, so the HTML should be easy to edit.

Lots of print Statements

The simplest solution for including HTML in the source code is the hardest to maintain. Many web developers start out writing CGI scripts that contain numerous print statements to return documents, even for large sections of static content—content that remains the same each time the CGI script is called.

Here is an example:

#!/usr/bin/perl -wT

use strict;

my $timestamp = localtime;

print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>The Time</title>\n";
print "</head>\n";

print "<body bgcolor=\"#ffffff\">\n";
print "<h2>Current Time</h2>\n";
print "<hr>\n";
print "<p>The current time according to this system is: \n";
print "<b>$timestamp</b>\n";
print "</p>\n";
print "</body>\n";
print "</html>\n";

This is a pretty basic example, but you could imagine just how complicated this can get on a large web page with numerous graphics, nested tables, style declarations, etc. Not only is this difficult to read because of the extra noise that each print statement adds, but each double quote in the HTML must be escaped with a backslash. If you forget to do this even once, you will likely generate a syntax error. Making HTML edits to something that looks like this is much more work than it should be. You should definitely avoid this approach in your scripts.

Here Documents

As we have seen in earlier examples, Perl supports a feature called here documents that allows you to express a large block of content separately within your code. To create a here document, simply use << followed by the token that will be used to indicate the end of the here document. You can include the token in single or double quotes, and the content will be evaluated as if it were a string within those quotes. In other words, if you use single quotes, variables will not be interpreted. If you omit the quotes, it acts as though you had used double quotes.

Here is the previous example using a here document instead:

#!/usr/bin/perl -wT

use strict;

my $timestamp = localtime;

print <<END_OF_MESSAGE;
Content-type: text/html

<html>
  <head>
    <title>The Time</title>
  </head>
  
  <body bgcolor="#ffffff">
    <h2>Current Time</h2>
    <hr>
    <p>The current time according to this system is: 
    <b>$timestamp</b></p>
  </body>
</html>
END_OF_MESSAGE

This is much cleaner than using lots of print statements, and it allows us to indent the HTML content. The result is that this is much easier to read and to update. You could have accomplished something similar by using one print statement and putting all the content inside one pair of double quotes, but then you would have had to precede each double quote in the HTML with a backslash, and for complicated HTML documents this could get tedious.
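The quoting rules for the terminating token, described earlier, can be checked with a tiny sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $name = "World";

# Double-quoted (or bare) token: variables are interpolated
my $interpolated = <<"END_TEXT";
Hello, $name!
END_TEXT

# Single-quoted token: the text is taken literally
my $literal = <<'END_TEXT';
Hello, $name!
END_TEXT

print $interpolated;   # Hello, World!
print $literal;        # Hello, $name!
```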

Another solution is to use Perl’s qq// operator with a different delimiter, such as ~. You must find a delimiter that will not appear in the HTML, and remember that if your content includes JavaScript, it can include many characters that HTML might otherwise not. Here documents are generally a safer solution.
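A sketch of the qq~ ~ alternative, reusing the time example; the double quotes in the HTML need no escaping, but a literal ~ anywhere in the content would end the string early:

```perl
use strict;
use warnings;

my $timestamp = localtime;

# qq with ~ as the delimiter interpolates like a double-quoted string,
# so $timestamp is expanded while the HTML's double quotes stay as-is
my $html = qq~<body bgcolor="#ffffff">
  <p>The current time according to this system is:
  <b>$timestamp</b></p>
</body>
~;

print "Content-type: text/html\n\n", $html;
```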

One drawback to using here documents is that they do not easily indent, so they may look odd inside blocks of otherwise cleanly indented code. Tom Christiansen and Nathan Torkington address this issue in the Perl Cookbook (O’Reilly & Associates, Inc.). The following solutions are adapted from their discussion.

If you do not care about extra leading whitespace in your HTML output, you can simply indent everything. You can also indent the ending token if you use quotes and include the indent in the name (although this is more readable, it may be less maintainable because if the indentation changes, then you must adjust the name of the token to match):

#!/usr/bin/perl -wT

use strict;

my $timestamp = localtime;
display_document( $timestamp );

sub display_document {
    my $timestamp = shift;
    
    print <<"    END_OF_MESSAGE";
      Content-type: text/html
      
      <html>
        <head>
          <title>The Time</title>
        </head>
        
        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

One problem with indenting HTML here documents is that the extra indentation is sent to the client. You can solve this problem by creating a function that “unindents” your text. If you wish to remove all indentation, this is simple; if you want to maintain your HTML’s indentation, this is more complex. The challenge is determining the amount of indentation to remove: what portion belongs to the content and what part is incidental to your script? You could assume the first line contains the smallest indent, but this would not work if you were printing only the end of an HTML document, for example, where the last line would probably contain the smallest indent.

In the following code the unindent subroutine looks at all of the lines being printed, finds the smallest indent, and removes that amount from all of the lines:

sub unindent;

sub display_document {
    my $timestamp = shift;
    
    print unindent <<"    END_OF_MESSAGE";
      Content-type: text/html
      
      <html>
        <head>
          <title>The Time</title>
        </head>
        
        <body bgcolor="#ffffff">
          <h2>Current Time</h2>
          <hr>
          <p>The current time according to this system is: 
          <b>$timestamp</b></p>
        </body>
      </html>
    END_OF_MESSAGE
}

sub unindent {
    local $_ = shift;
    my( $indent ) = sort /^([ \t]*)\S/gm;
    s/^$indent//gm;
    return $_;
}

Predeclaring the unindent function, as we do on the first line, allows us to omit parentheses when we use it. This solution, of course, increases the amount of work the server must do for each request, so it would not be appropriate on a heavily used server. Also keep in mind that each additional space increases the number of bytes you must transfer and the user must download, so you may actually want to strip all leading whitespace instead. After all, users probably care more about the page downloading faster than how it looks if they view the source code.
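If you decide to strip all leading whitespace rather than preserve relative indentation, a simpler helper is enough. This is a sketch; strip_indent is a hypothetical name, not part of Perl or CGI.pm:

```perl
use strict;
use warnings;

# Remove all leading spaces and tabs from every line. The output is as
# small as possible, at the cost of losing the HTML's own indentation.
sub strip_indent {
    my $text = shift;
    $text =~ s/^[ \t]+//gm;    # /m makes ^ match at the start of each line
    return $text;
}

my $html = "      <p>The current time is:\n      <b>now</b></p>\n";
print strip_indent( $html );
```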

Overall, here documents are not a bad solution for large chunks of HTML, but they do not offer CGI.pm’s advantages, especially the assurance that every tag you open is closed: it’s much harder to forget to close an HTML tag with CGI.pm than it is with a here document. Also, you often must build HTML programmatically. For example, you may read records from a database and add a row to a table for each record. In these cases, when you are working with small chunks of HTML, CGI.pm is much easier to work with than here documents.
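As a sketch of building HTML programmatically (assuming CGI.pm is installed), the following turns an array of records into table rows. The records are hard-coded stand-ins for database results, and escapeHTML is applied to the text field because real data may contain characters such as < or &:

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new( "" );

# Hypothetical stand-in for rows fetched from a database
my @records = (
    { name => "Mary", age => 29 },
    { name => "Bill", age => 27 },
    { name => "Sue",  age => 26 },
);

# th/td given an array ref produce one cell per element, and Tr given
# an array ref produces one row per element: a header row plus one
# row per record
my $table = $q->table(
    { -border => 1 },
    $q->Tr( [
        $q->th( [ "Name", "Age" ] ),
        map { $q->td( [ $q->escapeHTML( $_->{name} ), $_->{age} ] ) } @records
    ] )
);

print $table, "\n";
```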

Using CGI.pm’s methods for outputting HTML generates strong reactions in developers. Some love it; others don’t. Don’t worry if it doesn’t match your needs; we will look at a whole class of alternatives in the next chapter.

Handling Errors

While we are on the subject of handling output, we should also look at handling errors. One of the things that distinguishes an experienced developer from a novice is adequate error handling. Novices expect things to always work as planned; experienced developers have learned otherwise.

Dying Gracefully

The most common method that Perl developers use for handling errors is Perl’s built-in die function. Here is an example:

open FILE, $filename or die "Cannot open $filename: $!";

If Perl is unable to open the file specified by $filename, die will print an error message to STDERR and terminate the script. The open function, like most Perl commands that interact with the system, sets $! to the reason for the error if it fails.

Unfortunately, die is not always the best solution for handling errors in your CGI scripts. As you will recall from Chapter 3, output to STDERR is typically sent to the web server’s error log, and because the script terminates without returning a valid response, the web server returns a 500 Internal Server Error instead. This is certainly not a very user-friendly response.

You should determine a policy for handling errors on your site. You may decide that 500 Internal Server Error pages are acceptable for very uncommon system errors like the inability to read or write to files. However, you may decide that you wish to display a formatted HTML page instead with information for users such as alternative actions they can take or who to notify about the problem.

Trapping die

It is possible to trap die so that it does not generate a 500 Internal Server Error automatically. This is especially useful because many common third-party modules use die (and variants such as croak) to report errors. If you know that a particular subroutine may call die, you can catch this with an eval block in Perl:

eval {
    dangerous_routine(  );
    1;
} or do {
    error( $q, $@ || "Unknown error" );
};

If dangerous_routine does call die, then eval will catch it, set the special variable $@ to the die message, skip the rest of the block, and return undef. This allows us to call another subroutine to display our error more gracefully. Note that an eval block will not trap exit.
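A self-contained sketch of the eval pattern above; dangerous_routine here is a hypothetical stand-in that dies on bad input, and try_routine is an illustrative wrapper. Checking the value the eval block returns, rather than testing $@ afterward, guards against $@ being cleared by code that runs while the stack unwinds, such as an object destructor that uses its own eval:

```perl
use strict;
use warnings;

# Hypothetical routine standing in for any module call that may die
sub dangerous_routine {
    my $n = shift;
    die "Division by zero requested\n" if $n == 0;
    return 100 / $n;
}

# Returns ( $result, undef ) on success or ( undef, $message ) on failure
sub try_routine {
    my $arg = shift;
    my $result;
    eval {
        $result = dangerous_routine( $arg );
        1;                                   # a successful eval returns true
    } or do {
        return ( undef, $@ || "Unknown error" );
    };
    return ( $result, undef );
}
```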

This works, but it certainly makes your code a lot more complex, and if your CGI script interacts with a lot of subroutines that might die, then you must either place your entire script within an eval block or include lots of these blocks throughout your script.

There is a better way. You may already know that it is possible to create a global signal handler to trap Perl’s die and warn functions. This involves some rather advanced Perl; you can find specific information in Programming Perl. Fortunately, we don’t have to worry about the specifics, because there is a module that not only does this but is written specifically for CGI scripts: CGI::Carp.

CGI::Carp

CGI::Carp is not part of the CGI.pm module itself, but it is also by Lincoln Stein and is distributed with CGI.pm (and was therefore included with the standard Perl distribution; since Perl 5.22, the CGI distribution, CGI::Carp included, must be installed separately from CPAN). It does two things: it creates more informative entries in your error log, and it allows you to create a custom error page for fatal calls like die. Simply by loading the module, you get a timestamp and the name of the running CGI script added to errors written to the error log by die, warn, carp, croak, and confess. The last three functions are provided by the Carp module (included with Perl) and are often used by module authors.

This still does not stop your web server from displaying 500 Internal Server Error responses for these calls, however. CGI::Carp is most useful when you ask it to trap fatal calls. You can have it display fatal error messages in the browser instead. This is especially helpful during development and debugging. To do this, simply pass the fatalsToBrowser parameter to it when you use the module:

use CGI::Carp qw( fatalsToBrowser );

In a production environment, you may not want users to see your full error information if they encounter an error. Fortunately, you can have CGI::Carp trap errors and display your own custom error message instead. To do this, pass CGI::Carp::set_message a reference to a subroutine that takes the error message as its single argument and prints the body of the response; CGI::Carp prints the HTTP header itself before calling your subroutine.

use CGI;
use CGI::Carp qw( fatalsToBrowser );

BEGIN {
    sub carp_error {
        my $error_message = shift;
        my $q = new CGI;
        # CGI::Carp has already sent the HTTP header, so print only the body
        print $q->start_html( "Error" ),
              $q->h1( "Error" ),
              $q->p( "Sorry, the following error has occurred: " ),
              $q->p( $q->i( $error_message ) ),
              $q->end_html;
    }
    # Pass a reference (\&), not the result of calling carp_error
    CGI::Carp::set_message( \&carp_error );
}

We will see how to incorporate this into a more general solution later in Example 5-3.

Error Subroutines

Most of our examples so far, and many throughout the rest of the book, include subroutines or blocks of code for displaying errors. Here is an example:

sub error {
    my( $q, $error_message ) = @_;    # shift would return only $q
    
    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Sorry, the following error has occurred: " ),
          $q->p( $q->i( $error_message ) ),
          $q->end_html;
    exit;
}

You can call this with a CGI object and a reason for the error. It will output an error page and then exit in order to stop executing your script. Note that we print the HTTP header here. One of the biggest challenges in creating a general solution for catching errors is knowing whether or not to print an HTTP header: if one has already been printed and you print another, it will appear at the top of your error page; if one has not been printed and you do not print one as part of the error message, you will trigger a 500 Internal Server Error instead.

Fortunately, CGI.pm can keep track of whether a header has already been generated. If you enable this feature, each CGI object will return an HTTP header only once; any later calls to header on that object silently return an empty string, so printing the result outputs nothing. You can enable this feature in one of three ways:

  1. You can pass the -unique_headers flag when you load CGI.pm:

    use CGI qw( -unique_headers );

  2. You can set the $CGI::HEADERS_ONCE variable to a true value after you use CGI.pm, but before you create an object:

    use CGI;
    $CGI::HEADERS_ONCE = 1;
    
    my $q = new CGI;

  3. Finally, if you know that you always want this feature, you can enable it globally for all of your scripts by setting $HEADERS_ONCE to a true value within your copy of CGI.pm itself. You can do this just as with the $POST_MAX and $DISABLE_UPLOADS variables we discussed at the beginning of the chapter; you will find $HEADERS_ONCE in the same configurable section of CGI.pm:

    # Change this to 1 to suppress redundant HTTP headers
    $HEADERS_ONCE = 0;
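The per-object behavior is easy to observe from the command line. This is a minimal sketch that assumes CGI.pm is installed (it no longer ships with Perl since 5.22 and now comes from CPAN); it simply compares what two successive calls to header return.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw( -unique_headers );    # same effect as $CGI::HEADERS_ONCE = 1

my $q = new CGI;

# The first call returns the complete header block...
my $first  = $q->header( "text/html" );

# ...and every later call on the same object returns an empty string.
my $second = $q->header( "text/html" );

print "first:  ", length( $first ),  " bytes\n";
print "second: ", length( $second ), " bytes\n";
```

Since header only returns a string, code such as print $q->header(...) in an error subroutine becomes harmless once a header has gone out, which is what lets a single error subroutine work whether or not the script has already printed its header.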

Although adding subroutines to each of your CGI scripts is certainly an acceptable way to catch errors, it’s still not a very general solution. You will probably want to create your own error pages that are customized for your site. Once you start including complex HTML in your subroutines, it will quickly become too difficult to maintain them. If you build error subroutines that output error pages according to your site’s template, and then later someone decides they want to change the site’s look, you must go back and update all of your subroutines. Clearly, a much better option is to create a general error handler that all of your CGI scripts can access.

Custom Module

It is a good idea to create your own Perl module that’s specific to your site. If you host different sites, or have different applications within your site with different looks and feels, you may wish to create a module for each. Within this module, you can place subroutines that you find yourself using across many CGI scripts. These subroutines will vary depending on your site, but one should handle errors.

If you have not created your own Perl module before, don't worry, it's quite simple. Example 5-3 shows a very minimal module.

Example 5-3. CGIBook::Error.pm

#!/usr/bin/perl -wT

package CGIBook::Error;

# Export the error subroutine
use Exporter;
@ISA = "Exporter";
@EXPORT = qw( error );

$VERSION = "0.01";

use strict;
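use CGI qw( -unique_headers );    # same as setting $CGI::HEADERS_ONCE
use CGI::Carp qw( fatalsToBrowser );

BEGIN {
    sub carp_error {
        my $error_message = shift;
        my $q = new CGI;
        # CGI::Carp has already printed a header; use up this object's
        # header so that error() does not print a second one
        my $discard_this = $q->header( "text/html" );
        error( $q, $error_message );
    }
    CGI::Carp::set_message( \&carp_error );
}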

sub error {
    my( $q, $error_message ) = @_;
    
    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Sorry, the following error has occurred: " ),
          $q->p( $q->i( $error_message ) ),
          $q->end_html;
    exit;
}

1;

The only differences between a Perl module and a standard Perl script are that you should save your file with a .pm extension, declare the name of the module's package with the package function (this should match the file's path without the .pm extension, substituting :: for /),[7] and make sure that the file returns a true value when evaluated (the reason for the 1; at the bottom).

It is standard practice to store the version of the module in $VERSION. For the sake of convenience, we also use the Exporter module to export the error subroutine. This allows us to refer to it in our scripts as error instead of CGIBook::Error::error. Refer to the Exporter manpage or a primary Perl text, such as Programming Perl, for details on using Exporter.
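Example 5-3 assigns @ISA, @EXPORT, and $VERSION before use strict, which is why those package variables need no declarations. If you prefer to enable strict first, declare them with our. The following single-file sketch shows that variant; the package MySite::Util and its greet subroutine are hypothetical, and the explicit import call stands in for the use statement you would write if the package lived in its own MySite/Util.pm file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MySite::Util;    # hypothetical site-specific module

use Exporter;
our @ISA     = ( "Exporter" );
our @EXPORT  = qw( greet );    # exported by default
our $VERSION = "0.01";

sub greet {
    my $name = shift;
    return "Hello, $name!";
}

package main;

# In a separate file, "use MySite::Util;" would load the module and call
# import at compile time. Here we call import at run time, after the
# assignments to @ISA and @EXPORT above have executed.
MySite::Util->import;

my $greeting = greet( "world" );
print "$greeting\n";
print "MySite::Util version $MySite::Util::VERSION\n";
```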

You have a couple of options for saving this file. The simplest solution is to save it within the site_perl directory of your Perl libraries, such as /usr/lib/perl5/site_perl/5.005/CGIBook/Error.pm. The site_perl directory includes modules that are site-specific (i.e., not included in Perl's standard distribution). The paths of your Perl libraries may differ; you can locate them on your system with the following command:

$ perl -e 'print map "$_\n", @INC'

You probably want to create a subdirectory that is unique to your organization, as we did with CGIBook, to hold all the Perl modules you create.
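The mapping from package name to file is mechanical: replace each :: with /, append .pm, and look for that relative path under each directory in @INC, in order. That is roughly what require does when you load a module. Here is a small sketch of the lookup; module_path and find_module are hypothetical helper names, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Turn a package name into the relative path Perl looks for,
# e.g. "CGIBook::Error" becomes "CGIBook/Error.pm".
sub module_path {
    my $package = shift;
    ( my $path = $package ) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first file under @INC that would satisfy the package,
# or undef if none exists.
sub find_module {
    my $relative = module_path( shift );
    for my $dir ( grep { !ref } @INC ) {    # skip code-ref hooks in @INC
        my $file = File::Spec->catfile( $dir, split m{/}, $relative );
        return $file if -f $file;
    }
    return;
}

for my $package ( qw( strict CGIBook::Error ) ) {
    my $file = find_module( $package );
    print "$package: ", ( defined $file ? $file : "not found in \@INC" ), "\n";
}
```

Run it before and after installing CGIBook/Error.pm to confirm that Perl will find the module where you saved it.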

You can use the module as follows:

#!/usr/bin/perl -wT

use strict;
use CGI;
use CGIBook::Error;

my $q = new CGI;

unless ( check_something_important(  ) ) {
    error( $q, "Something bad happened." );
}

If you do not have permission to install the module in your Perl library directory, and if you cannot get your system administrator to do it, then you can place the module in another location, for example, /usr/local/apache/perl-lib/CGIBook/Error.pm. Then you must remember to include this directory in the list that Perl searches for modules. The simplest way to do this is with the lib pragma:

#!/usr/bin/perl -wT

use strict;
use lib "/usr/local/apache/perl-lib";

use CGI;
use CGIBook::Error;
.
.
.


[6] These are not necessarily bugs in CGI.pm; CGI.pm strives to maintain compatibility with new servers and browsers that sometimes include buggy, or at least nonstandard, code.

[7] When determining the package name, the file’s name should be relative to a library path in @INC. In our example, we store the file at /usr/lib/perl5/site_perl/5.005/CGIBook/Error.pm. /usr/lib/perl5/site_perl/5.005 is a library directory. Thus, the path to the module relative to the library directory is CGIBook/Error.pm so the package is CGIBook::Error.
