Chapter 15. Debugging CGI Applications

So far, we’ve discussed numerous CGI applications, ranging from the trivial to the very complex, but we haven’t touched upon the techniques needed to debug them when something goes wrong. Debugging a CGI application is not much different from debugging any other type of application, because, after all, code is code. However, since a CGI application runs on behalf of a remote user across the network, in a special environment created by the web server, it is sometimes difficult to pinpoint problems.

This chapter is all about debugging CGI applications. First, we’ll examine some of the common errors that developers generally come across when implementing CGI applications. These include incorrect server configuration, permission problems, and violations of the HTTP protocol. Then, we’ll explore a few tips, tricks, and tools that will help us track down problems and develop better applications.

Common Errors

This section can serve as a checklist that you can use to diagnose common problems. Here is a list of common sources of errors:

Source of Problem          Typical Error Message
Application permissions    403 Forbidden
The pound-bang line        403 Forbidden
Line endings               500 Internal Server Error
“Malformed” header         500 Internal Server Error

Let’s look at each of these in more detail.

Application Permissions

Typically, web servers are configured to run as nobody or another user with minimal access privileges. This is a great preventive step, and one that can help salvage your data in the case of an attack. Since the web server process cannot write to, read from, or execute files and directories that don’t grant “world” access, most of your data will stay intact.

However, this also creates a few problems for us. First and foremost, we need to set the world execute bit on the CGI applications, so the server can execute them. Here’s how you can check the permissions of your applications:

$ ls -l /usr/local/apache/cgi-bin/clock
-rwx------  1 shishir      3624 Oct 17 17:59 clock

The first field lists the permissions for the file. Its first character indicates the type of the file (- for a regular file, d for a directory), and the remaining nine characters form three groups of three: the read, write, and execute privileges for the owner, the group, and the world (from left to right). In this example, the owner has sole permission to read, write, and execute the program.

If you want the server to be able to execute this application, you have to issue the following command:

$ chmod 711 clock
$ ls -lF clock
-rwx--x--x  1 shishir      3624 Oct 17 17:59 clock*

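The chmod (change mode) command modifies the permissions for the file. Each digit of the octal mode 711 covers one class of user: the owner gets 7, the sum of read (4), write (2), and execute (1), while the group and the world each get 1, execute only. Note that an interpreted script, such as a Perl script, must also be readable by the server, because the interpreter has to read the file; for scripts, use 755 instead.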
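If you would rather check permissions from Perl, for example in a small script that audits a cgi-bin directory, the mode bits are available from the stat function. The following is a minimal sketch; perm_string, world_executable, and world_readable are hypothetical helper names of our own, not part of any module.

```perl
use strict;
use warnings;

# Return a file's permission bits as a four-digit octal string, e.g. "0711".
sub perm_string {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!";
    return sprintf "%04o", $st[2] & 07777;
}

# Return 1 if the "world" execute bit (octal 0001) is set, 0 otherwise.
sub world_executable {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!";
    return ( ( $st[2] & 0001 ) ? 1 : 0 );
}

# Return 1 if the "world" read bit (octal 0004) is set, 0 otherwise.
# Interpreted scripts need this too, because the interpreter reads them.
sub world_readable {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!";
    return ( ( $st[2] & 0004 ) ? 1 : 0 );
}
```

After the chmod 711 shown above, perm_string on the clock program would return "0711", world_executable would return 1, and world_readable would return 0.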

That’s not the end of our permission woes. We could run into other problems dealing with file permissions, most notably, the inability to create or update files. We will discuss this in Section 15.2 later in this chapter.

Despite configuring the server to recognize CGI applications and setting the execute permissions, our applications can still fail to execute, as you’ll see next.

The Pound-Bang

If a CGI application is written in Perl, Python, Tcl, or another interpreted scripting language, then it must have a line at the very top that begins with a pound-bang, or #!, like this:

#!/usr/bin/perl -wT

We’ve seen this at the top of every script throughout this book. When the web server recognizes a request for a CGI application, it uses the exec system call to execute the application. If the application is a compiled executable, the operating system will go ahead and execute it. However, if the application is a script of some sort, the operating system looks at the first line to see which interpreter to use.

If your scripts are missing the pound-bang line, or if the path you specify is invalid, then you will get an error. On some systems, for example, perl is found at /usr/bin/perl, while on others it is found at /usr/local/bin/perl. On Unix systems, you can use either of the following commands to locate perl (depending on your shell):

$ which perl
$ whence perl

If neither of these commands works, look for perl5 instead of perl. If you still cannot locate perl, try either of the following commands. Both search your filesystem for perl (locate also matches any path containing perl), so they could return multiple results, and the find command searches your entire filesystem, so depending on the size of the filesystem, it could take a while:

$ locate perl
$ find / -name perl -type f -print 2>/dev/null

Another thing to keep in mind: if you have multiple interpreters (i.e., different versions) for the same language, make sure that your scripts reference the one you intend, or else you may see some mysterious effects. For example, on some systems, perl4 is still installed in addition to perl5. Test the path you use with the -v flag to get its version.
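You can also catch a bad pound-bang line before the web server does, by reading the first line of a script and confirming that the interpreter it names exists and is executable. This is a minimal sketch, assuming the common #!/path/to/interpreter [flags] form; check_shebang is a hypothetical helper name. It also flags a trailing carriage return, which ties in with the line-ending problem described next.

```perl
use strict;
use warnings;

# Check the pound-bang line of a script. Returns the interpreter path
# if it exists and is executable; otherwise dies with a description.
sub check_shebang {
    my ($script) = @_;
    open my $fh, '<', $script or die "Cannot read $script: $!\n";
    my $first = <$fh>;
    close $fh;

    die "$script: missing #! line\n"
        unless defined $first && $first =~ /^#!\s*(\S+)/;
    my $interp = $1;

    # A carriage return at the end of the line suggests Windows line
    # endings; the system may then fail to find the interpreter.
    die "$script: #! line ends in a carriage return\n" if $first =~ /\r\n?\z/;
    die "$script: $interp does not exist\n"          unless -e $interp;
    die "$script: $interp is not executable\n"       unless -x $interp;
    return $interp;
}
```

Wrap the call in eval if you want to check a whole directory and report every problem instead of stopping at the first one.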

Line Endings

If you are working with a CGI script that you downloaded from another site or edited on a different platform, it is possible that its line endings do not match those of the current system. For example, perl on Unix will complain with multiple syntax errors if you attempt to run a file that is formatted for Windows, whose lines end with a carriage return and line feed (\r\n) rather than a single line feed (\n). You can clean these files up with perl from the command line:

$ perl -pi -e 's/\r\n/\n/' calendar.cgi
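Before rewriting a file, you may want to know whether it actually has Windows line endings. The sketch below, with the hypothetical helper names count_crlf and fix_crlf, counts carriage return/line feed pairs and performs the same substitution as the perl -pi one-liner, reading and writing in raw mode so that no other translation takes place.

```perl
use strict;
use warnings;

# Count the lines in a file that end with a Windows-style CR LF pair.
sub count_crlf {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ /\r\n\z/;
    }
    close $fh;
    return $count;
}

# Convert every CR LF pair in a file to a single LF, in place.
sub fix_crlf {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;
    $text = '' unless defined $text;    # empty file
    $text =~ s/\r\n/\n/g;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}
```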

“Malformed” Header

As we first discussed in Chapter 2 and Chapter 3, and have seen in all the examples since, all CGI applications must output valid HTTP headers, typically including a Content-type header, followed by a blank line, before the actual content, like this:

Content-type: text/html
(other headers)

(content)

If you fail to follow this format, the server will typically respond with a 500 Internal Server Error. A partial solution is to output all necessary HTTP headers, including the content type, as early in the CGI application as possible. We will look at a very useful technique in the next section that will help us with this task.

However, there are other reasons why we may see such an error. If your CGI application generates errors that are printed to STDERR, these error messages may reach the web server before all of the header information. Because Perl buffers output to STDOUT but not to STDERR, even errors that occur after you have printed the headers can cause this problem, since the headers may still be waiting in the buffer.
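When you run a script from the command line, you can check its header block mechanically. The following sketch takes the captured output of a script and verifies that it begins with Name: value lines, that a blank line ends the headers, and that at least one of Content-type, Location, or Status is present, which is the minimum a CGI response needs. check_cgi_headers is a hypothetical name, and this is a rough check rather than a full parser.

```perl
use strict;
use warnings;

# Validate the header block at the start of captured CGI output.
# Returns a list of problems; an empty list means the headers look valid.
sub check_cgi_headers {
    my ($output) = @_;
    my @problems;

    # The headers end at the first blank line (LF LF or CR LF CR LF).
    my ($head) = $output =~ /\A(.*?)\r?\n\r?\n/s;
    return ("no blank line after the headers") unless defined $head;

    my %seen;
    for my $line ( split /\r?\n/, $head ) {
        if ( $line =~ /^([A-Za-z0-9-]+):\s*\S/ ) {
            $seen{ lc $1 } = 1;
        }
        else {
            push @problems, "malformed header line: $line";
        }
    }
    push @problems, "no Content-type, Location, or Status header"
        unless $seen{'content-type'} || $seen{location} || $seen{status};
    return @problems;
}
```

For example, output that starts with an error message such as Died at clock.cgi line 9. followed by the Content-type line is reported as having one malformed header line.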

What’s the moral? Make sure you check your application from the command line before you try to execute it from the Web. If you’re using Perl to develop CGI applications, then you can use the -wcT switch to check for syntax errors:

$ perl -wcT clock.cgi
syntax error in file clock.cgi at line 9, at EOF
clock.cgi had compilation errors.

If there are warnings, but no errors, you may see the following:

$ perl -wcT clock.cgi
Name "main::opt_g" used only once: possible typo at clock.cgi line 5.
Name "main::opt_u" used only once: possible typo at clock.cgi line 6.
Name "main::opt_f" used only once: possible typo at clock.cgi line 7.
clock.cgi syntax OK

Pay attention to the warnings as well. Perl’s syntax checker has improved greatly over the years and will alert you to many possible errors, such as using nonexistent variables, uninitialized variables, or unopened filehandles.

And finally, if there are no warnings or errors, you will see:

$ perl -wcT clock.cgi
clock.cgi syntax OK

To reiterate, make sure your application works from the command line before you even attempt to debug its functionality from the Web.
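If you keep many scripts in one directory, a small Perl driver can run the same check over all of them. This is a minimal sketch; syntax_ok is a hypothetical helper. It uses $^X, the path of the perl running the driver, which may not be the interpreter named on your scripts' pound-bang lines.

```perl
use strict;
use warnings;

# Run "perl -wcT" on one script and return true if it compiled cleanly.
# Perl's own messages (including "syntax OK") still go to STDERR.
sub syntax_ok {
    my ($script) = @_;
    # The list form of system bypasses the shell, so odd file names are safe.
    system( $^X, '-wcT', $script );
    return $? == 0;
}

# Check every script named on the command line and report the failures.
my @failed = grep { !syntax_ok($_) } @ARGV;
print "Failed: @failed\n" if @failed;
```

Saved under a name of your choosing, it would be run as, say, perl check_all.pl *.cgi from your cgi-bin directory.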

Perl Coding Techniques

In this section, we’ll discuss programming techniques that will help us develop stable, bug-free applications. These techniques are easy to use, and using them can help you avoid bugs in the first place:

  • Always use strict.

  • Check the status of system calls.

  • Verify that each file open is successful.

  • Trap die.

  • Lock files.

  • Unbuffer the output stream when necessary.

  • Use binmode when necessary.

Let’s review each of these in detail.

Use strict

You should use the strict pragma for any Perl script more than a few lines long, and for all CGI scripts. Simply place the following line at the top of your script:

use strict;

If an import list is not specified, strict generates errors if you use symbolic references, use bareword identifiers as subroutine calls, or use variables that are not declared with my, fully qualified, or predeclared with the vars module.

Here are two snippets of code, one that runs successfully under strict and one that causes an error:

use strict;

my $id    = 2000;
my $field = \$id;
print $$field;        ## Success, will print 2000

$field = "id";
print $$field;        ## Error!

A symbolic reference uses the name of a variable, stored in a string, to get at the variable itself. In the second snippet above, we are trying to get at the value of $id indirectly through the string "id". As a result, Perl will generate an error like the following:

Can't use string ("id") as a SCALAR ref while "strict refs" in use ...
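You can see the difference for yourself by trapping the error with eval. In this runnable sketch the subroutine names are our own. Note that a symbolic reference can only reach a package (global) variable, never a lexical declared with my, so $id is declared here with our.

```perl
use strict;
use warnings;

our $id = 2000;    # package variable, i.e. $main::id

# A hard reference, created with a backslash, is allowed under strict.
sub via_hard_ref {
    my $field = \$id;
    return $$field;
}

# A symbolic reference (a string naming the variable) dies under
# "strict refs"; eval traps the error so we can return the message.
sub via_symbolic_ref {
    my $field = "id";
    my $value = eval { $$field };
    return defined $value ? $value : $@;
}

# The same symbolic reference works once "strict refs" is switched off
# inside a single block.
sub via_symbolic_ref_no_strict {
    my $field = "id";
    no strict 'refs';
    return $$field;
}
```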

Now, let’s look at bareword subroutines. Take the following example:

use strict "subs";
greeting;
...
sub greeting
{
    print "Hello Friend!";
}

When Perl looks at the second line, it doesn’t know what it is. It could be a string in a void context or it could be a subroutine or function call. When we run this code, Perl will generate the following error:

Bareword "greeting" not allowed while "strict subs" in use at simple line 3.
Execution of simple aborted due to compilation errors.

We can solve this in one of several ways. We can predeclare greeting with a forward declaration, declare greeting as a subroutine with the subs module, use the & prefix, or pass an empty list, like so:

sub greeting;              ## forward declaration
use subs qw( greeting );   ## use subs

&greeting;                 ## & prefix
greeting();                ## null list

This forces us to be clear about the use of subroutines in our applications.

The last restriction that strict imposes on us involves variable declaration. You have probably run across source code where you’re not sure if a certain variable is global, or local to a function or subroutine. By using the vars argument with strict, we can eliminate this guessing.

Here’s a trivial example:

use strict "vars";
$soda = "Coke";

Since we haven’t told Perl what $soda is, it will complain with the following error:

Global symbol "$soda" requires explicit package name at simple line 3.
Execution of simple aborted due to compilation errors.

We can solve this problem by using a fully qualified variable name, declaring the variable with the vars module, or declaring it as a lexical variable with my, like so:

$main::soda = "Coke";    ## Fully qualified
use vars qw( $soda );    ## Declare using vars module
my $soda;                ## Lexical variable
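Because strict vars violations are caught at compile time, you can test a snippet by compiling it with a string eval, without ever running it. In this minimal sketch, compiles_under_strict is a hypothetical helper; wrapping the code in an anonymous sub means it is compiled but not executed, and the use vars case is placed in a separate package (Drinks, an arbitrary name) so that it does not declare $main::soda for the other snippets.

```perl
use strict;
use warnings;

# Compile a snippet under "use strict" without running it.
# Returns an empty string on success, or the compiler's error message.
sub compiles_under_strict {
    my ($code) = @_;
    my $sub = eval "use strict; sub { $code }";
    return defined $sub ? '' : $@;
}

my @snippets = (
    '$soda = "Coke";',                                      # undeclared
    '$main::soda = "Coke";',                                # fully qualified
    'package Drinks; use vars qw($soda); $soda = "Coke";',  # vars module
    'my $soda = "Coke";',                                   # lexical
);

for my $code (@snippets) {
    my $err = compiles_under_strict($code);
    print $err eq '' ? "OK:     $code\n" : "FAILED: $code\n";
}
```

When run, only the first, undeclared snippet should be reported as FAILED.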

As you can see, the strict pragma imposes a very rigid environment for developing applications. But that’s a very nice and powerful feature, because it helps us track down a variety of bugs. At the same time, it allows for great flexibility. For example, if we know that a certain piece of code works fine but will fail under strict, we can turn certain restrictions off, like so:

## code that passes strict
...
{
    no strict;    ## or no strict "vars";
    
    ## code that will not pass strict
}

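Within the block delimited by the braces, the restrictions you turn off (all of them, with a bare no strict) no longer apply; they come back into force after the closing brace.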

With this type of flexibility and control, there is no reason why you should not be using strict to help you develop cleaner, bug-free applications.

Check Status of System Calls

Before we discuss anything in this section, here’s a mantra to code by:

“Always check the return value of all the system commands, including open, eval, and system.”

Since web servers are typically configured to run as nobody, or a user with minimal access privileges, we must be very careful when performing any file or system I/O. Take, for example, the following code:

#!/usr/bin/perl -wT

print "Content-type: text/html\n\n";
...
open FILE, "/usr/local/apache/data/recipes.txt";

while (<FILE>) {
    s/^\s*$/<P>/, next if (/^\s*$/);
    s/\n/<BR>/;
    ...
}

close FILE;

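If the user the web server runs as cannot read recipes.txt (for example, because the file is not world readable or the /usr/local/apache/data directory is not world searchable), the open command will fail, and the user will get an empty page. This isn’t really desirable, since the user will have no idea what happened.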

A solution to this problem is to check the status of open:

...
open FILE, "/usr/local/apache/data/recipes.txt"
    or error ( $q, "Sorry, I can't access the recipe data!" );

print "Content-type: text/html\n\n";
...

If the open fails, we call a custom error function to return a nicely formatted HTML document and exit.
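The error function used here comes from the CGIBook::Error module used elsewhere in this book. If you are not using that module, a stand-in could look like the following minimal sketch; error_page and html_escape are hypothetical names, the sketch does not depend on CGI.pm, and the 500 status it sends is our own choice. It escapes the message so that text from the script cannot inject HTML.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub html_escape {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build a complete CGI response (headers, blank line, HTML body)
# describing an error. The caller prints it and exits.
sub error_page {
    my ($message) = @_;
    my $safe = html_escape($message);
    return "Status: 500 Internal Server Error\n"
         . "Content-type: text/html\n\n"
         . "<html><head><title>Error</title></head>\n"
         . "<body><h1>Error</h1><p>$safe</p></body></html>\n";
}
```

Because the response carries its own headers, it must be printed before any other output, which is why the check on open comes before the Content-type line in the snippet above.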

You need to follow the same process when creating or updating files as well. To create a file, a CGI application needs write and execute permission on the directory that will contain it; to update an existing file, it needs write permission on the file itself and execute (search) permission on each directory in its path.

Some of the more commonly used system functions include: open, close, flock, eval, and system. You should make it a habit to check the return value of such functions, so you can take preventative action.

Is It Open?

In various examples throughout the book, we’ve used the open function to create pipes to execute external applications and perform data redirection. Unfortunately, unlike with the file opens in the previous section, there is no easy way to determine whether an application within the pipe was executed successfully.

Here’s a simple example that sorts some numerical data.

open FILE, "| /usr/local/gnu/sort"
    or die "Could not create pipe: $!";

$| = 1;    ## unbuffer STDOUT so the header is sent before sort's output
print "Content-type: text/plain\n\n";

## fill the @data array with some numerical data
...

print FILE join( "\n", @data ), "\n";
close FILE;

If we cannot create the pipe, which is almost never the case, we return an error. But what if the path to the sort command is incorrect? Then the user will see neither an error nor any reasonable output.

So, how do we determine whether the sort command executed successfully? Because the command runs as a separate process, its exit status is available only after the filehandle is closed; close waits for the process to finish.

Here’s an example:

open FILE, "| /usr/local/gnu/sort"
    or die "Could not create pipe: $!";

### code omitted for brevity
...

close FILE;

my $status = ($? >> 8);

if ( $status ) {
    print "Sorry! I cannot access the data at this time!";
}

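Once the filehandle is closed, Perl stores the raw status of the child process in the $? variable. We get the command’s actual exit status (0 for success, nonzero for failure) by shifting the raw status right by eight bits; the low eight bits record any signal that terminated the process.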
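The decoding of $? can be wrapped in a small function that also reports when a command was killed by a signal. This is a minimal sketch; describe_status is a hypothetical name, and the example uses the list form of a piped open with $^X (the running perl) as a stand-in for sort, which requires a Unix-like system.

```perl
use strict;
use warnings;

# Decode the raw status Perl leaves in $? after close on a pipe,
# system, or backticks, and return a human-readable description.
sub describe_status {
    my ($raw) = @_;
    return "could not run the command" if $raw == -1;
    return "killed by signal " . ( $raw & 127 ) if $raw & 127;
    return "exited with status " . ( $raw >> 8 );
}

# Example: open a pipe to a child perl that exits with status 3.
open my $pipe, '|-', $^X, '-e', 'exit 3'
    or die "Could not create pipe: $!";
close $pipe;    # returns false here, because the child exited nonzero
print describe_status($?), "\n";
```

When run, it should print exited with status 3.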

There is also another, albeit less portable and reliable, method to determine the status of the pipe. This involves checking the PID of the child process, spawned by the open function:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;

my $pid    = open FILE, "| /usr/local/gnu/sort";
my $status = kill 0, $pid;

$status or die "Cannot open pipe to sort: $!";

## We're successful!
print $q->header( "text/plain" );
...

We use the kill function to send a signal of zero to the process created by the pipe; signal zero does not affect the process, it only checks whether the process exists. If the process is dead, which means the application within the pipe never got executed, kill returns zero. As mentioned before, this technique is not 100% reliable and will not work on all Unix platforms, but it’s something you might want to try.

Trap die

Don’t forget about our earlier discussion of die. If your code or a module that you call invokes Perl’s die function, it will usually trigger a 500 Internal Server Error unless you trap it. Use CGI::Carp to trap fatal calls and redirect the messages to the browser. Add this line to the top of your script:

use CGI::Carp qw( fatalsToBrowser );

Refer to Section 5.5 for more on CGI::Carp.
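CGI::Carp is the simplest solution. If you cannot install it, or want to see what trapping die involves, the core idea is to run the body of the script inside an eval block and turn any error into a proper response. This is a minimal sketch that does not use CGI::Carp; run_safely and build_response are hypothetical names. Like fatalsToBrowser, it shows the raw error text to the browser, which is convenient while debugging but can reveal details about your server that you may not want to expose in production.

```perl
use strict;
use warnings;

# Call $code, which should return a complete CGI response (headers,
# blank line, body). If it dies, return a plain-text error response
# instead, so the server never sees a script that died before its headers.
sub run_safely {
    my ($code) = @_;
    my $response;
    my $ok = eval { $response = $code->(); 1 };
    return $response if $ok;
    my $error = $@ || "Unknown error\n";
    return "Content-type: text/plain\n\nError: $error";
}

# Hypothetical script body: it builds the whole response as a string,
# so nothing has been printed yet if it dies partway through.
sub build_response {
    my $file = "/no/such/recipes.txt";    # deliberately missing file
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    local $/;
    my $body = <$fh>;
    close $fh;
    return "Content-type: text/plain\n\n" . $body;
}

print run_safely( \&build_response );
```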

File Locking

If you find that you are losing data in your data files, or files are becoming corrupt, then you are probably not locking them. The Web is a multi-user environment and multiple users may access the same document or CGI application at the same time. Let’s take a look at an example that doesn’t perform any locking:

#!/usr/bin/perl -wT

use strict;
use CGI;
use CGIBook::Error;

my $q        = new CGI;
my $email    = $q->param( "email" )    || "Anonymous";
my $comments = $q->param( "comments" ) || "No comments";
...
open FILE, ">>/usr/local/apache/data/guestbook.txt"
    or error( $q, "Cannot add your entry to guestbook!" );

print FILE "From $email: $comments\n\n";
close FILE;

print "Location: /generic/thanks.html\n\n";

Now, imagine a scenario where multiple users, say 100, access this application at exactly the same time. What happens? A hundred CGI processes all try to write to the guestbook.txt file at once, and more than likely, we’ll end up with lost or corrupted data.

In order to solve the problem, we need to lock the file. Refer to Section 10.1.1 for more details.
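As a reminder of what that involves, here is a minimal sketch of the guestbook write using flock with an exclusive lock. append_entry is a hypothetical helper name. flock locks are advisory, so every program that writes the file must lock it the same way.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one guestbook entry while holding an exclusive lock, so that
# simultaneous requests cannot interleave their writes.
sub append_entry {
    my ( $file, $email, $comments ) = @_;
    open my $fh, '>>', $file or die "Cannot open $file: $!";
    flock( $fh, LOCK_EX ) or die "Cannot lock $file: $!";
    # In append mode every write goes to the current end of the file,
    # even if another process appended while we waited for the lock.
    print {$fh} "From $email: $comments\n\n";
    close $fh or die "Cannot close $file: $!";    # flushes, then unlocks
}
```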

Unbuffer Output Stream

Sometimes, you may run into what seems like a very strange error, where output doesn’t appear in the order in which it was sent to the standard output stream. This typically occurs when you call an external application to generate output.

For example, the following example might not work properly on all systems:

#!/usr/bin/perl -wT

print "Content-type: text/plain\n\n";
system "/bin/finger";

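In what seems like a very bizarre error, the output from system can actually appear before the content type header. This happens because Perl buffers the standard output stream: the header waits in Perl’s buffer while finger writes directly to the same output. (Newer versions of Perl attempt to flush output handles before system forks, but this is not supported on every platform.)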

You can turn buffering off, like so:

$| = 1;

This forces Perl to flush the buffer of the currently selected output handle, which is STDOUT by default, after every print.
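Another way to avoid the problem is to read the external program’s output through a pipe and print it yourself, so that everything passes through Perl’s own STDOUT buffer in the order you print it. This is a minimal sketch; capture_output is a hypothetical helper, and the list form of the piped open it uses bypasses the shell and requires a Unix-like system.

```perl
use strict;
use warnings;

# Run an external command without a shell and return everything it
# writes to its standard output. Dies if the command cannot be started
# or exits with a nonzero status.
sub capture_output {
    my @command = @_;
    open my $pipe, '-|', @command or die "Cannot run $command[0]: $!\n";
    my $output = do { local $/; <$pipe> };
    close $pipe
        or die "$command[0] failed with exit status " . ( $? >> 8 ) . "\n";
    return defined $output ? $output : '';
}
```

In the finger example you would then write print "Content-type: text/plain\n\n", capture_output("/bin/finger"); assuming finger is installed at that path.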

Use binmode

On operating systems that distinguish between binary and text files, most notably Windows 95, NT, and the Macintosh, we have to be very careful, especially when returning binary output. For example, the following application creates a simple dynamic image:

#!/usr/bin/perl -wT

use GD;
use strict;

my $image = new GD::Image( 100, 100 );

my $white = $image->colorAllocate( 255, 255, 255 );
my $black = $image->colorAllocate( 0, 0, 0 );
my $red   = $image->colorAllocate( 255, 0, 0 );

$image->arc( 50, 50, 95, 75, 0, 360, $black );
$image->fill( 50, 50, $red );

print "Content-type: image/png\n\n";
print $image->png;

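However, the output will result in a broken image if we run the application on one of the platforms mentioned above, because in text mode each line feed byte in the image data is translated into the platform’s line ending. The solution is to use the binmode function to treat the resulting output as binary information: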

## code omitted for brevity
...
binmode STDOUT;
print $image->png;
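To see what binmode protects against, the following sketch writes the eight-byte PNG file signature to a file with binmode in effect and reads it back in raw mode. The signature deliberately contains a carriage return and line feeds so that newline translation can be detected; on a text-mode platform without binmode, the bytes read back would differ. write_binary and read_binary are hypothetical helper names.

```perl
use strict;
use warnings;

# The PNG file signature: its CR LF and LF bytes exist so that a
# newline-translating transfer corrupts it detectably.
my $signature = "\x89PNG\r\n\x1a\n";

# Write binary data with newline translation switched off.
sub write_binary {
    my ( $path, $data ) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    binmode $fh;    # same effect as opening with '>:raw'
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
}

# Read the file back byte for byte.
sub read_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}
```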

Debugging Tools

We’ve looked at what can cause common errors, but not everything is a common problem. If you are having problems and none of the earlier solutions helps, then you need to do some investigative work. In this section, we’ll look at some tools to help you uncover the source of the problem. Briefly, here is an outline of the steps you can take:

  • Check the syntax of your scripts with the -c flag.

  • Check the web server’s error logs.

  • Run your script from the command line.

  • Test the value of variables by dumping them to the browser.

  • Use an interactive debugger.

Let’s review each in more detail.

Check Syntax

We mentioned this in an earlier section, but it bears repeating in its own section: if your code does not parse or compile, it will never run correctly. So get in the habit of testing your scripts with the -c flag from the command line before you test them in the browser, and while you’re at it, have perl check for warnings too with the -w flag. Remember, if you use taint mode (and you are using taint mode with all of your scripts, right?), you also need to pass the -T flag to avoid the following error:

$ perl -wc myScript.cgi
Too late for "-T" option.

Therefore, use the -wcT combination:

$ perl -wcT calendar.cgi

This will either return:

calendar.cgi syntax OK

or a list of problems. Of course you should only use the -c flag from the command line and not add it to the pound-bang line in your scripts.

Check Error Logs

Typically, errors are printed to STDERR, and on some web servers anything that is printed to STDERR while a CGI script is running ends up in your server’s error logs. Thus, you can often find all sorts of useful clues by scanning these logs when you have problems. Possible locations of this file with Apache are /usr/local/apache/logs/error_log or /usr/var/logs/httpd/error_log. Errors are appended to the bottom; you may want to watch the log as you test your CGI script. If you use the tail command with a -f option:

$ tail -f /usr/local/apache/logs/error_log

it will print new lines as they are written to the file.
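tail -f is the simplest tool for this. If you would rather filter the log from Perl, for instance to show only lines that mention your script, the underlying technique is to remember your position in the file and read whatever has been appended since. This is a minimal sketch; read_new_lines and follow_log are hypothetical helper names, and it does not handle log rotation.

```perl
use strict;
use warnings;

# Return the lines appended to $path since byte offset $offset, plus
# the new offset to pass on the next call.
sub read_new_lines {
    my ( $path, $offset ) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my @lines      = <$fh>;
    my $new_offset = tell $fh;
    close $fh;
    return ( \@lines, $new_offset );
}

# Print new lines matching $pattern as they arrive, much like
# "tail -f | grep". Runs until interrupted with Ctrl-C.
sub follow_log {
    my ( $path, $pattern ) = @_;
    my $offset = ( -s $path ) || 0;    # start at the current end of the log
    while (1) {
        my ( $lines, $next ) = read_new_lines( $path, $offset );
        print grep { /$pattern/ } @$lines;
        $offset = $next;
        sleep 1;
    }
}
```

For example, follow_log( '/usr/local/apache/logs/error_log', qr/calendar\.cgi/ ) would show only the errors that mention calendar.cgi.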

Running Scripts from the Command Line

Once your scripts pass a syntax check, the next step is to try to run them from the command line. Remember that because CGI scripts receive much of their data from environment variables, you can set these manually yourself before you run your script:

$ export HTTP_COOKIE="user_id=abc123"
$ export QUERY_STRING="month=jan&year=2001"
$ export REQUEST_METHOD="GET"
$ ./calendar.cgi

You will see the full output of your script including any headers you print. This can be quite useful if you suspect your problem has to do with the headers you are sending.
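If you do this often, a small Perl driver can set the variables for a single run and capture the complete output, headers included, without changing your shell’s environment. This is a minimal sketch; run_cgi is a hypothetical helper, and it adds only the variables you give it to the current environment, whereas a real web server sets many more.

```perl
use strict;
use warnings;

# Run a CGI script with extra environment variables set for this run
# only, and return its complete output (headers and body) as a string.
# $command is an array reference, e.g. ['./calendar.cgi'].
sub run_cgi {
    my ( $command, %env ) = @_;
    local %ENV = ( %ENV, %env );    # restored when the sub returns
    open my $pipe, '-|', @$command or die "Cannot run $command->[0]: $!\n";
    my $output = do { local $/; <$pipe> };
    close $pipe;    # $? now holds the script's exit status
    return defined $output ? $output : '';
}
```

For example: my $out = run_cgi( ['./calendar.cgi'], REQUEST_METHOD => 'GET', QUERY_STRING => 'month=jan&year=2001', HTTP_COOKIE => 'user_id=abc123' );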

If you are using version 2.56 or earlier of CGI.pm, it makes entering form parameters much easier by prompting for them when you run your script:

(offline mode: enter name=value pairs on standard input)

You can then enter parameters as name-value pairs separated by an equals sign. CGI.pm ignores whitespace and allows you to use quotes:

(offline mode: enter name=value pairs on standard input)
month = jan
year=2001

When you are finished, press the end-of-file character on your system (use Ctrl-D on Unix or Mac; use Ctrl-Z on Windows).

As of version 2.57, CGI.pm no longer prompts for values automatically. Instead, you can pass parameters as arguments to your script (this works for earlier versions, too):

$ ./calendar.cgi month=jan year=2001

If you prefer to have CGI.pm prompt you for input instead, you can still enable this in later versions by using the -debug argument with CGI.pm:

use CGI qw( -debug );

If you are working with a complex form, and it is too much work to manually enter parameters, then you can capture the parameters to a file to use offline by adding a few lines to the top of your script:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;

## BEGIN INSERTED CODE
open FILE, "> /tmp/query1" or die $!;
$q->save( *FILE );
close FILE;
print $q->header( "text/plain" ), "File saved\n";
## END INSERTED CODE
...

Now you should have a file saved to /tmp/query1 which you can use from the command line. Remove the inserted code first (or comment it out for future use), then you can use the query file like this:

$ ./catalog.cgi < /tmp/query1

Dumping Variables

If your script runs correctly but does not do what you expect, you need to break it down into chunks to determine where it is failing. The simplest way to do this is to include a handful of print statements:

sub fetch_results {
print "Entering fetch_results( @_ )\n"; #DEBUG#
    ...

You may want to outdent these commands or place comments at the end so that it is easy to find and remove them when you are done.

If you are working with a complex Perl data structure, you can print it quite easily by using the Data::Dumper module. Simply add code like the following:

    ...
use Data::Dumper;        #DEBUG#
print Dumper( $result ); #DEBUG#
    return $result;
}

The Dumper function will serialize your data structure into neatly indented Perl source code. If you want to look at this within an HTML page, be sure to enclose it within <PRE> tags or view the page source.

If you are outputting complex HTML, you may need to view the source in order to see whether your statements printed. It is often much easier to open a separate filehandle to your own log file and print your debugging commands there. In fact, you may want to develop your own module that provides a way to send debugging output to a common debug log file as well as a simple way to turn debugging mode on and off.
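Here is a minimal sketch of such a module. The package name DebugLog, the CGI_DEBUG environment variable that switches it on, and the log location /tmp/cgi-debug.log are all our own assumptions; the file must be writable by the user the web server runs as. References are serialized with Data::Dumper, as described above.

```perl
package DebugLog;

use strict;
use warnings;
use Data::Dumper;
use POSIX qw(strftime);

# Debugging is on only when the (hypothetical) CGI_DEBUG variable is set.
our $Enabled = $ENV{CGI_DEBUG} ? 1 : 0;
our $File    = "/tmp/cgi-debug.log";    # assumed location; must be writable

# Append a timestamped message to the debug log. References are
# serialized with Data::Dumper; plain strings are written as is.
sub debug_log {
    my @items = @_;
    return unless $Enabled;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 1;
    my $text = join ' ', map { ref $_ ? Dumper($_) : $_ } @items;
    open my $fh, '>>', $File or return;    # never let debugging break the script
    print {$fh} strftime( "[%Y-%m-%d %H:%M:%S] ", localtime ), "[$$] $text\n";
    close $fh;
}

1;
```

A call such as DebugLog::debug_log( 'fetch_results returned', $result ); then costs almost nothing when debugging is off.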

Debuggers

All the previous strategies help isolate bugs, but the best solution by far is to use debuggers. Debuggers allow you to interact with your program as it runs. You can monitor the program flow, watch the value of variables, and more.

The Perl debugger

If you invoke perl with the -d flag, you will end up in an interactive session. Unfortunately, this means that you can use the debugger only from the command line. This is not the traditional environment for CGI scripts, but it is not difficult to mimic the CGI environment, as we saw earlier. The best way to do this is to save a CGI object to a query file, initialize any additional environment variables you might need, such as cookies, and then run your CGI script like this:

$ perl -dT calendar.cgi < /tmp/query1

Loading DB routines from perl5db.pl version 1
Emacs support available.

Enter h or `h h' for help.

main::(Dev:Pseudo:7):	my $q = new CGI;
  DB<1>

The debugger can be intimidating at first, but it is very powerful. To help you get going, Table 15-1 shows a brief summary of the basic commands you need to know to debug a script. You can debug all of your CGI scripts with just these commands, although many more features are available. Practice walking through scripts that you know work in order to learn how to move around within the debugger. The debugger will not change your files, so you cannot damage a working script by typing a wrong command.

Complete documentation for the Perl debugger is available in the perldebug manpage, and a quick reference for the complete set of commands is available by typing h within the debugger.

Table 15-1. Basic Perl Debugger Commands

Command      Description
s            Step; Perl executes the line listed above the prompt, stepping into any subroutines. Note that a line with multiple commands may take a few steps to evaluate.
n            Next; Perl executes the line listed above the prompt, stepping over any subroutines (they still run; Perl waits for them to finish before continuing).
c            Continue to the end of the program or the next breakpoint, whichever comes first.
c 123        Continue up to line 123; line 123 must contain a command (it cannot be a comment, blank line, the second half of a command, etc.).
b            Set a breakpoint at the current line; breakpoints halt execution caused by c.
b 123        Set a breakpoint at line 123; line 123 must contain a command (it cannot be a comment, blank line, the second half of a command, etc.).
b my_sub     Set a breakpoint at the first executable line of the my_sub sub.
d            Delete a breakpoint from the current line; takes the same arguments as b.
D            Delete all breakpoints.
x $var       Display the value of $var, evaluated in list context; note that it will recurse down complex, nested data structures.
r            Return from the current sub; Perl finishes executing the current subroutine, displays the result, and continues at the next line after the sub.
l            List the next 10 lines of your script; this command can be used successively.
l 123        List line 123 of your script.
l 200-300    List lines 200 through 300 of your script.
l my_sub     List the first 10 lines of the my_sub sub.
q            Quit.
R            Restart the script in the debugger.

ptkdb

Another option is ptkdb (see Figure 15-1), the Perl/Tk debugger, which is available on CPAN as Devel-ptkdb. It allows you to debug your scripts with a graphical interface. It also allows you to debug your CGI scripts interactively while they are running.


Figure 15-1. Debugging a CGI script with ptkdb

In order to use ptkdb, you need two things. First, you need access to an X Window server;[21] the X Window System is included with most Unix and compatible systems, and commercial versions are available for other operating systems as well. Second, the web server must have the Tk.pm module, available on CPAN, which requires Tk. Tk is a graphics toolkit that is typically distributed with the Tcl scripting language. You can obtain Tcl/Tk from http://www.scriptics.com/. For more information on using Perl with Tk via Tk.pm, refer to Learning Perl/Tk by Nancy Walsh (O’Reilly & Associates, Inc.).

In order to debug a CGI script with ptkdb, begin your CGI scripts as follows:

#! /usr/bin/perl -d:ptkdb

sub BEGIN {
    $ENV{DISPLAY} = "your.machine.hostname:0.0" ;
}

You should replace your.machine.hostname with the hostname or IP address of your machine. You can use localhost if you are running an X Window session on the web server.

You also need to allow the web server to display programs on your X Window server. On Unix and compatible systems, you do so by registering the hostname or IP address of the web server with the xhost command:

$ xhost www.webserver.hostname
www.webserver.hostname being added to access control list

You can then access your CGI script via a browser, which should open a debugging window on your system. Note that your web browser may time out if you spend much time interacting with the debugger without your script producing output.

ActiveState Perl debugger

The final option is available only to Win32 users. ActiveState distributes a graphical Perl debugger with their Perl Development Kit (PDK), shown in Figure 15-2.


Figure 15-2. Debugging a CGI script with the ActiveState Perl debugger

Once installed, using the -d flag with perl invokes this debugger instead of the standard Perl debugger. It can also be invoked when running CGI scripts if you are logged into the web server.

You can obtain the PDK and corresponding documentation from ActiveState’s web site at http://www.activestate.com/. The PDK is a commercial product, but as of the time this book was written, ActiveState offers a free seven-day trial.



[21] In the X Window System, you run an X Window server locally, which displays programs that you may execute remotely. The use of “server” in this context is sometimes confusing, since you typically use a client to interact with remote systems.
