So far, we’ve discussed numerous CGI applications, ranging from the trivial to the very complex, but we haven’t touched upon the techniques needed to debug them if something goes wrong. Debugging a CGI application is not much different than debugging any other type of application, because, after all, code is code. However, since a CGI application is run by a remote user across the network in a special environment created by the web server, it is sometimes difficult to pinpoint the problems.
This chapter is all about debugging CGI applications. First, we’ll examine some of the common errors that developers generally come across when implementing CGI applications. These include incorrect server configuration, permission problems, and violations of the HTTP protocol. Then, we’ll explore a few tips, tricks, and tools that will help us track down problems and develop better applications.
This section can serve as a checklist that you can use to diagnose common problems. Here is a list of common sources of errors:
Source of Problem | Typical Error Message |
---|---|
Application permissions | 403 Forbidden |
The pound-bang line | 403 Forbidden |
Line endings | 500 Internal Server Error |
“Malformed” header | 500 Internal Server Error |
Let’s look at each of these in more detail.
Typically, web servers are configured to run as nobody or another user with minimal access privileges. This is a great preventative step, and one that can possibly salvage your data in the case of an attack. Since the web server process does not have privileges to write to, read from, or execute files in directories that don’t have “world” access, most of your data will stay intact.
However, this also creates a few problems for us. First and foremost, we need to set the world execute bit on the CGI applications, so the server can execute them. Here’s how you can check the permissions of your applications:
    $ ls -l /usr/local/apache/cgi-bin/clock
    -rwx------ 1 shishir 3624 Oct 17 17:59 clock
The first field lists the permissions for the file. This field is divided into three parts: the privileges for the owner, the group, and the world (from left to right), with the first letter indicating the type of the file: either a regular file, or a directory. In this example, the owner has sole permission to read, write, and execute the program.
If you want the server to be able to execute this application, you have to issue the following command:
    $ chmod 711 clock
    $ ls -lF clock
    -rwx--x--x 1 shishir 3624 Oct 17 17:59 clock*
The chmod command (change mode) modifies the permissions for the file. The octal code of 711 indicates read (octal 4), write (octal 2), and execute (octal 1) permissions for the owner, and execute permissions for everyone else.
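To make the octal arithmetic concrete, here is a minimal sketch that turns a numeric mode into the nine-character permission string that ls -l displays. The helper name mode_to_string is hypothetical and is not part of the chapter's code; it only illustrates how each octal digit breaks down into read (4), write (2), and execute (1).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): convert a numeric mode such
# as 0711 into the rwx string shown by ls -l.
sub mode_to_string {
    my ($mode) = @_;
    my $string = '';

    # Walk the owner, group, and world digits from left to right.
    for my $shift ( 6, 3, 0 ) {
        my $bits = ( $mode >> $shift ) & 7;
        $string .= ( $bits & 4 ) ? 'r' : '-';    # read    (octal 4)
        $string .= ( $bits & 2 ) ? 'w' : '-';    # write   (octal 2)
        $string .= ( $bits & 1 ) ? 'x' : '-';    # execute (octal 1)
    }
    return $string;
}

print mode_to_string( 0700 ), "\n";    # rwx------
print mode_to_string( 0711 ), "\n";    # rwx--x--x
```

The two calls correspond to the listings before and after the chmod command above.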
That’s not the end of our permission woes. We could run into other problems dealing with file permissions, most notably, the inability to create or update files. We will discuss this in Section 15.2 later in this chapter.
Despite configuring the server to recognize CGI applications and setting the execute permissions, our applications can still fail to execute, as you’ll see next.
If a CGI application is written in Perl, Python, Tcl, or another interpreted scripting language, then it must have a line at the very top that begins with a pound-bang, or #!, like this:
#!/usr/bin/perl -wT
We’ve seen this at the top of every script throughout this book. When the web server recognizes a request for a CGI application, it calls the exec system function to execute the application. If the application is a compiled executable, the operating system will go ahead and execute it. However, if our application is a script of some sort, then the operating system will look at the first line to see what interpreter to use.
If your scripts are missing the pound-bang line, or if the path you specify is invalid, then you will get an error. On some systems, for example, perl is found at /usr/bin/perl, while on others it is found at /usr/local/bin/perl. On Unix systems, you can use either of the following commands to locate perl (depending on your shell):
    $ which perl
    $ whence perl
If neither of these commands works, then look for perl5 instead of perl. If you still cannot locate perl, then try either of the following commands. They return anything on your filesystem named perl, so they could return multiple results, and the find command will search your entire filesystem, so depending on the size of the filesystem, this could take a while:
    $ locate perl
    $ find / -name perl -type f -print 2>/dev/null
Another thing to keep in mind: if you have multiple interpreters (i.e., different versions) for the same language, make sure that your scripts reference the one you intend, or else you may see some mysterious effects. For example, on some systems, perl4 is still installed in addition to perl5. Test the path you use with the -v flag to get its version.
If you are working with a CGI script that was downloaded from another site or edited on a different platform, then it is possible that the line endings do not match those of the current system. For example, perl on Unix will complain with multiple syntax errors if you attempt to run a file that is formatted for Windows. You can clean these files up with perl from the command line:
    $ perl -pi -e 's/\r\n/\n/' calendar.cgi
As we first discussed in Chapter 2 and Chapter 3, and have seen in all the examples since, all CGI applications must return a valid HTTP content-type header, followed by a blank line, before the actual content, like this:
    Content-type: text/html
    (other headers)

    (content)
If you fail to follow this format, then a typical 500 Internal Server Error will ensue. The partial solution is to return all necessary HTTP headers, including content type, as early on in the CGI application as possible. We will look at a very useful technique in the next section that will help us with this task.
However, there are other reasons why we may see such an error. If your CGI application generates errors that are printed to STDERR, these error messages may be returned to the web server before all of the header information. Because Perl buffers output to STDOUT, errors that occur after you have printed the headers may even cause this problem.
What’s the moral? Make sure you check your application from the command line before you try to execute it from the Web. If you’re using Perl to develop CGI applications, then you can use the -wcT switch to check for syntax errors:
    $ perl -wcT clock.cgi
    syntax error in file clock.cgi at line 9, at EOF
    clock.cgi had compilation errors.
If there are warnings, but no errors, you may see the following:
    $ perl -wcT clock.cgi
    Name "main::opt_g" used only once: possible typo at clock.cgi line 5.
    Name "main::opt_u" used only once: possible typo at clock.cgi line 6.
    Name "main::opt_f" used only once: possible typo at clock.cgi line 7.
    clock.cgi syntax OK
Pay attention to the warnings, as well. Perl’s syntax checker has really improved over the years, and will alert you to many possible errors, such as using nonexistent variables, uninitialized variables, or file handles.
And finally, if there are no warnings or errors, you will see:
    $ perl -wcT clock.cgi
    clock.cgi syntax OK
To reiterate, make sure your application works from the command line before you even attempt to debug its functionality from the Web.
In this section, we’ll discuss programming techniques that will help us develop stable, bug-free applications. These techniques are easy to use, and using them can help you avoid bugs in the first place:
Always use strict.
Check the status of system calls.
Verify that each file open is successful.
Trap die.
Lock files.
Unbuffer the output stream when necessary.
Use binmode when necessary.
Let’s review each of these in detail.
You should use the strict pragma for any Perl script more than a few lines long, and for all CGI scripts. Simply place the following line at the top of your script:
    use strict;
If an import list is not specified, strict generates errors if you use symbolic references, bareword identifiers as subroutines, or use variables that are not localized, fully qualified, or pre-defined using the vars argument.
Here are two snippets of code, one which will run successfully under strict, and the other which will cause an error:
    use strict "refs";  ## refs only, so the undeclared globals below are allowed

    $id    = 2000;
    $field = \$id;
    print $$field;      ## Success, will print 2000

    $field = "id";
    print $$field;      ## Error!
Symbolic references are names of variables, used to get at the underlying object. In the second snippet above, we are trying to get at the value of $id indirectly. As a result, Perl will generate an error like the following:
Can't use string ("id") as a SCALAR ref while "strict refs" in use ...
Now, let’s look at bareword subroutines. Take the following example:
    use strict "subs";

    greeting;
    ...
    sub greeting {
        print "Hello Friend!";
    }
When Perl looks at the second line, it doesn’t know what it is. It could be a string in a void context or it could be a subroutine or function call. When we run this code, Perl will generate the following error:
    Bareword "greeting" not allowed while "strict subs" in use at simple line 3.
    Execution of simple aborted due to compilation errors.
We can solve this in one of several ways. We can create a prototype, declare greeting as a subroutine with the subs module, use the & prefix, or pass an empty list, like so:
    sub greeting;            ## prototype
    use subs qw (greeting);  ## use subs
    &greeting;               ## & prefix
    greeting( );             ## null list
This forces us to be clear about the use of subroutines in our applications.
The last restriction that strict imposes on us involves variable declaration. You have probably run across source code where you’re not sure if a certain variable is global, or local to a function or subroutine. By using the vars argument with strict, we can eliminate this guessing.
Here’s a trivial example:
    use strict "vars";

    $soda = "Coke";
Since we haven’t told Perl what $soda is, it will complain with the following error:
    Global symbol "$soda" requires explicit package name at simple line 3.
    Execution of simple aborted due to compilation errors.
We can solve this problem by using a fully qualified variable name, declaring the variable using the vars module, or localizing it with my, like so:
    $main::soda = "Coke";    ## Fully qualified
    use vars qw ($soda);     ## Declare using vars module
    my $soda;                ## Localize
As you can see, the strict module imposes a very rigid environment for developing applications. But, that’s a very nice and powerful feature, because it helps us track down a variety of bugs. In addition, the module allows for great flexibility as well. For example, if we know that a certain piece of code works fine, but will fail under strict, we can turn certain restrictions off, like so:
    ## code that passes strict
    ...
    {
        no strict;    ## or no strict "vars";

        ## code that will not pass strict
    }
All code within the block, delimited by braces, will have no restrictions.
With this type of flexibility and control, there is no reason why you should not be using strict to help you develop cleaner, bug-free applications.
Before we discuss anything in this section, here’s a mantra to code by:
“Always check the return value of all the system commands, including open, eval, and system.”
Since web servers are typically configured to run as nobody, or a user with minimal access privileges, we must be very careful when performing any file or system I/O. Take, for example, the following code:
    #!/usr/bin/perl -wT

    print "Content-type: text/html\n\n";
    ...
    open FILE, "/usr/local/apache/data/recipes.txt";
    while (<FILE>) {
        s/^\s*$/<P>/, next if (/^\s*$/);
        s/\n/<BR>/;
        ...
    }
    close FILE;
If the /usr/local/apache/data directory is not world readable, then the open command will fail, and we will end up with no output. This isn’t really desirable, since the user will have no idea what happened.
A solution to this problem is to check the status of open:
    ...
    open FILE, "/usr/local/apache/data/recipes.txt"
        or error ( $q, "Sorry, I can't access the recipe data!" );

    print "Content-type: text/html\n\n";
    ...
If the open fails, we call a custom error function to return a nicely formatted HTML document and exit.
You need to follow the same process when creating or updating files, as well. In order for a CGI application to write to a file, it has to have write permissions on the file, as well as the directories in which the file resides.
Some of the more commonly used system functions include: open, close, flock, eval, and system. You should make it a habit to check the return value of such functions, so you can take preventative action.
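As a rough illustration of the mantra, here is a minimal sketch that checks both open and close and hands the reason for any failure back to the caller. The helper name read_file and the path in the usage lines are hypothetical; they are not the error function used elsewhere in this book, just one way to keep the friendly message for the user separate from the detailed reason written to STDERR.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): read a whole file and return
# ( $text, undef ) on success or ( undef, $reason ) on failure.
sub read_file {
    my ($path) = @_;

    open( my $fh, '<', $path )
        or return ( undef, "Cannot open $path: $!" );

    local $/;              # slurp mode: read the entire file at once
    my $text = <$fh>;

    close( $fh )
        or return ( undef, "Cannot close $path: $!" );

    return ( $text, undef );
}

# Usage with a path that is assumed not to exist.
my ( $text, $problem ) = read_file( "/no/such/directory/recipes.txt" );

if ( defined $problem ) {
    warn "$problem\n";     # the detailed reason goes to STDERR
    print "Content-type: text/html\n\n";
    print "<P>Sorry, I can't access the recipe data!</P>\n";
}
```

Returning the reason instead of calling die leaves the caller free to decide how much detail the user should see.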
In various examples throughout the book, we’ve used the open function to create pipes to execute external applications and perform data redirection. Unfortunately, unlike in the previous section, there is no easy way to determine if an application is executed successfully within the pipe.
Here’s a simple example that sorts some numerical data.
    open FILE, "| /usr/local/gnu/sort"
        or die "Could not create pipe: $!";

    print "Content-type: text/plain\n\n";

    ## fill the @data array with some numerical data
    ...
    print FILE join ("\n", @data);
    close FILE;
If we cannot create the pipe, which is almost never the case, we return an error. But, what if the path to the sort command is incorrect? Then, the user will not see any error, nor any reasonable output.
So, how do we determine if the sort command executes successfully? Unfortunately, due to the way the shell operates, the status of the command is available only after the file handle is closed.
Here’s an example:
    open FILE, "| /usr/local/gnu/sort"
        or die "Could not create pipe: $!";

    ### code omitted for brevity
    ...
    close FILE;

    my $status = ($? >> 8);
    if ( $status ) {
        print "Sorry! I cannot access the data at this time!";
    }
Once the file handle is closed, Perl stores the actual return status in the $? variable. We determine the command’s exit status (0 for success, nonzero for failure) by right shifting the actual status by eight bits.
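The layout of $? can be sketched as follows: the high byte holds the exit code, and the low seven bits hold the number of the signal, if any, that killed the process. The helper name describe_status is hypothetical. The demonstration assumes a Unix-like system where the list form of a piped open is available, and it uses $^X (the path of the running perl) as a stand-in for an external command such as sort.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): explain a raw $? value.
sub describe_status {
    my ($raw) = @_;

    return "command could not be started" if $raw == -1;

    my $signal = $raw & 127;           # low seven bits: signal number
    return "killed by signal $signal" if $signal;

    return "exit code " . ( $raw >> 8 );    # high byte: exit code
}

# Demonstration: pipe to a child process that exits with a nonzero code.
open( my $pipe, '|-', $^X, '-e', 'exit 3' )
    or die "Could not create pipe: $!";

close $pipe;    # returns false on a nonzero exit, and sets $?

print describe_status( $? ), "\n";
```

Because the status is only known after close, any message to the user about a failed command has to be produced at that point, not when the pipe is opened.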
There is also another, albeit less portable and reliable, method to determine the status of the pipe. This involves checking the PID of the child process, spawned by the open function:
    #!/usr/bin/perl -wT

    use strict;
    use CGI;

    my $q = new CGI;

    my $pid    = open FILE, "| /usr/local/gnu/sort";
    my $status = kill 0, $pid;

    $status or die "Cannot open pipe to sort: $!";

    ## We're successful!
    print $q->header( "text/plain" );
    ...
We use the kill function to send a signal of zero to the process created by the pipe. If the process is dead, which means the application within the pipe never got executed, the operating system returns a value of zero. As mentioned before, this technique is not 100% reliable, and will not work on all Unix platforms, but it’s something you might want to try.
Don’t forget about our earlier discussion about die. If your code or a module that you call invokes Perl’s die function, it will certainly trigger a 500 Internal Server Error unless you trap it. Use CGI::Carp to trap fatal calls and redirect the messages to the browser. Add this line to the top of your script:
use CGI::Carp qw( fatalsToBrowser );
Refer to Section 5.5 for more on CGI::Carp.
If you find that you are losing data in your data files, or files are becoming corrupt, then you are probably not locking them. The Web is a multi-user environment and multiple users may access the same document or CGI application at the same time. Let’s take a look at an example that doesn’t perform any locking:
    #!/usr/bin/perl -wT

    use CGI;
    use CGIBook::Error;

    my $cgi      = new CGI;
    my $email    = $cgi->param ("email")    || "Anonymous";
    my $comments = $cgi->param ("comments") || "No comments";
    ...
    open FILE, ">>/usr/local/apache/data/guestbook.txt"
        or error( $cgi, "Cannot add your entry to guestbook!");

    print FILE "From $email: $comments\n";
    close FILE;

    print "Location: /generic/thanks.html\n\n";
Now, imagine a scenario where multiple users, say 100, access this application at the exact same time. What happens? A hundred CGI application processes will all try to write to the guestbook.txt file, and more than likely, we’ll end up with data loss and corruption.
In order to solve the problem, we need to lock the file. Refer to Section 10.1.1 for more details.
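For orientation, here is a minimal sketch of an append guarded by an exclusive advisory lock, using Perl’s built-in flock with the constants from the standard Fcntl module. The helper name append_entry is hypothetical, and the demonstration writes to a temporary file created with File::Temp instead of a real guestbook. It assumes a platform where flock is supported.

```perl
use strict;
use warnings;
use Fcntl      qw( :flock );
use File::Temp qw( tempfile );

# Hypothetical helper (not from the chapter): append one entry while
# holding an exclusive lock, checking every call along the way.
sub append_entry {
    my ( $path, $entry ) = @_;

    open( my $fh, '>>', $path ) or return 0;
    flock( $fh, LOCK_EX )       or return 0;   # wait until we own the lock
    print {$fh} $entry          or return 0;
    close( $fh )                or return 0;   # closing releases the lock

    return 1;
}

# Demonstration with a temporary file standing in for guestbook.txt.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh;

append_entry( $path, "From Anonymous: No comments\n" )
    or print "Cannot add your entry to guestbook!\n";
```

The lock is only advisory: it protects the file only if every program that writes to it requests the lock in the same way.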
Sometimes, you may run into what seems like a very strange error where output doesn’t appear in the order in which it is sent to standard output stream. This typically occurs when you call an external application to generate output.
For example, the following example might not work properly on all systems:
    #!/usr/bin/perl -wT

    print "Content-type: text/plain\n\n";
    system "/bin/finger";
In what seems like a very bizarre error, the output from system can actually appear before the content type header. This is the result of buffering the standard output stream.
You can turn buffering off, like so:
$| = 1;
This forces Perl to flush the standard output stream buffers after every write.
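Putting the pieces together, a minimal sketch of the corrected ordering might look like this. It is an illustration under assumptions, not the chapter’s script: $^X (the path of the running perl) stands in for an external command such as /bin/finger so that the example runs wherever perl does.

```perl
use strict;
use warnings;

# Unbuffer STDOUT so the header leaves this process before the child
# process writes anything to the same stream.
$| = 1;

print "Content-type: text/plain\n\n";

# Stand-in for an external command; the list form of system avoids the shell.
system( $^X, '-e', 'print "output from the child process\n"' ) == 0
    or print "Sorry! The external command failed.\n";
```

Setting $| before the first print matters: anything already sitting in the buffer when the child starts can still arrive late.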
On operating systems that distinguish between binary and text files, most notably Windows 95, NT, and the Macintosh, we have to be very careful, especially when returning binary output. For example, the following application creates a simple dynamic image:
    #!/usr/bin/perl -wT

    use GD;
    use strict;

    my $image = new GD::Image( 100, 100 );

    my $white = $image->colorAllocate( 255, 255, 255 );
    my $black = $image->colorAllocate( 0, 0, 0 );
    my $red   = $image->colorAllocate( 255, 0, 0 );

    $image->arc( 50, 50, 95, 75, 0, 360, $black );
    $image->fill( 50, 50, $red );

    print "Content-type: image/png\n\n";
    print $image->png;
However, the output will result in a broken image if we run the application on a platform mentioned above. The solution is to use the binmode function to treat the resulting output as binary information:
    ## code omitted for brevity
    ...
    binmode STDOUT;
    print $image->png;
We’ve looked at what can cause common errors, but not everything is a common problem. If you are having problems and none of the earlier solutions helps, then you need to do some investigative work. In this section, we’ll look at some tools to help you uncover the source of the problem. Briefly, here is an outline of the steps you can take:
Check the syntax of your scripts.
Check the error logs.
Run scripts from the command line.
Test the value of variables.
Use a debugger.
Let’s review each in more detail.
We mentioned this within one of the sections above, but it bears repeating in its own section: if your code does not parse or compile, then it will never run correctly. So get in the habit of testing your scripts with the -c flag from the command line before you test them in the browser, and while you’re at it, have it check for warnings too with the -w flag. Remember, if you use taint mode (and you are using taint mode with all of your scripts, right?), you also need to pass the -T flag to avoid the following error:
    $ perl -wc myScript.cgi
    Too late for "-T" option.
Therefore, use the -wcT combination:
    $ perl -wcT calendar.cgi
This will either return:
    calendar.cgi syntax OK
or a list of problems. Of course you should only use the -c flag from the command line and not add it to the pound-bang line in your scripts.
Typically, errors are printed to STDERR, and on some web servers anything that is printed to STDERR while a CGI script is running ends up in your server’s error logs. Thus, you can often find all sorts of useful clues by scanning these logs when you have problems. Possible locations of this file with Apache are /usr/local/apache/logs/error_log or /usr/var/logs/httpd/error_log. Errors are appended to the bottom; you may want to watch the log as you test your CGI script. If you use the tail command with a -f option:
$ tail -f /usr/local/apache/logs/error_log
it will print new lines as they are written to the file.
Once your scripts pass a syntax check, the next step is to try to run them from the command line. Remember that because CGI scripts receive much of their data from environment variables, you can set these manually yourself before you run your script:
    $ export HTTP_COOKIE="user_id=abc123"
    $ export QUERY_STRING="month=jan&year=2001"
    $ export REQUEST_METHOD="GET"
    $ ./calendar.cgi
You will see the full output of your script including any headers you print. This can be quite useful if you suspect your problem has to do with the headers you are sending.
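To see exactly what a script receives from these environment variables, a small stand-in script can echo the decoded parameters. The sketch below parses QUERY_STRING by hand so that it has no dependencies; the helper name parse_query_string is hypothetical, and in a real application you would let CGI.pm do this work. If a name is repeated, the last value wins, which is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the chapter): decode a query string into
# a hash of name/value pairs.
sub parse_query_string {
    my ($query) = @_;
    my %param;

    for my $pair ( split /[&;]/, $query ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;

        # Undo URL encoding: "+" means space, %XX is a hex-encoded byte.
        for ( $name, $value ) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr( hex $1 )/ge;
        }
        $param{$name} = $value;
    }
    return %param;
}

my %param = parse_query_string( $ENV{QUERY_STRING} || '' );

print "Content-type: text/plain\n\n";
print "$_ = $param{$_}\n" for sort keys %param;
```

Run with the QUERY_STRING value exported above, the script prints its header, a blank line, and then one line each for month and year.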
If you are using Version 2.56 or earlier of CGI.pm, it makes accepting form parameters much easier by prompting for them when you run your script:
(offline mode: enter name=value pairs on standard input)
You can then enter parameters as name-value pairs separated by an equals sign. CGI.pm ignores whitespace and allows you to use quotes:
    (offline mode: enter name=value pairs on standard input)
    month = jan
    year=2001
When you are finished, press the end-of-file character on your system (use Ctrl-D on Unix or Mac; use Ctrl-Z on Windows).
As of 2.57, CGI.pm no longer automatically prompts for values. Instead, you can pass parameters as arguments to your script (this works for previous versions, too):
$ ./calendar.cgi month=jan year=2001
If you prefer to have CGI.pm prompt you for input instead, you can still enable this in later versions by using the -debug argument with CGI.pm:
use CGI qw( -debug );
If you are working with a complex form, and it is too much work to manually enter parameters, then you can capture the parameters to a file to use offline by adding a few lines to the top of your script:
    #!/usr/bin/perl -wT

    use strict;
    use CGI;

    my $q = new CGI;

    ## BEGIN INSERTED CODE
    open FILE, "> /tmp/query1" or die $!;
    $q->save( *FILE );
    print $q->header( "text/plain" ), "File saved\n";
    ## END INSERTED CODE
    .
    .
Now you should have a file saved to /tmp/query1 which you can use from the command line. Remove the inserted code first (or comment it out for future use), then you can use the query file like this:
$ ./catalog.cgi < /tmp/query1
If your script runs without errors but does not do what you expect, then you need to break it down into chunks to determine where it is failing. The simplest way to do this is to include a handful of print statements:
    sub fetch_results {
    print "Entering fetch_results( @_ )\n";  #DEBUG#
        .
        .
You may want to outdent these commands or place comments at the end so that it is easy to find and remove them when you are done.
If you are working with a complex Perl data structure, you can print it quite easily by using the Data::Dumper module. Simply add code like the following:
        .
        .
    use Data::Dumper;          #DEBUG#
    print Dumper( $result );   #DEBUG#
        return $result;
    }
The Dumper function will serialize your data structure into neatly indented Perl source code. If you want to look at this within an HTML page, be sure to enclose it within <PRE> tags or view the page source.
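When the dump is going into an HTML page, it may also need escaping, since Dumper output contains characters such as < and > that the browser would otherwise interpret. Here is a minimal sketch; the helper name dump_for_html and the sample data are hypothetical, and Data::Dumper’s $Sortkeys setting is used only to make the output order predictable.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the chapter): serialize a data structure
# and make the result safe to embed in an HTML page.
sub dump_for_html {
    my ($data) = @_;

    local $Data::Dumper::Sortkeys = 1;    # stable hash key order
    my $text = Dumper( $data );

    # Escape the characters that are special in HTML; & must come first.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    return "<PRE>$text</PRE>\n";
}

print dump_for_html( { title => "<B>Specials</B>", prices => [ 1.5, 2 ] } );
```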
If you are outputting complex HTML, you may need to view the source in order to see whether your statements printed. It is often much easier to open a separate filehandle to your own log file and print your debugging commands there. In fact, you may want to develop your own module that provides a way to send debugging output to a common debug log file as well as a simple way to turn debugging mode on and off.
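A debug log of this kind can be sketched in a few lines. Everything named here is hypothetical: the debug function, the $DEBUG switch, the CGI_DEBUG environment variable that sets it, and the log path are illustrations, not part of any standard module. The web server user must be able to write to whatever log path you choose.

```perl
use strict;
use warnings;

# Hypothetical settings: debugging is off unless CGI_DEBUG is set.
our $DEBUG     = $ENV{CGI_DEBUG} ? 1 : 0;
our $DEBUG_LOG = "/tmp/cgi_debug.log";

# Hypothetical helper (not from the chapter): append a timestamped line
# to the debug log when debugging is switched on.
sub debug {
    my @message = @_;
    return 1 unless $DEBUG;    # cheap no-op when debugging is off

    open( my $log, '>>', $DEBUG_LOG ) or do {
        warn "Cannot open $DEBUG_LOG: $!\n";
        return 0;
    };
    print {$log} scalar( localtime ), " [$$] ", @message, "\n";
    close $log;
    return 1;
}

debug( "Entering fetch_results" );
```

Because each line carries the process ID, entries from requests that overlap in time can still be told apart in the shared log.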
All the previous strategies help isolate bugs, but the best solution by far is to use debuggers. Debuggers allow you to interact with your program as it runs. You can monitor the program flow, watch the value of variables, and more.
If you invoke perl with the -d flag, you will end up in an interactive session. Unfortunately, this means that you can use the debugger only from the command line. This is not the traditional environment for CGI scripts, but it is not difficult to mimic the CGI environment, as we saw earlier. The best way to do this is to save a CGI object to a query file, initialize any additional environment variables you might need, such as cookies, and then run your CGI script like this:
    $ perl -dT calendar.cgi </tmp/query1

    Loading DB routines from perl5db.pl version 1
    Emacs support available.

    Enter h or `h h' for help.

    main::(Dev:Pseudo:7):   my $q = new CGI;
      DB<1>
The debugger can be intimidating at first, but it is very powerful. To help you get going, Table 15-1 shows a brief summary of all the basic commands you need to know to debug a script. You can debug all of your CGI scripts with just these commands, although there are many more features actually available. Practice walking through scripts that you know work in order to learn how to move around within the debugger. The debugger will not change your files, so you cannot damage a working script by typing a wrong command.
Complete documentation for the Perl debugger is available in the perldebug manpage, and a quick reference for the complete set of commands is available by typing h within the debugger.
Table 15-1. Basic Perl Debugger Commands
Command | Description |
---|---|
s | Step; Perl executes the line listed above the prompt, stepping into any subroutines; note that a line with multiple commands may take a few steps to evaluate. |
n | Next; Perl executes the line listed above the prompt, stepping over any subroutines (they still run; Perl waits for them to finish before continuing). |
c | Continue to the end of the program or the next break point, whichever comes first. |
c 123 | Continue up to line 123; line 123 must contain a command (it cannot be a comment, blank line, the second half of a command, etc.). |
b | Set a breakpoint at the current line; breakpoints halt execution caused by c. |
b 123 | Set a breakpoint at line 123; line 123 must contain a command (it cannot be a comment, blank line, the second half of a command, etc.). |
b sub | Set a breakpoint at the first executable line of the sub subroutine. |
d | Delete a breakpoint from the current line; takes same arguments as b. |
D | Deletes all breakpoints. |
p expr | Display the value of expr. |
r | Return from the current sub; Perl finishes executing the current subroutine, displays the result, and continues at the next line after the sub. |
l | List the next 10 lines of your script; this command can be used successively. |
l 123 | List line 123 of your script. |
l 200-300 | List lines 200 through 300 of your script. |
l sub | List the first 10 lines of the sub subroutine. |
q | Quit. |
R | Restart the script in the debugger. |
Another option is ptkdb (see Figure 15.1), the Perl/Tk debugger, which is available on CPAN as Devel-ptkdb. It allows you to debug your scripts with a graphical interface. It also allows you to debug your CGI scripts interactively as they are running.
In order to use ptkdb, you need two things. First, you need access to an X Window server;[21] the X Window System is included with most Unix and compatible systems; commercial versions are available for other operating systems as well. Second, the web server must have the Tk.pm module, available on CPAN, which requires Tk. Tk is a graphics toolkit that is typically distributed with the Tcl scripting language. You can obtain Tcl/Tk from http://www.scriptics.com/. For more information on using Perl with Tk via Tk.pm, refer to Learning Perl/Tk by Nancy Walsh (O’Reilly & Associates, Inc.).
In order to debug a CGI script with ptkdb, begin your CGI scripts as follows:
    #! /usr/bin/perl -d:ptkdb

    sub BEGIN {
        $ENV{DISPLAY} = "your.machine.hostname:0.0" ;
    }
You should replace your.machine.hostname with the hostname or IP address of your machine. You can use localhost if you are running an X Window session on the web server.
You also need to allow the web server to display programs on your X Window server. On Unix and compatible systems, you do so by registering the hostname or IP address of the web server with the xhost command:
    $ xhost www.webserver.hostname
    www.webserver.hostname being added to access control list
You can then access your CGI script via a browser, which should open a debugging window on your system. Note that your web browser may time out if you spend much time interacting with the debugger without your script producing output.
The final option is available only to Win32 users. ActiveState distributes a graphical Perl debugger with their Perl Development Kit (PDK), shown in Figure 15.2.
Once installed, using the -d flag with perl invokes this debugger instead of the standard Perl debugger. It can also be invoked when running CGI scripts if you are logged into the web server.
You can obtain the PDK and corresponding documentation from ActiveState’s web site at http://www.activestate.com/. The PDK is a commercial product, but as of the time this book was written, ActiveState offers a free seven-day trial.
[21] In the X Window System, you run an X Window server locally, which displays programs that you may execute remotely. The use of “server” in this context is sometimes confusing, since you typically use a client to interact with remote systems.