Perl (often expanded as the Practical Extraction and Report Language) has long been a mainstay of server-side programming and a foundation of Common Gateway Interface (CGI) programming. Perl has embraced XML in a big way, and one could easily write a book on the subject.
Perl modules are distributed at the Comprehensive Perl Archive Network (CPAN) site, at http://www.cpan.org, and plenty of them deal with XML (I counted 156). Table 20.2 provides a selection of Perl XML modules, along with their descriptions as given on the CPAN site.
Most of the Perl XML modules in Table 20.2 must be downloaded and installed before you can use them. (The process can be lengthy, if straightforward; tools such as the CPAN shell that comes with Perl, or ActiveState's PPM on Windows, can manage the download and installation for you.) Some Perl distributions, such as ActiveState's ActivePerl, come with XML support such as the XML::Parser module already installed; elsewhere, XML::Parser, which is built on the Expat C library, is installed from CPAN like the others.
Here's an example putting XML::Parser to work. In this case, I'll parse an XML document and print it out using Perl. XML::Parser works through callbacks: it calls subroutines you supply when it encounters the beginning of an element, the text content of an element, and the end of an element. Here's how I create a new parser object named $parser and register the handler subroutines start_handler, char_handler, and end_handler for those events, respectively:
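use strict;
use warnings;
use XML::Parser;

# \&name passes a reference to each subroutine (without the backslash,
# Perl would call the subroutine and pass its return value instead).
my $parser = XML::Parser->new(Handlers => {
    Start => \&start_handler,
    End   => \&end_handler,
    Char  => \&char_handler,
});
. . .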
Now I need an XML document to parse. I'll use a document we've seen before, meetings.xml (in Chapter 7, "Handling XML Documents with JavaScript"):
<?xml version="1.0"?>
<MEETINGS>
   <MEETING TYPE="informal">
       <MEETING_TITLE>XML</MEETING_TITLE>
       <MEETING_NUMBER>2079</MEETING_NUMBER>
       <SUBJECT>XML</SUBJECT>
       <DATE>6/1/2002</DATE>
       <PEOPLE>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Edward</FIRST_NAME>
               <LAST_NAME>Samson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="absent">
               <FIRST_NAME>Ernestine</FIRST_NAME>
               <LAST_NAME>Johnson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Betty</FIRST_NAME>
               <LAST_NAME>Richardson</LAST_NAME>
           </PERSON>
       </PEOPLE>
   </MEETING>
</MEETINGS>
I can parse that document using the $parser object's parsefile method:
use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {
    Start => \&start_handler,
    End   => \&end_handler,
    Char  => \&char_handler,
});
$parser->parsefile('meetings.xml');
. . .
All that remains is to create the subroutines start_handler, char_handler, and end_handler. I'll begin with start_handler, which is called when the start of an XML element is encountered. Perl passes a subroutine's arguments in the standard array @_; XML::Parser passes the parser object (an XML::Parser::Expat object) first and the element name second, so the name is in $_[1]. I can display that element's opening tag like this:
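use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {
    Start => \&start_handler,
    End   => \&end_handler,
    Char  => \&char_handler,
});
$parser->parsefile('meetings.xml');

# Called at each start tag: $_[0] is the parser object, $_[1] the element name.
sub start_handler
{
    print "<$_[1]> ";
}
. . .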
I'll also print out the closing tag in the end_handler subroutine:
use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {
    Start => \&start_handler,
    End   => \&end_handler,
    Char  => \&char_handler,
});
$parser->parsefile('meetings.xml');

sub start_handler
{
    print "<$_[1]> ";
}

# Called at each end tag, again with the element name in $_[1].
sub end_handler
{
    print "</$_[1]> ";
}
. . .
I can print out the text content of the element in the char_handler subroutine, skipping chunks of text that consist only of whitespace (the line breaks and indentation between tags):
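use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {
    Start => \&start_handler,
    End   => \&end_handler,
    Char  => \&char_handler,
});
$parser->parsefile('meetings.xml');

sub start_handler
{
    print "<$_[1]> ";
}

sub end_handler
{
    print "</$_[1]> ";
}

# Text can arrive in several chunks. Print only chunks that contain
# something other than whitespace, so the line breaks and indentation
# between tags are skipped but text containing spaces is kept.
sub char_handler
{
    if ($_[1] =~ /\S/) {
        print "$_[1] ";
    }
}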
That completes the code. Running this Perl script gives you the following result, showing that meetings.xml was indeed parsed successfully:
<MEETINGS> <MEETING> <MEETING_TITLE> XML </MEETING_TITLE> <MEETING_NUMBER> 2079 </MEETING_NUMBER> <SUBJECT> XML </SUBJECT> <DATE> 6/1/2002 </DATE> <PEOPLE> <PERSON> <FIRST_NAME> Edward </FIRST_NAME> <LAST_NAME> Samson </LAST_NAME> </PERSON> <PERSON> <FIRST_NAME> Ernestine </FIRST_NAME> <LAST_NAME> Johnson </LAST_NAME> </PERSON> <PERSON> <FIRST_NAME> Betty </FIRST_NAME> <LAST_NAME> Richardson </LAST_NAME> </PERSON> </PEOPLE> </MEETING> </MEETINGS>
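The output keeps the element names but drops attributes such as TYPE and ATTENDANCE, because start_handler prints only $_[1]. XML::Parser passes a Start handler the parser object, the element name, and then each attribute as a name/value pair, so the attributes are the rest of @_. The sketch below is a hypothetical variant, start_tag_with_attributes (not part of the original script), that rebuilds the full opening tag. Since handlers are ordinary subroutines, the usage line calls it directly with the arguments XML::Parser would pass, so XML::Parser need not be installed to try it. For brevity, attribute values are not re-escaped.

```perl
use strict;
use warnings;

# Hypothetical variant of start_handler. XML::Parser calls a Start handler as
#   handler($expat, $element, $attr1 => $value1, $attr2 => $value2, ...)
# This version returns the rebuilt opening tag instead of printing it.
# (Values are inserted as-is; a fuller version would re-escape & < and ".)
sub start_tag_with_attributes {
    my ($expat, $element, @attr_pairs) = @_;
    my $tag = "<$element";
    # Consume the flat attribute list two items (name, value) at a time.
    while (my ($name, $value) = splice(@attr_pairs, 0, 2)) {
        $tag .= qq{ $name="$value"};
    }
    return "$tag>";
}

# In the script, start_handler could then become:
#   sub start_handler { print start_tag_with_attributes(@_), ' ' }

# Simulate the call made for <PERSON ATTENDANCE="present">; the parser
# object isn't used, so undef stands in for it.
print start_tag_with_attributes(undef, 'PERSON', ATTENDANCE => 'present'), "\n";
# prints: <PERSON ATTENDANCE="present">
```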
Parsing a document by implementing callbacks like this in Perl closely resembles the Java SAX work in Chapter 12.
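Because none of the handlers prints a newline, the script's output runs together on one line. Start and End events always arrive in properly nested order, so a set of handlers can track the nesting depth and indent as it goes. The sketch below is a hypothetical variant (make_indenting_handlers is an illustrative name, not part of XML::Parser) that collects its output in a string. Because XML::Parser may split one run of text across several Char calls, it buffers text until the next tag, then keeps it only if it matches /\S/ (contains any non-whitespace character); that test, rather than a search for particular whitespace characters, keeps real text that happens to contain spaces. Handlers are plain code references, so the usage lines drive them directly in the order a parser would.

```perl
use strict;
use warnings;

# Build Start/End/Char handlers that append indented markup to $$out.
sub make_indenting_handlers {
    my ($out) = @_;        # reference to the string that collects output
    my $depth = 0;         # number of currently open elements
    my $text  = '';        # character data seen since the last tag

    # Emit buffered text at the current depth if it is more than whitespace.
    my $flush = sub {
        if ($text =~ /\S/) {
            (my $trimmed = $text) =~ s/^\s+|\s+$//g;
            $$out .= ('  ' x $depth) . "$trimmed\n";
        }
        $text = '';
    };

    return (
        Start => sub { my (undef, $element) = @_;
                       $flush->();
                       $$out .= ('  ' x $depth++) . "<$element>\n"; },
        End   => sub { my (undef, $element) = @_;
                       $flush->();
                       $$out .= ('  ' x --$depth) . "</$element>\n"; },
        Char  => sub { $text .= $_[1]; },    # text may arrive in pieces
    );
}

# With XML::Parser installed, the handlers could be used like this:
#   my $buf = '';
#   XML::Parser->new(Handlers => { make_indenting_handlers(\$buf) })
#              ->parsefile('meetings.xml');
#   print $buf;

# Drive the handlers by hand, in the order a parser would call them.
my $buf = '';
my %h = make_indenting_handlers(\$buf);
$h{Start}->(undef, 'PERSON', ATTENDANCE => 'present');
$h{Char}->(undef, "\n    ");
$h{Start}->(undef, 'FIRST_NAME');
$h{Char}->(undef, 'Betty');
$h{Char}->(undef, ' Richardson');
$h{End}->(undef, 'FIRST_NAME');
$h{Char}->(undef, "\n");
$h{End}->(undef, 'PERSON');
print $buf;
```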
Next, I'll look at serving XML documents from Perl scripts. Unfortunately, Perl doesn't come with a built-in database interface as powerful as Java's JDBC with its JDBC-ODBC bridge, or ASP with its ADO support. The database support built into Perl is based on DBM files, simple hash-based key/value databases that Perl lets you use as ordinary hashes. (You can, of course, install CPAN modules such as DBI, with drivers for everything from ODBC data sources to Oracle, to work with other databases.)
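In Perl, a DBM file is used by tying a hash to it with tie: storing into the hash writes to the file, reading from the hash looks the key up, and untie closes the file. The script in this section uses NDBM_File; the sketch below swaps in SDBM_File, which is built with every Perl, and puts the file in a temporary directory created with File::Temp so it can run anywhere. The file name demo is an arbitrary choice.

```perl
use strict;
use warnings;
use Fcntl;                          # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $file = "$dir/demo";             # SDBM creates demo.pag and demo.dir

# Store a key/value pair; tie returns false if the file can't be opened.
my %db;
tie(%db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644)
    or die "Cannot open $file: $!";
$db{vegetable} = 'broccoli';
untie %db;                          # flush to disk and close

# Reopen read-only and look the key up again.
my %again;
tie(%again, 'SDBM_File', $file, O_RDONLY, 0644)
    or die "Cannot reopen $file: $!";
my $found = $again{vegetable};
untie %again;

print "vegetable => $found\n";      # prints: vegetable => broccoli
```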
In this case, I'll write a Perl script that lets you enter a key (such as vegetable) and a value (such as broccoli) to store in a database on the server, using the NDBM format through Perl's NDBM_File module (available wherever the system provides an ndbm library). When you enter a key into the page created by this script, the code checks the database for a match to that key; if it finds one, it returns the key and its value. For example, when I enter the key vegetable and the value broccoli, that key/value pair is stored in the database. When I later search for the key vegetable, the script returns both that key and the matching value, broccoli, in an XML document using the tags <key> and <value>:
<?xml version="1.0" ?>
<document>
    <key>vegetable</key>
    <value>broccoli</value>
</document>
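The key and value in that document come straight from a form, so they should be escaped before being written into the XML; a value such as salt & pepper would otherwise produce a document that isn't well-formed. A minimal sketch of building the document above, using hypothetical helpers xml_escape and key_value_document (CGI.pm's escapeHTML method does a similar job):

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attributes.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities aren't double-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build the key/value document returned for a lookup.
sub key_value_document {
    my ($key, $value) = @_;
    return qq{<?xml version="1.0"?>\n}
         . "<document>\n"
         . "  <key>"   . xml_escape($key)   . "</key>\n"
         . "  <value>" . xml_escape($value) . "</value>\n"
         . "</document>\n";
}

print key_value_document('vegetable', 'broccoli');
print key_value_document('condiment', 'salt & pepper');
```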
Figure 20.4 shows the page this CGI script creates. To add an entry to the database, you enter a key into the text field marked Key to Add to the Database and a corresponding value into the text field marked Value to Add to the Database, and then click the Add to Database button. In Figure 20.4, I'm storing the value broccoli under the key vegetable.
To retrieve a value from the database, you enter the value's key in the box marked Key to Search For and click the Look Up Value button. The database is searched, and an XML document with the results is sent to the client, as shown in Figure 20.5. In this case, I've searched for the key vegetable. Although this XML document is displayed in a browser, it's relatively easy to write Perl code that requests the same URL directly (over a socket or with an HTTP client module) and handles the XML without a browser.
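One way to consume the response without a browser is an HTTP client. The sketch below uses HTTP::Tiny, which has shipped with Perl since 5.14, rather than raw sockets. It sends the type and key parameters the lookup form sends; CGI.pm reads parameters from a GET query string as well as from a POST. The URL in the run comment is a placeholder, and lookup_url and extract_value are hypothetical helper names. The regular expression is adequate only for this small, flat document; a real client would hand the body to an XML parser such as XML::Parser.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the URL a lookup-form submission would produce for $key.
# An array reference keeps the parameters in the order given.
sub lookup_url {
    my ($script_url, $key) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode([ type => 'read', key => $key ]);
    return "$script_url?$query";
}

# Pull the text of <value> out of the reply. Only suitable for this flat,
# known document; entities such as &amp; are left as they are.
sub extract_value {
    my ($xml) = @_;
    return $xml =~ m{<value>(.*?)</value>}s ? $1 : undef;
}

# Run as: perl lookup.pl http://localhost/cgi-bin/dbxml.cgi  (placeholder URL)
if (@ARGV) {
    my $response = HTTP::Tiny->new->get(lookup_url($ARGV[0], 'vegetable'));
    die "Request failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    my $value = extract_value($response->{content});
    print defined $value ? "vegetable => $value\n" : "No <value> element in reply\n";
}
```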
In this Perl script, I'll use CGI.pm, the standard Perl CGI module (it shipped with the core Perl distribution until Perl 5.22 and is now installed from CPAN). I begin by creating the Web page shown earlier in Figure 20.4, including all the HTML controls we'll need:
#!/usr/local/bin/perl
use strict;
use warnings;
use Fcntl;          # O_RDWR and O_CREAT flags used with tie
use NDBM_File;
use CGI;

my $co = CGI->new;

# No form data yet: send the HTML page holding both forms.
if(!$co->param()) {
    print $co->header,
        $co->start_html('CGI Functions Example'),
        $co->center($co->h1('CGI Database Example')),
        $co->hr,
        $co->b("Add a key/value pair to the database…"),
        $co->start_form,
        "Key to add to the database: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        "Value to add to the database: ",
        $co->textfield(-name=>'value',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'write', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Add to database'),
            $co->reset
        ),
        $co->end_form,
        $co->hr,
        $co->b("Look up a value in the database…"),
        $co->start_form,
        "Key to search for: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'read', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Look up value'),
            $co->reset
        ),
        $co->end_form,
        $co->hr;
    print $co->end_html;
}
. . .
This CGI script creates two HTML forms: one for storing key/value pairs and one for entering a key to search for. I didn't specify an action (the URL that receives the data) for either form, so CGI.pm's start_form points them back at this same script. To tell whether the script has been called with data to process, I check the return value of CGI.pm's param method: called with no arguments, it returns the names of the submitted parameters, so it's true only when form data is waiting for us to work on.
When form data arrives, the script returns an XML document rather than the default HTML, so the content type in the HTTP header has to say so. You set it with the header method's -type named parameter, using the value "application/xml". This code follows the previous code in the script:
if($co->param()) {
    print $co->header(-type=>"application/xml");
    # Single quotes, so the double quotes inside need no escaping.
    print '<?xml version="1.0"?>';
    print "<document>";
    . . .
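A CGI script announces its content type by printing a Content-Type header line, then an empty line, then the body; CGI.pm's header method builds that header text for you. A minimal sketch of the same response without CGI.pm, using a hypothetical helper named xml_response:

```perl
use strict;
use warnings;

# Hypothetical helper: prefix an XML body with the header a CGI script
# must send. The empty line after Content-Type ends the header.
sub xml_response {
    my ($body) = @_;
    return "Content-Type: application/xml\r\n\r\n" . $body;
}

binmode STDOUT;    # keep any text-mode layer from altering the CRLFs
print xml_response(qq{<?xml version="1.0"?>\n<document></document>\n});
```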
I tell the two HTML forms apart with a hidden field named type. If that field is set to "write", I store the key and value the user supplied in the database:
if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false, with the reason in $!, if the file can't be opened.
        if (tie my %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            my $key   = $co->param('key');
            my $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            # Escape user input so the reply stays well-formed XML.
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    }
    . . .
Otherwise, I search the database for the key the user has specified, and return both the key and the corresponding value in an XML document:
if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false, with the reason in $!, if the file can't be opened.
        if (tie my %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            my $key   = $co->param('key');
            my $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            # Escape user input so the reply stays well-formed XML.
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    }
    else {
        if (tie my %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            my $key   = $co->param('key');
            my $value = $dbhash{$key};    # undef when the key is missing
            untie %dbhash;
            print "<key>", $co->escapeHTML($key), "</key>";
            print "<value>", (defined $value ? $co->escapeHTML($value) : ''), "</value>";
            # A stored "0" is still a match, so test definedness, not truth.
            print "No match found for that key" unless defined $value;
        } else {
            print "There was an error: $!";
        }
    }
    print "</document>";
}
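Testing a looked-up value for truth, as in if ($value), treats a stored 0 or empty string as a miss. Fetching a missing key from a tied DBM hash returns undef, just as it does for an ordinary hash, so defined separates "no such key" from "key with a false value". A sketch with a hypothetical helper, lookup_message, using a plain hash in place of the tied one:

```perl
use strict;
use warnings;

# Describe the result of looking $key up in %$db. A missing key fetches
# undef, so "defined" separates a miss from a stored "0" or "".
sub lookup_message {
    my ($db, $key) = @_;
    my $value = $db->{$key};
    return defined $value ? "$key => $value" : "No match found for that key";
}

# A plain hash stands in for the tied DBM hash here.
my %db = (vegetable => 'broccoli', count => '0');
print lookup_message(\%db, 'vegetable'), "\n";   # vegetable => broccoli
print lookup_message(\%db, 'count'), "\n";       # count => 0
print lookup_message(\%db, 'fruit'), "\n";       # No match found for that key
```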
In this way, we've stored data in a database using Perl and retrieved that data formatted as XML. Listing 20.1 provides the complete listing for this CGI script, dbxml.cgi. Of course, doing everything yourself like this is the hard way; if you get into Perl XML development, take a close look at the many Perl XML modules available at CPAN.
#!/usr/local/bin/perl
use strict;
use warnings;
use Fcntl;          # O_RDWR and O_CREAT flags used with tie
use NDBM_File;
use CGI;

my $co = CGI->new;

# No form data yet: send the HTML page holding both forms.
if(!$co->param()) {
    print $co->header,
        $co->start_html('CGI Functions Example'),
        $co->center($co->h1('CGI Database Example')),
        $co->hr,
        $co->b("Add a key/value pair to the database…"),
        $co->start_form,
        "Key to add to the database: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        "Value to add to the database: ",
        $co->textfield(-name=>'value',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'write', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Add to database'),
            $co->reset
        ),
        $co->end_form,
        $co->hr,
        $co->b("Look up a value in the database…"),
        $co->start_form,
        "Key to search for: ",
        $co->textfield(-name=>'key',-default=>'', -override=>1),
        $co->br,
        $co->hidden(-name=>'type',-value=>'read', -override=>1),
        $co->br,
        $co->center(
            $co->submit('Look up value'),
            $co->reset
        ),
        $co->end_form,
        $co->hr;
    print $co->end_html;
}

# Form data arrived: reply with an XML document.
if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false, with the reason in $!, if the file can't be opened.
        if (tie my %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            my $key   = $co->param('key');
            my $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            # Escape user input so the reply stays well-formed XML.
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    }
    else {
        if (tie my %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            my $key   = $co->param('key');
            my $value = $dbhash{$key};    # undef when the key is missing
            untie %dbhash;
            print "<key>", $co->escapeHTML($key), "</key>";
            print "<value>", (defined $value ? $co->escapeHTML($value) : ''), "</value>";
            # A stored "0" is still a match, so test definedness, not truth.
            print "No match found for that key" unless defined $value;
        } else {
            print "There was an error: $!";
        }
    }
    print "</document>";
}