XML and Perl

Perl, whose name is often expanded as the Practical Extraction and Report Language, has long been a mainstay of server-side programming and a foundation of Common Gateway Interface (CGI) programming. Perl has embraced XML in a big way, and one could easily write a book on the subject.

Perl modules are distributed at the Comprehensive Perl Archive Network (CPAN) site, at http://www.cpan.org, and plenty of them deal with XML (I counted 156). Table 20.2 provides a selection of Perl XML modules, along with their descriptions as given on the CPAN site.

Table 20.2. XML Modules in Perl with CPAN Descriptions
Module                        Description
----------------------------  ----------------------------------------
Apache::AxKit::XMLFinder      Detects XML files
Apache::MimeXML               mod_perl mime encoding sniffer for XML files
Boulder::XML                  XML format input/output for Boulder streams
Bundle::XML                   Bundle to install all XML-related modules
CGI::XMLForm                  Extension of CGI.pm that reads/generates formatted XML
Data::DumpXML                 Dumps arbitrary data structures as XML
DBIx::XML_RDB                 Perl extension for creating XML from existing DBI data sources
GoXML::XQI                    Perl extension for the XML Query Interface at xqi.goxml.com.
Mail::XML                     Adds toXML() method to Mail::Internet.
MARC::XML                     Subclass of MARC.pm to provide XML support
PApp::XML                     pxml sections and more
XML::Catalog                  Resolves public identifiers and remaps system identifiers
XML::CGI                      Perl extension for converting CGI.pm variables to/from XML
XML::Checker                  Perl module for validating XML
XML::Checker::Parser          XML::Parser that validates at parse time
XML::DOM                      Perl module for building DOM Level 1 compliant document structures
XML::DOM::NamedNodeMap        Hash table interface for XML::DOM
XML::DOM::NodeList            Node list as used by XML::DOM
XML::DOM::PerlSAX             Old name of XML::Handler::BuildDOM
XML::DOM::ValParser           XML::DOM::Parser that validates at parse time
XML::Driver::HTML             SAX driver for non–well-formed HTML
XML::DT                       Package for down translation of XML to strings
XML::Edifact                  Perl module to handle XML::Edifact messages
XML::Encoding                 Perl module for parsing XML encoding maps
XML::ESISParser               PerlSAX parser using nsgmls
XML::Filter::DetectWS         PerlSAX filter that detects ignorable whitespace
XML::Filter::Hekeln           SAX stream editor
XML::Filter::Reindent         Reformats whitespace for prettyprinting XML
XML::Filter::SAXT             Replicates SAX events to several SAX event handlers
XML::Generator                Perl extension for generating XML
XML::Grove                    Perl-style XML objects
XML::Grove::AsCanonXML        Outputs XML objects in canonical XML
XML::Grove::AsString          Outputs content of XML objects as a string
XML::Grove::Builder           PerlSAX handler for building an XML::Grove
XML::Grove::Factory           Simplifies creation of XML::Grove objects
XML::Grove::Path              Returns the object at a path
XML::Grove::PerlSAX           PerlSAX event interface for XML objects
XML::Grove::Sub               Runs a filter sub over a grove
XML::Grove::Subst             Substitutes values into a template
XML::Handler::BuildDOM        PerlSAX handler that creates XML::DOM document structures
XML::Handler::CanonXMLWriter  Outputs XML in canonical XML format
XML::Handler::Composer        XML printer/writer/generator
XML::Handler::PrintEvents     Prints PerlSAX events (for debugging)
XML::Handler::PyxWriter       Converts PerlSAX events to ESIS of nsgmls
XML::Handler::Sample          Trivial PerlSAX handler
XML::Handler::Subs            PerlSAX handler base class for calling user-defined subs
XML::Handler::XMLWriter       PerlSAX handler for writing readable XML
XML::Handler::YAWriter        Yet another PerlSAX XML Writer 0.15
XML::Node                     Node-based XML parsing: a simplified interface to XML
XML::Parser                   Perl module for parsing XML documents
XML::Parser::Expat            Low-level access to James Clark's expat XML parser
XML::Parser::PerlSAX          PerlSAX parser using XML::Parser
XML::Parser::PyxParser        Convert ESIS of nsgmls or Pyxie to PerlSAX
XML::PatAct::Amsterdam        Action module for simplistic style sheets
XML::PatAct::MatchName        Pattern module for matching element names
XML::PatAct::ToObjects        Action module for creating Perl objects
XML::PYX                      XML to PYX generator
XML::QL                       XML query language
XML::RegExp                   Regular expressions for XML tokens
XML::Registry                 Perl module for loading and saving an XML registry
XML::RSS                      Creates and updates RSS files
XML::SAX2Perl                 Translates PerlSAX methods to Java/CORBA-style methods
XML::Simple                   Trivial API for reading and writing XML (esp. config files)
XML::Stream                   Creates an XML Stream connection and parses return data
XML::Stream::Namespace        Object to make defining namespaces easier
XML::Template                 Perl XML template instantiation
XML::Twig                     Perl module for processing huge XML documents in tree mode
XML::UM                       Converts UTF-8 strings to any encoding supported by XML::Encoding
XML::Writer                   Perl extension for writing XML documents
XML::XPath                    Set of modules for parsing and evaluating XPath
XML::XPath::Boolean           Boolean true/false values
XML::XPath::Builder           SAX handler for building an XPath tree
XML::XPath::Literal           Simple string values
XML::XPath::Node              Internal representation of a node
XML::XPath::NodeSet           List of XML document nodes
XML::XPath::Number            Simple numeric values
XML::XPath::PerlSAX           PerlSAX event generator
XML::XPath::XMLParser         Default XML parsing class that produces a node tree
XML::XQL                      Perl module for querying XML tree structures with XQL
XML::XQL::Date                Adds an XQL::Node type for representing and comparing dates and times
XML::XQL::DOM                 Adds XQL support to XML::DOM nodes
XML::XSLT                     Perl module for processing XSLT
XMLNews::HTMLTemplate         Module for converting NITF to HTML
XMLNews::Meta                 Module for reading and writing XMLNews metadata files

Most of the Perl XML modules that appear in Table 20.2 must be downloaded and installed before you can use them. The process can be lengthy, if straightforward, and there are tools that manage downloading and installation for you, such as the cpan shell that comes with Perl and the Perl Package Manager (PPM) that ships with ActiveState's ActivePerl for Windows. Some Perl distributions, ActivePerl among them, come with XML support such as the XML::Parser module built in; elsewhere, you may need to install XML::Parser from CPAN as well.
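Before relying on one of the modules in Table 20.2, you can ask Perl whether it is installed. Here is a minimal sketch; the helper name module_available is hypothetical, and it simply tries to load each module with require inside an eval, which fails quietly if the module is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0
sub module_available {
    my ($module) = @_;
    # require with a string needs a file name: XML::Parser -> XML/Parser.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(XML::Parser XML::Simple XML::Twig)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'installed' : 'not installed';
}
```

If a module turns out to be missing, the cpan shell can fetch and install it, for example with the command cpan XML::Parser.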

Here's an example that puts XML::Parser to work: I'll parse an XML document and print it out using Perl. The XML::Parser module supports callbacks; that is, it calls subroutines you supply when it encounters the start of an element, the text content of an element, and the end of an element. Here's how I register the handler subroutines start_handler, char_handler, and end_handler for those three events, respectively, creating a new parser object named $parser:

use strict;
use warnings;
use XML::Parser;

# The Handlers hash maps event names to code references (\&name)
my $parser = XML::Parser->new(Handlers => {Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler});
    .
    .
    .
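Note that the Handlers hash must hold code references, written \&start_handler; a bare &start_handler would call the subroutine immediately and store whatever it returned. The following standalone sketch shows the idea behind such callbacks without XML::Parser itself: a hypothetical list of events, standing in for what a parser reports for <SUBJECT>XML</SUBJECT>, is dispatched through the same kind of hash. It illustrates callback dispatch only; it is not how XML::Parser works internally.

```perl
use strict;
use warnings;

my @log;    # records what each handler saw

sub start_handler { my ($expat, $element) = @_; push @log, "start $element" }
sub end_handler   { my ($expat, $element) = @_; push @log, "end $element" }
sub char_handler  { my ($expat, $text)    = @_; push @log, "text $text" }

# \&name is a reference to the subroutine; &name would call it right away
my %handlers = (Start => \&start_handler,
                End   => \&end_handler,
                Char  => \&char_handler);

# Hypothetical events for <SUBJECT>XML</SUBJECT>; undef stands in for
# the parser object that a real parser passes as the first argument
my @events = ([Start => 'SUBJECT'], [Char => 'XML'], [End => 'SUBJECT']);
for my $event (@events) {
    my ($type, @args) = @$event;
    $handlers{$type}->(undef, @args);    # invoke the registered callback
}
print "$_\n" for @log;
```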

Now I need an XML document to parse. I'll use a document we've seen before, meetings.xml (in Chapter 7, "Handling XML Documents with JavaScript"):

<?xml version="1.0"?>
<MEETINGS>
   <MEETING TYPE="informal">
       <MEETING_TITLE>XML</MEETING_TITLE>
       <MEETING_NUMBER>2079</MEETING_NUMBER>
       <SUBJECT>XML</SUBJECT>
       <DATE>6/1/2002</DATE>
       <PEOPLE>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Edward</FIRST_NAME>
               <LAST_NAME>Samson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="absent">
               <FIRST_NAME>Ernestine</FIRST_NAME>
               <LAST_NAME>Johnson</LAST_NAME>
           </PERSON>
           <PERSON ATTENDANCE="present">
               <FIRST_NAME>Betty</FIRST_NAME>
               <LAST_NAME>Richardson</LAST_NAME>
           </PERSON>
       </PEOPLE>
   </MEETING>
</MEETINGS>

I can parse that document using the $parser object's parsefile method:

use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler});

# Parse the file, firing the handlers as each part is encountered
$parser->parsefile('meetings.xml');
    .
    .
    .

All that remains is to create the subroutines start_handler, char_handler, and end_handler. I'll begin with start_handler, which is called when the start of an XML element is encountered. XML::Parser passes the element name as item 1, $_[1], of the standard Perl array @_, which holds the arguments passed to a subroutine; item 0, $_[0], is the underlying XML::Parser::Expat object, and any attributes follow the element name as name/value pairs. I can display the element's opening tag like this:

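use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler});

$parser->parsefile('meetings.xml');

# Called at each start tag: $_[0] is the Expat object, $_[1] the element name
sub start_handler
{
    print "<$_[1]>\n";
}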
    .
    .
    .
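Because this handler prints only the element name, the attributes in meetings.xml (TYPE and ATTENDANCE) won't appear in the output. Since XML::Parser passes any attributes after the element name as name/value pairs, a start handler can rebuild the complete tag instead. Here is a sketch; start_tag is a hypothetical helper name, and it re-escapes attribute values because the parser hands them over with entity references already expanded.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a start tag from the arguments a Start
# handler receives: (expat, element, name1, value1, name2, value2, ...)
sub start_tag {
    my (undef, $element, @attributes) = @_;    # ignore the Expat object
    my $tag = "<$element";
    while (my ($name, $value) = splice(@attributes, 0, 2)) {
        # Re-escape characters that may not appear raw in a quoted value
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/"/&quot;/g;
        $tag .= qq{ $name="$value"};
    }
    return "$tag>";
}

# Simulated calls, shaped like the ones the parser makes for meetings.xml
print start_tag(undef, 'MEETING', TYPE => 'informal'), "\n";
print start_tag(undef, 'PEOPLE'), "\n";
```

In the real script, the handler would then become sub start_handler { print start_tag(@_), "\n" }.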

I'll also print out the closing tag in the end_handler subroutine:

use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler});

$parser->parsefile('meetings.xml');

sub start_handler
{
    print "<$_[1]>\n";
}

# Called at each end tag; $_[1] again holds the element name
sub end_handler
{
    print "</$_[1]>\n";
}
    .
    .
    .

I can print the text content of each element in the char_handler subroutine, skipping the text that consists only of discardable whitespace, such as the indentation and line breaks between tags:

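use strict;
use warnings;
use XML::Parser;

my $parser = XML::Parser->new(Handlers => {Start => \&start_handler,
        End   => \&end_handler,
        Char  => \&char_handler});

$parser->parsefile('meetings.xml');

sub start_handler
{
    print "<$_[1]>\n";
}

sub end_handler
{
    print "</$_[1]>\n";
}

# Called for text content; $_[1] holds the text
sub char_handler
{
    my ($expat, $text) = @_;
    # Skip runs that are only whitespace (indentation, line breaks);
    # unlike testing for any space, this keeps text such as "Edward Samson"
    print "$text\n" if $text =~ /\S/;
}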

That completes the code. Running this Perl script gives you the following result, showing that meetings.xml was indeed parsed successfully:

<MEETINGS>
<MEETING>
<MEETING_TITLE>
XML
</MEETING_TITLE>
<MEETING_NUMBER>
2079
</MEETING_NUMBER>
<SUBJECT>
XML
</SUBJECT>
<DATE>
6/1/2002
</DATE>
<PEOPLE>
<PERSON>
<FIRST_NAME>
Edward
</FIRST_NAME>
<LAST_NAME>
Samson
</LAST_NAME>
</PERSON>
<PERSON>
<FIRST_NAME>
Ernestine
</FIRST_NAME>
<LAST_NAME>
Johnson
</LAST_NAME>
</PERSON>
<PERSON>
<FIRST_NAME>
Betty
</FIRST_NAME>
<LAST_NAME>
Richardson
</LAST_NAME>
</PERSON>
</PEOPLE>
</MEETING>
</MEETINGS>

Writing this script, parsing the document, and implementing callbacks like this in Perl should remind you of the Java SAX work in Chapter 12: in both cases, the parser calls your code as it encounters each part of the document.
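One caveat about char_handler: the XML::Parser documentation notes that a single run of text may generate several calls to the Char handler, so testing each fragment on its own can split a value across lines or drop it. A common remedy, sketched here with simulated calls rather than a real parser, is to collect text in a buffer and print it only when the next start or end tag arrives. The names flush_text, $buffer, and @out are hypothetical.

```perl
use strict;
use warnings;

my $buffer = '';   # text collected since the last start or end tag
my @out;           # lines that would be printed

sub flush_text {
    # Emit the collected text only if it holds something besides whitespace
    push @out, $buffer if $buffer =~ /\S/;
    $buffer = '';
}
sub start_handler { my ($expat, $element) = @_; flush_text(); push @out, "<$element>" }
sub end_handler   { my ($expat, $element) = @_; flush_text(); push @out, "</$element>" }
sub char_handler  { my ($expat, $text)    = @_; $buffer .= $text }

# Simulated callbacks: one text run delivered in two Char calls,
# followed by whitespace that is never flushed as content
start_handler(undef, 'FIRST_NAME');
char_handler(undef, 'Ed');
char_handler(undef, 'ward');
end_handler(undef, 'FIRST_NAME');
char_handler(undef, "\n   ");
print "$_\n" for @out;
```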

Next, I'll take a look at serving XML documents from Perl scripts. Unfortunately, Perl doesn't come with built-in database support as powerful as JDBC with its JDBC-ODBC bridge in Java, or ASP with its ADO support. The database support that comes built into Perl is based on DBM files, which are hash-based databases that store key/value pairs on disk. (Of course, you can install Perl modules that connect to other databases, from ODBC to Oracle; the usual route is the DBI module, with drivers such as DBD::ODBC and DBD::Oracle.)
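To see what a hash-based DBM database looks like from Perl, here is a minimal sketch that uses tie to bind an ordinary hash to a DBM file. It uses SDBM_File, which is built into every Perl, rather than the NDBM_File module used by the script in this section, which depends on the system's ndbm library; the file lives in a throwaway temporary directory.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                  # DBM module included with every perl
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # removed automatically at exit
my $file = "$dir/demo";              # SDBM_File adds its own file extensions

# tie binds an ordinary hash to the DBM file on disk
my %db;
tie(%db, 'SDBM_File', $file, O_RDWR|O_CREAT, 0644)
    or die "Cannot open $file: $!";
$db{vegetable} = 'broccoli';         # written to the file, not just memory
untie %db;

# Reopen the file read-only to show that the pair persisted
my %again;
tie(%again, 'SDBM_File', $file, O_RDONLY, 0644)
    or die "Cannot reopen $file: $!";
my $found = $again{vegetable};
untie %again;
print "vegetable => $found\n";
```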

In this case, I'll write a Perl script that lets you enter a key (such as vegetable) and a value (such as broccoli) to store in a database in the NDBM format, which Perl supports through the NDBM_File module that comes with it on systems that provide the ndbm library. This database is stored on the server. When you enter a key into the page created by this script, the code checks the database for a match to that key; if it finds one, it returns the key and its value. For example, when I enter the key vegetable and the value broccoli, that key/value pair is stored in the database. When I then search for the key vegetable, the script returns both the key and the matching value, broccoli, in an XML document that uses the <key> and <value> elements:

<?xml version="1.0" ?>
<document>
    <key>vegetable</key>
    <value>broccoli</value>
</document>
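Producing a document like this is mostly string assembly, with one catch: a key or value typed by the user might contain characters such as < or &, which would make the result ill-formed. Here is a minimal sketch of building the document with escaping; xml_escape and key_value_xml are hypothetical helper names, not part of any module.

```perl
use strict;
use warnings;

# Escape the characters that would break XML text content
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must run first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap a key/value pair in the document format above
sub key_value_xml {
    my ($key, $value) = @_;
    return join "\n",
        '<?xml version="1.0" ?>',
        '<document>',
        '    <key>'   . xml_escape($key)   . '</key>',
        '    <value>' . xml_escape($value) . '</value>',
        '</document>',
        '';                    # trailing empty item gives a final newline
}

print key_value_xml('vegetable', 'broccoli');
```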

Figure 20.4 shows the page the CGI script creates. To add an entry to the database, you enter a key into the text field marked Key to add to the database and a corresponding value in the text field marked Value to add to the database, and then click the Add to database button. In Figure 20.4, I'm storing the value broccoli under the key vegetable.

Figure 20.4. A Perl CGI script database manager.


To retrieve a value from the database, you enter the value's key in the box marked Key to search for and click the Look up value button. The database is searched, and an XML document with the results is sent to the client, as shown in Figure 20.5. In this case, I've searched for the key vegetable. Although this XML document is displayed in a browser, it's relatively easy to use Internet sockets in Perl code to read and handle such XML without a browser.

Figure 20.5. An XML document generated by a Perl script.


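In this Perl script, I'll use CGI.pm, the standard Perl CGI module, which came with the Perl distribution for many years (it was removed from the core distribution in Perl 5.22, so on newer systems you may need to install it from CPAN). I begin by creating the Web page shown earlier in Figure 20.4, including all the HTML controls we'll need: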

#!/usr/local/bin/perl
use Fcntl;
use NDBM_File;
use CGI;
$co = CGI->new;

if(!$co->param()) {
print $co->header,
$co->start_html('CGI Functions Example'),
$co->center($co->h1('CGI Database Example')),
$co->hr,
$co->b("Add a key/value pair to the database…"),
$co->start_form,
"Key to add to the database: ",
$co->textfield(-name=>'key',-default=>'', -override=>1),
$co->br,
"Value to add to the database: ",
$co->textfield(-name=>'value',-default=>'', -override=>1),
$co->br,
$co->hidden(-name=>'type',-value=>'write', -override=>1),
$co->br,
$co->center(
    $co->submit('Add to database'),
    $co->reset
),
$co->end_form,
$co->hr,
$co->b("Look up a value in the database…"),
$co->start_form,
"Key to search for: ",$co->textfield(-name=>'key',-default=>'', -override=>1),
$co->br,
$co->hidden(-name=>'type',-value=>'read', -override=>1),
$co->br,
$co->center(
    $co->submit('Look up value'),
    $co->reset
),
$co->end_form,
$co->hr;
print $co->end_html;
}
    .
    .
    .

This CGI script creates two HTML forms: one for storing key/value pairs, and one for entering a key to search for. I didn't specify an action URL for these two forms, so their data is simply sent back to this same script. The script can tell whether it has been called with data to process by checking the return value of the CGI.pm param method, which returns the names of any submitted parameters; if it returns any, data is waiting for us to work on.
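The decision this script makes on each request (show the forms, store a pair, or look one up) can be isolated in a plain function, which makes it easy to test without a Web server. Here is a minimal sketch; choose_action is a hypothetical name, and it takes the submitted fields as a hash reference instead of reading them through CGI.pm. The type values write and read come from the hidden fields in the two forms above.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do, given the submitted form fields
# as a hash reference (empty when the page is first requested)
sub choose_action {
    my ($params) = @_;
    return 'form'  unless %$params;                        # no data: show the forms
    return 'write' if ($params->{type} // '') eq 'write';  # first form's hidden field
    return 'read';                                         # anything else: look up
}

print choose_action({}), "\n";
print choose_action({ type => 'write', key => 'vegetable', value => 'broccoli' }), "\n";
print choose_action({ type => 'read',  key => 'vegetable' }), "\n";
```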

The document this script returns is an XML document, not the default HTML—so how do you set the content type in the HTTP header to indicate that? You do so by using the header method, setting the type named parameter to "application/xml". This code follows the previous code in the script:

if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    .
    .
    .
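Under the hood, header just produces the text of the HTTP header lines. A CGI script's response is a set of header lines, a blank line, and then the body, and the XML declaration must be the very first thing in the body. Here is a sketch that assembles such a response by hand, without CGI.pm; xml_response is a hypothetical helper name, and the exact header text CGI.pm emits may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: build a complete CGI response for an XML body.
# Header lines come first, then one blank line, then the document.
sub xml_response {
    my ($body) = @_;
    return "Content-Type: application/xml\n"
         . "\n"
         . qq{<?xml version="1.0"?>\n}    # declaration must start the body
         . $body;
}

print xml_response("<document></document>\n");
```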

I keep the two HTML forms separate with a hidden data variable named type. If that variable is set to "write", I enter the data that the user supplied into the database:

if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false (and sets $!) if the database can't be opened
        if (tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    }
    .
    .
    .

Otherwise, I search the database for the key the user has specified, and return both the key and the corresponding value in an XML document:

if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false (and sets $!) if the database can't be opened
        if (tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    } else {
        if (tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            $key = $co->param('key');
            $value = $dbhash{$key};
            untie %dbhash;
            # Escape the user's data so it cannot break the XML markup
            print "<key>", $co->escapeHTML($key), "</key>";
            print "<value>";
            print $co->escapeHTML($value) if defined $value;
            print "</value>";
            # defined() treats a stored "0" or "" as a match, unlike if ($value)
            print "No match found for that key" unless defined $value;
        } else {
            print "There was an error: $!";
        }
    }
    print "</document>";
}
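Two details in a lookup like this are easy to get wrong: a stored value of 0 or an empty string is false in Perl, so testing if ($value) reports such values as missing, and the key and value must be escaped before they are placed inside XML elements. Here is a minimal sketch that handles both, using exists on an ordinary hash (DBM modules such as NDBM_File also support exists on tied hashes); xml_escape and lookup_xml are hypothetical helper names.

```perl
use strict;
use warnings;

# Escape the characters that would break XML text content
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: look up $key in a (possibly tied) hash and
# describe the result as an XML document
sub lookup_xml {
    my ($db, $key) = @_;
    my $xml = '<key>' . xml_escape($key) . '</key>';
    if (exists $db->{$key}) {
        $xml .= '<value>' . xml_escape($db->{$key}) . '</value>';
    } else {
        $xml .= '<value/>No match found for that key';
    }
    return "<document>$xml</document>";
}

my %db = (vegetable => 'broccoli', count => '0');
print lookup_xml(\%db, 'count'), "\n";    # "0" is a real value, not a miss
print lookup_xml(\%db, 'fruit'), "\n";
```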

In this way, we've been able to store data in a database using Perl and retrieve that data formatted as XML. Listing 20.1 provides the complete listing for this CGI script, dbxml.cgi. Of course, doing everything yourself like this is the hard way; if you get into Perl XML development, you should take a close look at the dozens of Perl XML modules available at CPAN.

Code Listing 20.1. dbxml.cgi
#!/usr/local/bin/perl
use Fcntl;
use NDBM_File;
use CGI;
$co = CGI->new;

if(!$co->param()) {
print $co->header,
$co->start_html('CGI Functions Example'),
$co->center($co->h1('CGI Database Example')),
$co->hr,
$co->b("Add a key/value pair to the database…"),
$co->start_form,
"Key to add to the database: ",
$co->textfield(-name=>'key',-default=>'', -override=>1),
$co->br,
"Value to add to the database: ",
$co->textfield(-name=>'value',-default=>'', -override=>1),
$co->br,
$co->hidden(-name=>'type',-value=>'write', -override=>1),
$co->br,
$co->center(
    $co->submit('Add to database'),
    $co->reset
),
$co->end_form,
$co->hr,
$co->b("Look up a value in the database…"),
$co->start_form,
"Key to search for: ",$co->textfield(-name=>'key',-default=>'', -override=>1),
$co->br,
$co->hidden(-name=>'type',-value=>'read', -override=>1),
$co->br,
$co->center(
    $co->submit('Look up value'),
    $co->reset
),
$co->end_form,
$co->hr;
print $co->end_html;
}

if($co->param()) {
    print $co->header(-type=>"application/xml");
    print '<?xml version="1.0"?>';
    print "<document>";
    if($co->param('type') eq 'write') {
        # tie returns false (and sets $!) if the database can't be opened
        if (tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            $key = $co->param('key');
            $value = $co->param('value');
            $dbhash{$key} = $value;
            untie %dbhash;
            print $co->escapeHTML("$key=>$value stored in the database");
        } else {
            print "There was an error: $!";
        }
    } else {
        if (tie %dbhash, "NDBM_File", "dbdata", O_RDWR|O_CREAT, 0644) {
            $key = $co->param('key');
            $value = $dbhash{$key};
            untie %dbhash;
            # Escape the user's data so it cannot break the XML markup
            print "<key>", $co->escapeHTML($key), "</key>";
            print "<value>";
            print $co->escapeHTML($value) if defined $value;
            print "</value>";
            # defined() treats a stored "0" or "" as a match, unlike if ($value)
            print "No match found for that key" unless defined $value;
        } else {
            print "There was an error: $!";
        }
    }
    print "</document>";
}
