PoXML is a drop-in replacement, of sorts, for the SOAP::Lite-less.
PoXML is a bit of home-brewed hackery for those who don’t have the SOAP::Lite [Hack #52] Perl module at their disposal. Perhaps you had more than enough trouble installing it yourself.
Any Perl guru will insist that module installation is as simple as can be. That said, any other Perl guru will be forced to admit that it’s an inconsistent experience and often harder than it should be.
PoXML is a drop-in replacement—to a rather decent degree—for SOAP::Lite. It treats Google’s SOAP as plain old XML, using the LWP::UserAgent module to make HTTP requests and XML::Simple to parse the XML response. And best of all, it requires little more than a two-line alteration to the target hack.
The heart of this hack is PoXML.pm, a little Perl module best saved into the same directory as your hacks.
# PoXML.pm
# PoXML [pronounced "plain old xml"] is a dire-need drop-in
# replacement for SOAP::Lite designed for Google Web API hacking.

package PoXML;

use strict;
no strict "refs";

# LWP for making HTTP requests, XML for parsing Google SOAP
use LWP::UserAgent;
use XML::Simple;

# Create a new PoXML
sub new {
  my $self = {};
  bless($self);
  return $self;
}

# Replacement for the SOAP::Lite-based doGoogleSearch method
sub doGoogleSearch {
  my($self, %args);
  ($self, @args{qw/ key q start maxResults filter restrict
    safeSearch lr ie oe /}) = @_;

  # grab SOAP request from __DATA__
  my $tell = tell(DATA);
  my $soap_request = join '', <DATA>;
  seek(DATA, $tell, 0);
  $soap_request =~ s/\$(\w+)/$args{$1}/ge; # interpolate variables

  # Make (POST) a SOAP-based request to Google
  my $ua = LWP::UserAgent->new;
  my $req = HTTP::Request->new(
    POST => 'http://api.google.com/search/beta2');
  $req->content_type('text/xml');
  $req->content($soap_request);
  my $res = $ua->request($req);
  my $soap_response = $res->as_string;

  # Drop the HTTP headers and so forth until the initial xml element
  $soap_response =~ s/^.+?(<\?xml)/$1/migs;

  # Drop element namespaces for tolerance of future prefix changes
  $soap_response =~ s!(</?)[\w-]+?:([\w-]+?)!$1$2!g;

  # Parse the XML
  my $results = XMLin($soap_response);

  # Normalize and drop the unnecessary encoding bits
  my $return =
    $results->{'Body'}->{'doGoogleSearchResponse'}->{return};
  foreach ( keys %{$return} ) {
    $return->{$_}->{content} and
      $return->{$_} = $return->{$_}->{content} || '';
  }

  my @items;
  foreach my $item ( @{$return->{resultElements}->{item}} ) {
    foreach my $key ( keys %$item ) {
      $item->{$key} = $item->{$key}->{content} || '';
    }
    push @items, $item;
  }
  $return->{resultElements} = \@items;

  my @categories;
  foreach my $key ( keys %{$return->{directoryCategories}->{item}} ) {
    $return->{directoryCategories}->{$key} =
      $return->{directoryCategories}->{item}->{$key}->{content} || '';
  }

  # Return nice, clean, usable results
  return $return;
}

1;

# This is the SOAP message template sent to api.google.com. Variables
# signified with $variablename are replaced by the values of their
# counterparts sent to the doGoogleSearch subroutine.

__DATA__
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <key xsi:type="xsd:string">$key</key>
      <q xsi:type="xsd:string">$q</q>
      <start xsi:type="xsd:int">$start</start>
      <maxResults xsi:type="xsd:int">$maxResults</maxResults>
      <filter xsi:type="xsd:boolean">$filter</filter>
      <restrict xsi:type="xsd:string">$restrict</restrict>
      <safeSearch xsi:type="xsd:boolean">$safeSearch</safeSearch>
      <lr xsi:type="xsd:string">$lr</lr>
      <ie xsi:type="xsd:string">$ie</ie>
      <oe xsi:type="xsd:string">$oe</oe>
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
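The two regular-expression steps in PoXML.pm, filling in the $variable placeholders and stripping namespace prefixes from element names, can be tried on their own without touching the network. What follows is a minimal sketch, not part of the module: fill_template, strip_prefixes, and xml_escape are hypothetical names, and the escaping step is an addition made on the assumption that a query may contain characters such as & or <, which PoXML as listed would paste into the envelope verbatim.

```perl
use strict;
use warnings;

# Hypothetical helper (not in PoXML.pm): escape characters that would
# otherwise break the XML envelope. Undefined values become ''.
sub xml_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Replace each $name placeholder with the matching argument, the same
# substitution PoXML performs, plus the escaping assumed above.
sub fill_template {
    my ($template, %args) = @_;
    $template =~ s/\$(\w+)/xml_escape($args{$1})/ge;
    return $template;
}

# Drop namespace prefixes from element names only; attributes such as
# xsi:type are left alone because the pattern must start at < or </.
sub strip_prefixes {
    my ($xml) = @_;
    $xml =~ s!(</?)[\w-]+?:([\w-]+?)!$1$2!g;
    return $xml;
}

# Single quotes keep $q and $start literal until fill_template runs.
my $template = '<q xsi:type="xsd:string">$q</q>'
             . '<start xsi:type="xsd:int">$start</start>';
print fill_template($template, q => 'cats & dogs', start => 0), "\n";

my $response = '<SOAP-ENV:Body><ns1:hit xsi:type="xsd:string">x</ns1:hit>'
             . '</SOAP-ENV:Body>';
print strip_prefixes($response), "\n";
```

Because the prefixes are dropped before parsing, the lookup keys used later in the module (Body, doGoogleSearchResponse, and so on) stay the same even if Google changes SOAP-ENV or ns1 to something else.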
Here’s a little script to show PoXML in action. It’s no different, really, from any number of hacks in this book. The only alterations needed to use PoXML instead of SOAP::Lite are the two SOAP::Lite lines, left commented out here, and the PoXML lines that replace them.
#!/usr/bin/perl
# poxml_google2csv.pl
# Google Web Search Results via PoXML ("plain old xml") module
# exported to CSV suitable for import into Excel
# Usage: poxml_google2csv.pl "{query}" [> results.csv]

# Your Google API developer's key
my $google_key = 'insert key here';

use strict;

# use SOAP::Lite;
use PoXML;

$ARGV[0]
  or die qq{usage: perl poxml_google2csv.pl "{query}"\n};

# my $google_search = SOAP::Lite->service("file:$google_wdsl");
my $google_search = new PoXML;

my $results = $google_search ->
  doGoogleSearch(
    $google_key, shift @ARGV, 0, 10, "false",
    "", "false", "", "latin1", "latin1"
  );

@{$results->{'resultElements'}} or die('No results');

print qq{"title","url","snippet"\n};

foreach (@{$results->{'resultElements'}}) {
  $_->{title} =~ s!"!""!g; # double escape " marks
  $_->{snippet} =~ s!"!""!g;
  my $output = qq{"$_->{title}","$_->{URL}","$_->{snippet}"\n};
  $output =~ s!<.+?>!!g; # drop all HTML tags
  print $output;
}
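The CSV handling in the script boils down to two substitutions: double any embedded quote marks and drop HTML tags such as the <b> markup Google puts around matched terms. Here is a minimal sketch of the same idea pulled out into reusable routines; csv_field and csv_row are hypothetical names, and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one value into a quoted CSV field.
sub csv_field {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/<.+?>//g;   # drop HTML tags
    $text =~ s/"/""/g;     # double embedded quote marks
    return qq{"$text"};
}

# Hypothetical helper: join any number of values into one CSV line.
sub csv_row {
    return join(',', map { csv_field($_) } @_) . "\n";
}

print csv_row('title', 'url', 'snippet');
print csv_row('Plain <b>Old</b> XML', 'http://example.com/', 'say "hi"');
```

Stripping tags field by field, rather than from the assembled line, is a design choice of the sketch; the script above does it in one pass over the whole row.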
Run the script from the command line, passing the query as an argument and redirecting the output to a CSV file: use > to create (or overwrite) the file, or >> to append results to an existing one. For example, using "plain old xml" as our query and results.csv as our output:
$ perl poxml_google2csv.pl "plain old xml" > results.csv
Leaving off the > and the CSV filename sends the results to the screen for your perusal:
$ perl poxml_google2csv.pl "plain old xml"
"title","url","snippet"
"XML.com: Distributed XML [Sep. 06, 2000]",
"http://www.xml.com/pub/2000/09/06/distributed.html",
" ... extensible. Unlike plain old XML, there's no sense of
constraining what the document can describe by a DTD or schema.
This means ... "
...
"Plain Old Documentation",
"http://axkit.org/wiki/view/AxKit/PlainOldDocumentation",
" ... perlpodspec - Plain Old Documentation: format specification
and notes. ... Examples: =pod This is a plain Pod paragraph. ...
encodings in Pod parsing would be as in XML ... "
In the same manner, you can adapt just about any SOAP::Lite-based hack in this book, and those you’ve made up yourself, to use PoXML:
1. Place PoXML.pm in the same directory as the hack at hand.
2. Replace use SOAP::Lite; with use PoXML;.
3. Replace my $google_search = SOAP::Lite->service("file:$google_wdsl"); with my $google_search = new PoXML;.
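If a hack has to run on machines both with and without SOAP::Lite, the choice can be made at runtime instead of by editing the script. The following is only a sketch under stated assumptions: first_available is a hypothetical helper, and the FindBin and lib lines (both core modules) are there because Perl 5.26 and later no longer search the current directory for modules, so a PoXML.pm sitting next to the script would otherwise not be found.

```perl
use strict;
use warnings;
use FindBin qw($Bin);
use lib $Bin;   # look for PoXML.pm in the script's own directory

# Hypothetical helper: return the name of the first module in the
# list that loads successfully, or nothing if none of them do.
sub first_available {
    for my $module (@_) {
        return $module if eval "require $module; 1";
    }
    return;
}

my $backend = first_available('SOAP::Lite', 'PoXML');
print defined $backend
    ? "Using $backend\n"
    : "Neither SOAP::Lite nor PoXML could be loaded\n";
```

From there, the search object is built as in the steps above: SOAP::Lite->service("file:$google_wdsl") when $backend is SOAP::Lite, and new PoXML otherwise.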
There are, however, some limitations. While PoXML works nicely for extracting results and aggregate values such as <estimatedTotalResultsCount />, it falls down on some of the more advanced result elements, such as <directoryCategories />, an array of categories turned up by the query.
In general, bear in mind that your mileage may vary, and don’t be afraid to tweak.
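One tweak in that spirit concerns <directoryCategories />. The sketch below assumes the weak spot is XML::Simple's default behavior of returning a hash reference for a lone <item> and an array reference for several, which a loop written for one shape will trip over when handed the other. The helpers items_of, text_of, and categories_of are hypothetical, and the sample data is hand-built to resemble parsed output rather than taken from a real response.

```perl
use strict;
use warnings;

# Hypothetical helper: list the <item> entries under a parsed node,
# whether the parser produced one hash, an array of hashes, or nothing.
sub items_of {
    my ($node) = @_;
    return () unless ref($node) eq 'HASH' && defined $node->{item};
    my $item = $node->{item};
    return ref($item) eq 'ARRAY' ? @$item : ($item);
}

# Hypothetical helper: reduce { 'xsi:type' => ..., content => 'text' }
# to its text. Plain strings pass through; anything else becomes ''.
sub text_of {
    my ($value) = @_;
    return '' unless defined $value;
    return $value unless ref $value;
    return '' unless ref($value) eq 'HASH' && defined $value->{content};
    return $value->{content};
}

# Flatten every category into a hash of plain strings.
sub categories_of {
    my ($return) = @_;
    my @categories;
    for my $item (items_of($return->{directoryCategories})) {
        next unless ref($item) eq 'HASH';
        my %flat;
        $flat{$_} = text_of($item->{$_}) for keys %$item;
        push @categories, \%flat;
    }
    return @categories;
}

# Made-up sample shaped like XML::Simple output for two categories.
my $sample = {
    directoryCategories => {
        item => [
            { fullViewableName => { content => 'Top/Computers/XML' },
              specialEncoding  => { 'xsi:type' => 'xsd:string' } },
            { fullViewableName => { content => 'Top/Computers/Perl' },
              specialEncoding  => { 'xsi:type' => 'xsd:string' } },
        ],
    },
};
print "$_->{fullViewableName}\n" for categories_of($sample);
```

The same items_of routine could stand in for the resultElements loop, which would otherwise expect an array even when a query returns a single result. XML::Simple's ForceArray option is another route to the same end.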
See also:
NoXML [Hack #54], a regular expression-based, XML Parser-free SOAP::Lite alternative
XooMLE [Hack #36], a third-party service offering an intermediary plain old XML interface to the Google Web API