Of all the tasks of Hercules, the one where he had to keep his web site’s XHTML validated was the hardest. Without wanting to approach the whole Valid XHTML Controversy, we can still safely say that keeping a site validated is a pain. You have to validate your code, most commonly using the W3C validator service at http://validator.w3.org, and you have to keep going back there to make sure nothing has broken.
You have to do that unless, of course, you’re subscribed to a feed of validation results. This script does just that, providing an RSS interface to the W3C validator.
You pass the URL you want to test as a query in the feed URL, like so:
http://www.example.org/validator.cgi?url=http://www.example.org/index.html
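Because the page address travels inside another URL's query string, it is safer to percent-encode it first; otherwise an & or # in the target address would be read as part of the feed URL itself. What follows is a minimal sketch, not part of the script: percent_encode and feed_url_for are hypothetical helpers, and the validator.cgi location is simply the example address used above.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except the RFC 3986
# "unreserved" characters. Assumes the input is a plain ASCII/byte string.
sub percent_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: build the feed URL for a page. The script location
# is an assumption taken from the example address.
sub feed_url_for {
    my ($page_url) = @_;
    return 'http://www.example.org/validator.cgi?url=' . percent_encode($page_url);
}

print feed_url_for('http://www.example.org/index.html?lang=en&page=2'), "\n";
```

CGI's param method decodes the value again on the way in, so the script itself needs no change to accept an encoded address.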
We’re using the traditional Perl start plus LWP::Simple, which fetches the results from the validator, and XML::Simple, which parses them. Note that, in the classic gotcha, LWP::Simple and CGI clash: both export a function named head by default. To avoid the name collision, we import only get from LWP::Simple.
use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';
use XML::Simple;
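The effect of the explicit import list can be seen without installing anything. The sketch below uses two hypothetical stand-in packages, My::Page and My::Fetch, defined inline; they are not CGI or LWP::Simple, they only imitate the situation where two modules both export head by default.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a page-building module that exports head.
package My::Page;
use Exporter 'import';
BEGIN { our @EXPORT = qw(header head) }
sub header { return "Content-Type: $_[0]\n\n" }
sub head   { return "<head>$_[0]</head>" }

# Hypothetical stand-in for a fetching module that also exports head.
package My::Fetch;
use Exporter 'import';
BEGIN { our @EXPORT = qw(get head) }
sub get  { return "body of $_[0]" }
sub head { return "headers of $_[0]" }

package main;

# Plays the role of "use CGI qw(:standard);" by taking the default exports.
BEGIN { My::Page->import }

# Plays the role of "use LWP::Simple 'get';" by asking for get alone, so
# the second head is never imported and cannot collide with the first.
BEGIN { My::Fetch->import('get') }

print head('Validation results'), "\n";
print get('http://www.example.org/'), "\n";
```

Here head resolves to the page-building version, and the fetching module contributes only get. Had both packages been imported with their defaults, the second import would have replaced the first head.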
Now, grab the URL from the query string, and use LWP::Simple to retrieve the results. The W3C provides an XML output mode for the validator, and this is what we’re using here. It is, however, classed as beta and flaky, and might not always work.
my $cgi = CGI->new;
my $url = $cgi->param('url');
my $validator_results_in_xml = get("http://validator.w3.org/check?uri=$url;output=xml");
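Since the XML mode might not always work, it helps to know that LWP::Simple's get returns undef when the request fails, so the result can be checked before it is parsed. The sketch below shows one way to do that; fetch_validator_xml is a hypothetical wrapper, and the fetcher is passed in as a code reference so the example runs offline. In the real script that argument would be \&LWP::Simple::get.

```perl
use strict;
use warnings;

# Hypothetical wrapper: fetch the validator's XML report for a page, or
# return undef (with a warning) when nothing usable comes back.
# $fetcher is any code ref that takes a URL and returns the body or undef.
sub fetch_validator_xml {
    my ( $page_url, $fetcher ) = @_;
    my $report = $fetcher->("http://validator.w3.org/check?uri=$page_url;output=xml");
    unless ( defined $report && length $report ) {
        warn "No response from the validator for $page_url\n";
        return;
    }
    return $report;
}

# Offline stand-ins for LWP::Simple's get, for demonstration only.
my $working = sub { return '<result/>' };
my $failing = sub { return undef };

for my $fetcher ( $working, $failing ) {
    my $report = fetch_validator_xml( 'http://www.example.org/', $fetcher );
    print defined $report ? "got a report\n" : "no report\n";
}
```

A feed script could use the undef case to emit a single "validator unavailable" item rather than dying halfway through.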
Curiously enough, the top of the XML that is returned causes XML::Simple to throw an error. Use the split function to trim off this broken section:
my ( $broken_xml_to_ignore, $trimmed_validator_results_in_xml ) = split ( /]>/, $validator_results_in_xml );
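One detail of split is worth keeping in mind here. Without a limit, the second piece ends at the next ]> in the document, so a report whose body also contains that sequence (the end of a CDATA section, for instance) would be cut short. Passing a limit of 2 keeps everything after the first match. The report in this sketch is made up purely to show the difference:

```perl
use strict;
use warnings;

# Made-up report: a DOCTYPE with an internal subset, then a body that
# itself contains "]>" (the end of a CDATA section).
my $report = join '',
    '<!DOCTYPE result [ <!ELEMENT result ANY> ]>',
    '<result><msg><![CDATA[stray ]]></msg></result>';

# Without a limit, the second piece stops at the next "]>".
my ( undef, $cut_short ) = split /]>/, $report;

# With a limit of 2, the second piece is everything after the first "]>".
my ( undef, $complete ) = split /]>/, $report, 2;

print "no limit: $cut_short\n";
print "limit 2:  $complete\n";
```

Whether the validator's output ever contains a second ]> is not something the text above establishes, so treat the limit as a cheap precaution rather than a required fix.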
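Now, hand the trimmed XML to XML::Simple's XMLin function, which parses it into a nested Perl data structure. The ForceArray option keeps msg an array reference even when the validator reports only one message:
my $parsed_validator_results = XMLin( $trimmed_validator_results_in_xml, ForceArray => ['msg'] );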
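XMLin shapes its result by what it finds: a single child element becomes a hash reference, while repeated ones become an array reference. XML::Simple's ForceArray option pins a named element to an array reference either way. The sketch below shows the difference on a made-up report that only mimics the messages/msg layout the script expects; the real validator output may carry more fields.

```perl
use strict;
use warnings;
use XML::Simple;

# Made-up report with a single message, mimicking the messages/msg layout.
my $one_error =
    '<result><messages><msg line="12">Missing alt attribute</msg></messages></result>';

my $default = XMLin($one_error);
my $forced  = XMLin( $one_error, ForceArray => ['msg'] );

print "default: ", ref( $default->{messages}{msg} ), "\n";
print "forced:  ", ref( $forced->{messages}{msg} ),  "\n";

# With ForceArray the same loop works for one message or many.
for my $error ( @{ $forced->{messages}{msg} } ) {
    print "Line $error->{line}: $error->{content}\n";
}
```

Without the option, dereferencing a lone msg as an array under use strict stops the script with a "Not an ARRAY reference" error.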
Now is as good a time as any to set up the top of the feed:
my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "XHTML Validation results for $url",
    link        => "http://validator.w3.org/check?uri=$url",
    description => "w3c validation results for $url"
);
Then it’s a simple matter of running through each error message the validator gives and turning it into a feed item:
foreach my $error ( @{ $parsed_validator_results->{'messages'}->{'msg'} } ) {
    $rss->add_item(
        title       => "Line $error->{'line'} $error->{'content'}",
        link        => "http://validator.w3.org/check?uri=$url",
        description => "Line $error->{'line'} $error->{'content'}",
    );
}
Finally, serve it up:
print header('application/rss+xml');
print $rss->as_string;
Here is the complete listing:
#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';    # import get only; head would clash with CGI's
use XML::Simple;

# Fetch the validator's XML report for the URL named in the query string.
my $cgi = CGI->new;
my $url = $cgi->param('url');
my $validator_results_in_xml = get("http://validator.w3.org/check?uri=$url;output=xml");

# Trim off the top of the report, which XML::Simple cannot parse.
my ( $broken_xml_to_ignore, $trimmed_validator_results_in_xml ) = split ( /]>/, $validator_results_in_xml );

# ForceArray keeps msg an array even when there is only one message.
my $parsed_validator_results = XMLin( $trimmed_validator_results_in_xml, ForceArray => ['msg'] );

# Set up the top of the feed.
my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "XHTML Validation results for $url",
    link        => "http://validator.w3.org/check?uri=$url",
    description => "w3c validation results for $url"
);

# Turn each validator message into a feed item.
foreach my $error ( @{ $parsed_validator_results->{'messages'}->{'msg'} } ) {
    $rss->add_item(
        title       => "Line $error->{'line'} $error->{'content'}",
        link        => "http://validator.w3.org/check?uri=$url",
        description => "Line $error->{'line'} $error->{'content'}",
    );
}

# Send the HTTP header first, then the feed itself.
print header('application/rss+xml');
print $rss->as_string;