By now, the traditional use of feeds as a form of content syndication is beginning to look somewhat old-fashioned. But fear not: here’s a use that is as traditional as can be.
I needed a script to check all of the weblogs I write for and to cross-post everything I write onto my own weblog. Bear in mind that I’m not the only author on these other sites.
To do this, the script checks their RSS feeds on a set schedule, grabs the content I wrote, builds one big entry from it, and then posts it. This code is for a Movable Type installation, but it isn't hard to modify it to fit another weblogging platform.
You open the proceedings by defining all of the libraries and modules. This is the exact same code as I have running on my own server, so you need to modify the following paths to point to your own Movable Type libraries, blog IDs, and so on:
    use lib "/web/script/ben/mediacooperative.com/lib";
    use lib "/web/script/ben/mediacooperative.com/extlib";
    use lib "/web/script/ben/lib/perl";
    use MT;
    use MT::Entry;
    use Date::Manip;
    use LWP::Simple 'get';
    use XML::RSS;

    my $MTauthor = "1";
    my $MTblogID = "3";
    my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
    my $guts     = "";
Now, let's set up a list of sites to check. For each site, you need to define only the feed URL and the <dc:creator> or <author> name under which I am posting. Everything else you can get from the feed itself. For example:
    Feed URL                                                          Author name
    http://del.icio.us/rss/bhammersley                                bhammersley
    http://www.oreillynet.com/feeds/author/?x-au=909                  Ben Hammersley
    http://monkeyfilter.com/rss.php                                   DangerIsMyMiddleName
    http://www.benhammersley.com/expeditions/northpole2006/index.rdf  Ben Hammersley
You can do this with an array of arrays:
    my @sites_to_check = (
        [ "http://del.icio.us/rss/bhammersley", "bhammersley" ],
        [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
        [ "http://monkeyfilter.com/rss.php", "DangerIsMyMiddleName" ],
        [ "http://www.benhammersley.com/expeditions/northpole2006/index.rdf", "Ben Hammersley" ],
    );
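As a minimal sketch of how such a structure is read back, the following standalone snippet walks a list of the same shape and unpacks each pair in one step. The example.org URLs and the describe_sites helper are placeholders invented for illustration; they are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical list in the same shape as @sites_to_check:
# each element is a reference to a [ feed URL, author name ] pair.
my @example_sites = (
    [ "http://example.org/feed-one.rss", "bhammersley" ],
    [ "http://example.org/feed-two.rdf", "Ben Hammersley" ],
);

# Return one descriptive string per site (helper name is illustrative).
sub describe_sites {
    my @sites = @_;
    my @lines;
    for my $site (@sites) {
        # Dereference the inner array and unpack both fields at once.
        my ( $url, $nym ) = @$site;
        push @lines, "$nym posts at $url";
    }
    return @lines;
}

print "$_\n" for describe_sites(@example_sites);
```

Unpacking with a list assignment keeps the URL and the name together, which matters once more fields are added to each pair.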
Now, the loop. You go through each feed, downloading, parsing, and so on. Let's start by taking the site_feed_url and the site_author_nym (the name under which I go on that site) out of the array. This step could be omitted, but for the sake of clarity, we'll leave it here.
    for my $site_being_checked (@sites_to_check) {
        # Each element is an array reference: [ feed URL, author name ]
        my $site_feed_url   = $site_being_checked->[0];
        my $site_author_nym = $site_being_checked->[1];
Now, retrieve the feed, or go to the next one if it fails:
        my $feed_xml = get($site_feed_url) or next;
And now, to parse it. You do so by spawning a new instance of the XML::RSS parser and jamming the feed into it:
        my $rss_parser = XML::RSS->new();
        $rss_parser->parse($feed_xml);
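One bad feed should not stop the whole run. The sketch below shows one way to skip a feed that either fails to download or fails to parse, assuming the parser dies on malformed input, as XML::Parser-based modules generally do. To keep it runnable offline, the fetcher and parser are passed in as code references; collect_feeds, %fake_web, and the example.org URLs are all hypothetical stand-ins rather than anything in the script.

```perl
use strict;
use warnings;

# Run $fetch and $parse (code refs supplied by the caller) over each site,
# skipping any feed that cannot be fetched or parsed.
sub collect_feeds {
    my ( $sites, $fetch, $parse ) = @_;
    my @parsed;
    for my $site (@$sites) {
        my ( $url, $nym ) = @$site;
        my $xml = $fetch->($url) or next;    # download failed: skip
        my $feed = eval { $parse->($xml) };  # trap a dying parser
        if ( !defined $feed ) {
            warn "Skipping $url: $@";
            next;
        }
        push @parsed, [ $url, $nym, $feed ];
    }
    return @parsed;
}

# Stand-ins for get() and the RSS parser, so the sketch runs offline.
my %fake_web = (
    "http://example.org/good.rss"   => "<rss>ok</rss>",
    "http://example.org/broken.rss" => "<rss>",
);
my $fetch = sub { return $fake_web{ $_[0] } };
my $parse = sub {
    my ($xml) = @_;
    die "malformed feed\n" unless $xml =~ m{</rss>};
    return { source => $xml };
};

my @sites = (
    [ "http://example.org/good.rss",    "Ben Hammersley" ],
    [ "http://example.org/missing.rss", "Ben Hammersley" ],
    [ "http://example.org/broken.rss",  "Ben Hammersley" ],
);
my @ok = collect_feeds( \@sites, $fetch, $parse );
print scalar(@ok), " of ", scalar(@sites), " feeds usable\n";
```

In the real script, the same eval wrapper could go around the parse call, with next taken when it fails.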
To set up for the strange occasion where there might be new content to post, let's query the newly created RSS parser object for the feed's name and link:
        my $feed_name = $rss_parser->{channel}->{title};
        my $feed_link = $rss_parser->{channel}->{link};
Now, go through each of the items within the feed, and grab all needed data out of them: the link, title, description, author, and date. Note that you have to include fallbacks, such as guid and the various dc values, to deal with different versions of RSS.
        foreach my $item ( @{ $rss_parser->{items} } ) {
            my $item_link        = $$item{link} || $$item{guid};
            my $item_title       = $$item{title};
            my $item_description = $$item{description};
            my $item_author      = $$item{author} || $$item{dc}->{creator};
            my $item_date        = $$item{pubDate} || $$item{dc}->{date};
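The || fallbacks can be tried in isolation. This is a small sketch using hand-built hashes that mimic the item layout the loop reads (a top-level link, guid, author, and pubDate, plus a nested dc hash); the item_fields helper and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Pick the first usable value for each field (helper name is illustrative).
sub item_fields {
    my ($item) = @_;
    my $dc = $item->{dc} || {};    # avoid creating a dc hash as a side effect
    my %fields = (
        link   => $item->{link}    || $item->{guid},
        author => $item->{author}  || $dc->{creator},
        date   => $item->{pubDate} || $dc->{date},
    );
    return \%fields;
}

# One item with the native elements, one with only guid and dc values.
my $item_a = {
    link    => "http://example.org/a",
    author  => "Ben Hammersley",
    pubDate => "Mon, 02 Jan 2006 10:00:00 GMT",
};
my $item_b = {
    guid => "http://example.org/b",
    dc   => { creator => "bhammersley", date => "2006-01-02T10:00:00Z" },
};

for my $item ( $item_a, $item_b ) {
    my $f = item_fields($item);
    print join( " | ", @{$f}{qw(link author date)} ), "\n";
}
```

Because || tests truth rather than definedness, an empty string also triggers the fallback, which is usually what you want for feed fields.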
Now, check to see if any were written today by me. First, work out what time and date it is now. Then, compare the post’s date with the date now, and, if it’s less than 24 hours behind, and it was written by me, then all is good.
Note: to get this code to work with del.icio.us and any other sites that use date strings ending in Z instead of +00:00 (which Date::Manip can't deal with), you have to use a nasty substitution. Sorry about that.
            # The current time, formatted like the feed dates
            my $todays_date = UnixDate( "now", "%Y-%m-%dT%H:%M:%S+00:00" );
            # Rewrite a trailing Z as an explicit +00:00 offset
            $item_date =~ s/Z/+00:00/;
            my $err;
            my $date_delta = DateCalc( $item_date, $todays_date, \$err, 1 );
            # Express the difference in days
            my $parsed_delta = Delta_Format( $date_delta, 'exact', '%dh' );
            if ( ( $parsed_delta < 1 ) and ( $item_author eq $site_author_nym ) ) {
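For comparison, here is a hedged sketch of the same 24-hour test using only the core Time::Local module instead of Date::Manip. It is not a drop-in replacement: it understands only W3C-style stamps such as 2006-01-02T10:00:00Z (with Z or a numeric offset), not the RFC 822 dates found in pubDate, and unlike the test above it rejects dates in the future. The helper names w3c_to_epoch and is_recent are invented for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert a W3C date-time such as 2006-01-02T10:00:00Z or
# 2006-01-02T12:00:00+02:00 to epoch seconds; undef if it does not parse.
sub w3c_to_epoch {
    my ($stamp) = @_;
    return unless defined $stamp;
    my ( $y, $mo, $d, $h, $mi, $s, $zone ) = $stamp =~
        /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)(Z|[+-]\d\d:\d\d)$/
        or return;
    # timegm dies on out-of-range fields, so trap that and report undef.
    my $epoch = eval { timegm( $s, $mi, $h, $d, $mo - 1, $y ) };
    return unless defined $epoch;
    if ( $zone ne 'Z' ) {
        my ( $sign, $zh, $zm ) = $zone =~ /^([+-])(\d\d):(\d\d)$/;
        my $offset = ( $zh * 60 + $zm ) * 60;
        # A zone ahead of UTC means the UTC instant is earlier.
        $epoch += $sign eq '+' ? -$offset : $offset;
    }
    return $epoch;
}

# True if the stamp lies within the 24 hours before $now (epoch seconds).
sub is_recent {
    my ( $stamp, $now ) = @_;
    my $then = w3c_to_epoch($stamp);
    return 0 unless defined $then;
    my $age = $now - $then;
    return ( $age >= 0 && $age < 24 * 60 * 60 ) ? 1 : 0;
}

my $now = w3c_to_epoch("2006-01-03T09:00:00Z");
print is_recent( "2006-01-02T10:00:00Z", $now ) ? "recent\n" : "old\n";
print is_recent( "2006-01-01T10:00:00Z", $now ) ? "recent\n" : "old\n";
```

Passing the current time in as an argument, rather than calling time() inside the helper, makes the check easy to test against fixed dates.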
If all the tests turn out to be true, add a bunch of HTML to the $guts variable, within which you're building the new entry:
                # Append one block of HTML per matching item
                $guts .= qq|<div id="CrossPoster"><blockquote><p><a href="$item_link">$item_title</a><br/>posted to <a href="$feed_link">$feed_name</a></p><p>$item_description</p></blockquote></div>|;
            }
        }
    }
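Feed titles and links can contain characters such as & or quotation marks, which would break the markup if interpolated as-is. The following sketch shows one way to escape them before building the fragment. It assumes the description already holds HTML and passes it through untouched, which will not be true of every feed; escape_html and format_item are hypothetical helpers, not part of the script.

```perl
use strict;
use warnings;

# Escape the characters that are unsafe in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the fragment for one cross-posted item (helper name is illustrative).
# Assumes $item{description} already holds HTML and leaves it untouched.
sub format_item {
    my (%item) = @_;
    my ( $link, $title, $feed_link, $feed_name ) =
        map { escape_html( $item{$_} ) } qw(link title feed_link feed_name);
    my $description = defined $item{description} ? $item{description} : '';
    return qq|<div id="CrossPoster"><blockquote>|
        . qq|<p><a href="$link">$title</a><br/>|
        . qq|posted to <a href="$feed_link">$feed_name</a></p>|
        . qq|<p>$description</p></blockquote></div>|;
}

print format_item(
    link        => "http://example.org/post?id=1&lang=en",
    title       => "Feeds & <syndication>",
    feed_link   => "http://example.org/",
    feed_name   => "Example Weblog",
    description => "A <em>short</em> summary.",
), "\n";
```

Whether the title needs escaping depends on how the feed encodes it, so it is worth checking the output for a few real entries before relying on this.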
Now, having worked our way through the feeds, if $guts has anything in it, you need to post it as a new entry, then rebuild the site and ping the aggregators:
    if ( $guts ne "" ) {
        my $mt = MT->new( Config => $MTconfig ) or die MT->errstr;
        my $entry = MT::Entry->new;
        $entry->blog_id($MTblogID);
        $entry->status( MT::Entry::RELEASE() );
        $entry->author_id($MTauthor);
        $entry->title("Posted elsewhere today");
        $entry->text($guts);
        $entry->convert_breaks(0);
        $entry->save or die $entry->errstr;

        # rebuild the site
        $mt->rebuild( BlogID => $MTblogID )
            or die "Rebuild error: " . $mt->errstr;

        # ping aggregators
        $mt->ping($MTblogID);
    }
Here is the complete script:

    #!/usr/bin/perl

    use lib "/web/script/ben/mediacooperative.com/lib";
    use lib "/web/script/ben/mediacooperative.com/extlib";
    use lib "/web/script/ben/lib/perl";
    use MT;
    use MT::Entry;
    use Date::Manip;
    use LWP::Simple 'get';
    use XML::RSS;

    my $MTauthor = "1";
    my $MTblogID = "3";
    my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
    my $guts     = "";

    my @sites_to_check = (
        [ "http://del.icio.us/rss/bhammersley", "bhammersley" ],
        [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
        [ "http://monkeyfilter.com/rss.php", "DangerIsMyMiddleName" ],
        [ "http://www.benhammersley.com/expeditions/northpole2006/index.rdf", "Ben Hammersley" ],
    );

    for my $site_being_checked (@sites_to_check) {
        my $site_feed_url   = $site_being_checked->[0];
        my $site_author_nym = $site_being_checked->[1];

        # Fetch the feed, or move on to the next site if that fails
        my $feed_xml = get($site_feed_url) or next;

        my $rss_parser = XML::RSS->new();
        $rss_parser->parse($feed_xml);

        my $feed_name = $rss_parser->{channel}->{title};
        my $feed_link = $rss_parser->{channel}->{link};

        foreach my $item ( @{ $rss_parser->{items} } ) {
            my $item_link        = $$item{link} || $$item{guid};
            my $item_title       = $$item{title};
            my $item_description = $$item{description};
            my $item_author      = $$item{author} || $$item{dc}->{creator};
            my $item_date        = $$item{pubDate} || $$item{dc}->{date};

            my $todays_date = UnixDate( "now", "%Y-%m-%dT%H:%M:%S+00:00" );
            $item_date =~ s/Z/+00:00/;
            my $err;
            my $date_delta = DateCalc( $item_date, $todays_date, \$err, 1 );
            my $parsed_delta = Delta_Format( $date_delta, 'exact', '%dh' );

            if ( ( $parsed_delta < 1 ) and ( $item_author eq $site_author_nym ) ) {
                $guts .= qq|<div id="CrossPoster"><blockquote><p><a href="$item_link">$item_title</a><br/>posted to <a href="$feed_link">$feed_name</a></p><p>$item_description</p></blockquote></div>|;
            }
        }
    }

    if ( $guts ne "" ) {
        my $mt = MT->new( Config => $MTconfig ) or die MT->errstr;
        my $entry = MT::Entry->new;
        $entry->blog_id($MTblogID);
        $entry->status( MT::Entry::RELEASE() );
        $entry->author_id($MTauthor);
        $entry->title("Posted elsewhere today");
        $entry->text($guts);
        $entry->convert_breaks(0);
        $entry->save or die $entry->errstr;

        # rebuild the site
        $mt->rebuild( BlogID => $MTblogID )
            or die "Rebuild error: " . $mt->errstr;

        # ping aggregators
        $mt->ping($MTblogID);
    }