RSS and Atom feeds are standardized ways to distribute headlines and updates from websites and blogs. Both are XML documents. RSS is older and still widely used, while Atom is newer and has several advantages over RSS, chiefly its support for XML namespaces.
For the main differences between Atom and RSS, see the Wikipedia entry on RSS: http://en.wikipedia.org/wiki/RSS. Both formats are widely supported, and blogs and sites often publish their headline feeds in RSS and Atom at the same time.
In this recipe, we are going to cover the basics of RSS and Atom feed parsing with Groovy.
As RSS and Atom feeds are XML based, it's easy to parse them using one of the several tools offered by Groovy (see the Reading XML using XmlParser recipe in Chapter 5, Working with XML in Groovy).
We will show how to detect if a feed is RSS or Atom and parse it accordingly.
We will create a FeedParser class that contains the code to open the URL of a feed (RSS or Atom), parse it, and return a list of FeedEntry objects populated with the content of the feed entries:
class FeedParser {

    def readFeed(url) {
        // Non-validating, namespace-aware parser
        def xmlFeed = new XmlParser(false, true).parse(url)
        def feedList = []
        if (isAtom(xmlFeed)) {
            xmlFeed.entry.each { entry ->
                feedList << new AtomFeedEntry(
                    entry.title.text(),
                    entry.author.text(),
                    // Atom keeps the URL in the href attribute of the link element
                    entry.link[0]?.attribute('href'),
                    entry.published.text()
                )
            }
        } else {
            xmlFeed.channel.item.each { item ->
                feedList << new RSSFeedEntry(
                    item.title.text(),
                    item.link.text(),
                    item.description.text(),
                    item.pubDate.text()
                )
            }
        }
        feedList
    }

    // With a namespace-aware parser, a namespaced root element name is a QName
    def isAtom(Node node) {
        def rootElementName = node.name()
        if (rootElementName instanceof groovy.xml.QName) {
            return (rootElementName.localPart == 'feed') &&
                   (rootElementName.namespaceURI == 'http://www.w3.org/2005/Atom')
        }
        false
    }
}

abstract class FeedEntry {
}

@groovy.transform.Canonical
class RSSFeedEntry extends FeedEntry {
    String title
    String link
    String desc
    String pubDate
}

@groovy.transform.Canonical
class AtomFeedEntry extends FeedEntry {
    String title
    String author
    String link
    String pubDate
}
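The positional constructor calls in readFeed rely on @groovy.transform.Canonical, which generates a tuple constructor whose parameters follow the order in which the properties are declared, together with equals, hashCode, and toString. The following is a minimal sketch of that behaviour, using a stand-alone copy of RSSFeedEntry (without the FeedEntry base class) and made-up sample values:

```groovy
import groovy.transform.Canonical

// Stand-alone copy of the entry class, for illustration only
@Canonical
class RSSFeedEntry {
    String title
    String link
    String desc
    String pubDate
}

// Positional arguments are assigned to properties in declaration order
def a = new RSSFeedEntry(
    'Sample post',                       // hypothetical values
    'http://example.org/1',
    'A made-up entry',
    'Mon, 01 Jan 2024 00:00:00 GMT'
)

// Groovy's named-argument construction remains available as well
def b = new RSSFeedEntry(
    title: 'Sample post',
    link: 'http://example.org/1',
    desc: 'A made-up entry',
    pubDate: 'Mon, 01 Jan 2024 00:00:00 GMT'
)

println a        // the generated toString includes the property values
println a == b   // the generated equals compares property by property
```

Because the order of the constructor arguments is tied to the order of the property declarations, reordering the properties in the class would silently change the meaning of the calls in readFeed; named arguments avoid that coupling.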
The FeedParser class can be called from within the same script file. For example, the following code prints the titles of all the posts from the blog Lambda the Ultimate (http://lambda-the-ultimate.org/), followed by the number of entries:

def parser = new FeedParser()
def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
def feed = parser.readFeed(feedUrl)
feed.each { println "${it.title}" }
println "Feed size: ${feed.size()}"

The output will look similar to the following:

Dynamic Region Inference
Dependently-Typed Metaprogramming (in Agda)
...
It's Alive! Continuous Feedback in UI Programming
Feed size: 15
The FeedParser class opens the XML stream (an HTTP URI) and, after detecting whether the feed is Atom-based or RSS-based, processes each entry by creating an instance of the matching FeedEntry subclass (AtomFeedEntry or RSSFeedEntry), which holds the headline data.
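The detection step can be tried without network access. The sketch below is illustrative only: the two inline feeds are made-up sample data, and the closure names parse and isAtom are introduced just for this example. It assumes the auto-imported XmlParser used in this recipe; on Groovy 4 and later the class lives in the groovy.xml package and needs an explicit import.

```groovy
// Made-up inline feeds so the sketch runs without network access
def atomXml = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Atom feed</title>
  <entry>
    <title>First Atom post</title>
    <link href="http://example.org/atom/1"/>
  </entry>
</feed>'''

def rssXml = '''<rss version="2.0">
  <channel>
    <title>Sample RSS feed</title>
    <item>
      <title>First RSS post</title>
      <link>http://example.org/rss/1</link>
    </item>
  </channel>
</rss>'''

// Non-validating, namespace-aware parsing, as in FeedParser.readFeed
// (on Groovy 4+, add: import groovy.xml.XmlParser)
def parse = { String xml -> new XmlParser(false, true).parseText(xml) }

// Same check as FeedParser.isAtom: the namespaced Atom root name is a QName,
// while the un-namespaced RSS root name is a plain String
def isAtom = { Node root ->
    def name = root.name()
    name instanceof groovy.xml.QName &&
        name.localPart == 'feed' &&
        name.namespaceURI == 'http://www.w3.org/2005/Atom'
}

def atomRoot = parse(atomXml)
def rssRoot = parse(rssXml)

// Child lookup by local name works for both namespaced and plain elements
def atomTitles = atomRoot.entry.collect { it.title.text() }
def rssTitles = rssRoot.channel.item.collect { it.title.text() }

// Atom carries the link target in the href attribute, RSS as element text
def atomLink = atomRoot.entry[0].link[0].attribute('href')
def rssLink = rssRoot.channel.item[0].link.text()

println "Atom feed? ${isAtom(atomRoot)}, titles: ${atomTitles}, link: ${atomLink}"
println "Atom feed? ${isAtom(rssRoot)}, titles: ${rssTitles}, link: ${rssLink}"
```

Checking both the local name and the namespace URI, rather than the local name alone, avoids treating an unrelated document whose root element happens to be called feed as Atom.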
Another way to consume feeds from Groovy is to use the Rome library. Rome is a venerable framework that has been around for a long time and is widely used for consuming and producing RSS and Atom feeds in Java. The following script fetches the same feed with Rome's fetcher module:
@Grab('org.rometools:rome-fetcher:1.2')
import org.rometools.fetcher.impl.HttpURLFeedFetcher

def feedFetcher = new HttpURLFeedFetcher()
def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
def feed = feedFetcher.retrieveFeed(feedUrl.toURL())
feed.entries.each { println "${it.title}" }
println "Feed size: ${feed.entries?.size()}"
This code is much more compact than the custom solution presented at the beginning of this recipe. Rome handles the differences between feed types, and it also supports less common formats.