Consuming RSS and Atom feeds

RSS and Atom feeds are a standardized way to distribute headlines and updates from websites and blogs, and both are XML documents. RSS is older and very widely used, while Atom is newer and has several advantages over RSS, most notably that its elements live in a dedicated XML namespace (http://www.w3.org/2005/Atom).

For the main differences between Atom and RSS, see the Wikipedia entry on RSS: http://en.wikipedia.org/wiki/RSS. Both formats are widely supported, and many blogs and sites publish their headline feeds in both RSS and Atom.

In this recipe, we are going to cover the basics of RSS and Atom feed parsing with Groovy.

Getting ready

As RSS and Atom feeds are XML-based, they are easy to parse with any of the XML tools that Groovy offers (see the Reading XML using XmlParser recipe in Chapter 5, Working with XML in Groovy).
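As a minimal illustration, the following sketch reads the titles and links from a small, hand-written RSS 2.0 document (not a real feed) held in a string, so no network access is needed. It assumes Groovy 3.0 or later, where XmlParser lives in the groovy.xml package; on Groovy 2.x, remove the import, because groovy.util.XmlParser is imported by default.

```groovy
import groovy.xml.XmlParser   // Groovy 3.0+; on Groovy 2.x remove this line

// A trimmed, hand-written RSS 2.0 document used only for illustration
def rssText = '''<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>'''

def rss = new XmlParser().parseText(rssText)

// GPath navigation: rss.channel.item is a NodeList with one Node per <item>
def headlines = rss.channel.item.collect { item ->
  [title: item.title.text(), link: item.link.text()]
}

headlines.each { println "${it.title} -> ${it.link}" }
```

Each element is reached by name, and text() returns its text content; the recipe's FeedParser uses the same navigation on a document loaded from a URL.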

We will show how to detect whether a feed is RSS or Atom and parse it accordingly.
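The detection only needs to look at the root element. Here is a minimal sketch with a hypothetical helper named detectFormat: it parses a string with a namespace-aware XmlParser and reports 'atom' only when the root is a feed element in the http://www.w3.org/2005/Atom namespace, falling back to 'rss' otherwise. It assumes Groovy 3.0 or later for the groovy.xml.XmlParser import, and the sample documents are hand-written stand-ins for real feeds.

```groovy
import groovy.xml.XmlParser   // Groovy 3.0+; on Groovy 2.x remove this line

// Hypothetical helper: returns 'atom' or 'rss' by inspecting the root element.
// With a namespace-aware parser, an element in a namespace gets a QName as its
// name, while an element without a namespace gets a plain String.
String detectFormat(String xml) {
  def root = new XmlParser(false, true).parseText(xml)
  def name = root.name()
  if (!(name instanceof String) &&
      name.localPart == 'feed' &&
      name.namespaceURI == 'http://www.w3.org/2005/Atom') {
    return 'atom'
  }
  'rss'
}

def atomSample = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
</feed>'''
def rssSample = '''<rss version="2.0"><channel><title>Example</title></channel></rss>'''

println detectFormat(atomSample)
println detectFormat(rssSample)
```

Because the check requires the Atom namespace, a root element that is merely called feed, without that namespace, is not treated as Atom.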

How to do it...

We will create a FeedParser class that opens the URL of a feed (RSS or Atom), parses it, and returns a list of FeedEntry objects populated with the content of the feed entries:

  1. Let's define the classes in a new script file as follows:
    class FeedParser {
    
      def readFeed(url) {
        // validating = false, namespaceAware = true: elements that live
        // in a namespace (such as Atom's) get a QName as their name
        def xmlFeed = new XmlParser(false, true).parse(url)
        if (isAtom(xmlFeed)) {
          return xmlFeed.entry.collect { entry ->
            new AtomFeedEntry(
              entry.title.text(),
              // the author's name is nested in <author><name>
              entry.author.name.text(),
              // <link> is an empty element: the URL is in its href
              // attribute (we take the first link if there are several)
              entry.link[0]?.attribute('href'),
              // <published> is optional in Atom, <updated> is mandatory
              entry.published.text() ?: entry.updated.text()
            )
          }
        }
        // RSS 2.0 layout: <rss><channel><item>...</item></channel></rss>
        xmlFeed.channel.item.collect { item ->
          new RSSFeedEntry(
            item.title.text(),
            item.link.text(),
            item.description.text(),
            item.pubDate.text()
          )
        }
      }
    
      def isAtom(Node node) {
        def rootElementName = node.name()
        // elements without a namespace (such as <rss>) have a String name
        if (rootElementName instanceof groovy.xml.QName) {
          return (rootElementName.localPart == 'feed') &&
                 (rootElementName.namespaceURI ==
                   'http://www.w3.org/2005/Atom')
        }
        false
      }
    
    }
    
    // common base type for the entries returned by readFeed()
    abstract class FeedEntry {
    
    }
    
    @groovy.transform.Canonical
    class RSSFeedEntry extends FeedEntry {
      String title
      String link
      String desc
      String pubDate
    }
    
    @groovy.transform.Canonical
    class AtomFeedEntry extends FeedEntry {
      String title
      String author
      String link
      String pubDate
    }
  2. The FeedParser class can then be used from the same script file. For example, to print the titles of the latest posts from the Lambda the Ultimate blog (http://lambda-the-ultimate.org/):
    def parser = new FeedParser()
    def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
    def feed = parser.readFeed(feedUrl)
    
    feed.each {
      println it.title
    }
    println "Feed size: ${feed.size()}"
  3. Running the script prints the latest titles, followed by the number of entries, similar to the following output:
    Dynamic Region Inference
    Dependently-Typed Metaprogramming (in Agda)
    ...
    It's Alive! Continuous Feedback in UI Programming
    Feed size: 15
    

How it works...

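The FeedParser class opens the XML stream (an HTTP URL) and parses it with a namespace-aware XmlParser. It then checks whether the root element is feed in the Atom namespace (http://www.w3.org/2005/Atom): if so, the document is treated as Atom, otherwise as RSS. Finally, it turns each entry into an instance of a FeedEntry subclass (AtomFeedEntry or RSSFeedEntry) that holds the headline data.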
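In Atom, some of the entry data is not stored as element text. The following minimal sketch, assuming Groovy 3.0 or later and a hand-written sample entry (not a real feed), shows that the text of an Atom link element is empty because the URL sits in its href attribute, and that the mandatory updated element is a reasonable fallback when the optional published element is missing.

```groovy
import groovy.xml.XmlParser   // Groovy 3.0+; on Groovy 2.x remove this line

// A trimmed, hand-written Atom document used only for illustration
def atomText = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <title>First post</title>
    <author><name>Jane Doe</name></author>
    <link rel="alternate" href="http://example.com/1"/>
    <updated>2013-01-01T10:00:00Z</updated>
  </entry>
</feed>'''

def feed = new XmlParser(false, true).parseText(atomText)
def entry = feed.entry[0]

// <link> has no text content, so link.text() is an empty string;
// the URL is stored in the href attribute instead
println "link text: '${entry.link.text()}'"
def link = entry.link[0]?.attribute('href')

// this entry has no <published>, so fall back to the mandatory <updated>
def date = entry.published.text() ?: entry.updated.text()

println "${entry.title.text()} by ${entry.author.name.text()}: $link ($date)"
```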

There's more...

Another way to consume feeds from Groovy is the Rome library. Rome is a mature framework that has been around for a long time, and it is the de facto standard library for consuming and producing RSS and Atom feeds in Java.

@Grab('org.rometools:rome-fetcher:1.2')
import org.rometools.fetcher.impl.HttpURLFeedFetcher

// the fetcher downloads the feed and lets Rome detect its format
def feedFetcher = new HttpURLFeedFetcher()
def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
def feed = feedFetcher.retrieveFeed(feedUrl.toURL())

// feed is a SyndFeed: its entries look the same for RSS and Atom
feed.entries.each {
  println it.title
}

println "Feed size: ${feed.entries?.size()}"

This code is considerably more compact than the custom solution presented at the beginning of this recipe. Rome hides the differences between feed types behind a common model (SyndFeed and its entries), and it also supports less common formats.
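Rome can also parse a feed that is already in memory, which makes it easy to see that one code path serves both formats. The following offline sketch is an illustration under stated assumptions: it grabs rome:rome:1.0 (a release whose classes live in the com.sun.syndication packages; newer Rome releases use com.rometools.rome instead) and calls SyndFeedInput.build(Reader) on two hand-written sample documents, one RSS 2.0 and one Atom 1.0.

```groovy
@Grab('rome:rome:1.0')   // assumption: Rome 1.0 coordinates on Maven Central
import com.sun.syndication.io.SyndFeedInput

// Two hand-written samples: the same Rome code reads both formats
def rssText = '''<rss version="2.0"><channel>
  <title>Example</title><link>http://example.com/</link><description>Demo</description>
  <item><title>RSS post</title><link>http://example.com/1</link></item>
</channel></rss>'''

def atomText = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title><id>urn:example:feed</id><updated>2013-01-01T10:00:00Z</updated>
  <entry><title>Atom post</title><id>urn:example:1</id>
    <updated>2013-01-01T10:00:00Z</updated>
    <link href="http://example.com/2"/></entry>
</feed>'''

def titles = [rssText, atomText].collect { xml ->
  // build() detects the feed type and returns a format-neutral SyndFeed
  def feed = new SyndFeedInput().build(new StringReader(xml))
  println "${feed.feedType}: ${feed.entries*.title}"
  feed.entries*.title
}
```

No branching on the feed type is needed here, in contrast with the isAtom() check in the custom FeedParser.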

See also
