Consuming RSS and Atom feeds

RSS and Atom feeds are a standardized way to distribute headlines and updates from websites and blogs. Both are XML documents. RSS is the older and more widely used format, while Atom is newer and has several advantages over RSS, chiefly its support for XML namespaces.

For the main differences between Atom and RSS, see the Wikipedia entry on RSS: http://en.wikipedia.org/wiki/RSS. Both formats are widely supported, and blogs and sites often publish their headline feeds in RSS and Atom at the same time.

In this recipe, we are going to cover the basics of RSS and Atom feed parsing with Groovy.

Getting ready

As RSS and Atom feeds are XML-based, they are easy to parse with any of the XML tools offered by Groovy (see the Reading XML using XmlParser recipe in Chapter 5, Working with XML in Groovy).

We will show how to detect whether a feed is RSS or Atom and parse it accordingly.
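
As a minimal sketch of what the detection relies on, the following script parses two inline documents with a namespace-aware XmlParser and inspects the name of each root element. The sample XML strings and the rootName helper are hypothetical and exist only for illustration; on Groovy 4 and later, XmlParser must be imported from groovy.xml.

```groovy
// On Groovy 4 and later, add: import groovy.xml.XmlParser

// Hypothetical, minimal feed documents used only for illustration.
def atomXml = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Atom feed</title>
</feed>'''

def rssXml = '''<rss version="2.0">
  <channel><title>Sample RSS feed</title></channel>
</rss>'''

// XmlParser(validating, namespaceAware): namespace awareness is what
// exposes the Atom namespace on the root element.
def rootName = { String xml -> new XmlParser(false, true).parseText(xml).name() }

def atomRoot = rootName(atomXml)
def rssRoot = rootName(rssXml)

// An element in a namespace is reported as a QName...
assert atomRoot instanceof groovy.xml.QName
assert atomRoot.localPart == 'feed'
assert atomRoot.namespaceURI == 'http://www.w3.org/2005/Atom'

// ...while an element without a namespace is reported as a plain String.
assert rssRoot instanceof String
assert rssRoot == 'rss'

println "Atom root: ${atomRoot.localPart} in ${atomRoot.namespaceURI}"
println "RSS root: ${rssRoot}"
```

Because the two root names differ in both type and value, a single check on the root element is enough to tell the formats apart.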

How to do it...

We will create a FeedParser class that opens the URL of a feed (RSS or Atom), parses it, and returns a list of FeedEntry objects populated with the content of the feed entries.

Let's define the classes in a new script file as follows:

```
class FeedParser {

  // Parses the feed at the given URL and returns a list of FeedEntry objects.
  def readFeed(url) {
    // Non-validating, namespace-aware parser
    def xmlFeed = new XmlParser(false, true).parse(url)
    def feedList = []
    if (isAtom(xmlFeed)) {
      xmlFeed.entry.each { entry ->
        feedList << new AtomFeedEntry(
          entry.title.text(),
          // The author's name is held in a nested <name> element
          entry.author.name.text(),
          // Atom links carry their target in the href attribute;
          // only the first link of each entry is used here
          entry.link[0]?.attribute('href'),
          entry.published.text()
        )
      }
    } else {
      xmlFeed.channel.item.each { item ->
        feedList << new RSSFeedEntry(
          item.title.text(),
          item.link.text(),
          item.description.text(),
          item.pubDate.text()
        )
      }
    }
    feedList
  }

  // A feed is Atom if its root element is <feed> in the Atom namespace.
  def isAtom(Node node) {
    def rootElementName = node.name()
    if (rootElementName instanceof groovy.xml.QName) {
      return (rootElementName.localPart == 'feed') &&
             (rootElementName.namespaceURI ==
               'http://www.w3.org/2005/Atom')
    }
    false
  }

}

abstract class FeedEntry {

}

@groovy.transform.Canonical
class RSSFeedEntry extends FeedEntry {
  String title
  String link
  String desc
  String pubDate
}

@groovy.transform.Canonical
class AtomFeedEntry extends FeedEntry {
  String title
  String author
  String link
  String pubDate
}
```
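
The positional constructor calls in readFeed work because @groovy.transform.Canonical generates a tuple constructor (along with equals, hashCode, and toString) from the declared properties. The following sketch shows this on a hypothetical Headline class that is not part of the recipe:

```groovy
import groovy.transform.Canonical

// Hypothetical class used only to illustrate @Canonical.
@Canonical
class Headline {
  String title
  String link
}

// Tuple constructor: arguments follow the property declaration order.
def first = new Headline('First post', 'http://example.org/first')
def second = new Headline('First post', 'http://example.org/first')

assert first.title == 'First post'
assert first.link == 'http://example.org/first'

// equals() and hashCode() compare property values, not object identity.
assert first == second
assert first.hashCode() == second.hashCode()

// toString() is generated as well and lists the property values.
println first
```

One consequence is that the argument order in readFeed must match the order in which the properties are declared in RSSFeedEntry and AtomFeedEntry.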

The FeedParser class can be called directly within the same script file; for example, to print the titles of all the posts from the blog Lambda the Ultimate (http://lambda-the-ultimate.org/):
```
def parser = new FeedParser()
def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
def feed = parser.readFeed(feedUrl)

feed.each {
  println it.title
}

println "Feed size: ${feed.size()}"
```

After execution, the code should print the latest titles followed by the number of entries, as shown in the following output:

```
Dynamic Region Inference
Dependently-Typed Metaprogramming (in Agda)
...
It's Alive! Continuous Feedback in UI Programming
Feed size: 15
```

How it works...

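The FeedParser class opens the XML stream (an HTTP URI) with a namespace-aware XmlParser. The isAtom method treats the feed as Atom when the root element is named feed and belongs to the http://www.w3.org/2005/Atom namespace; any other document is assumed to be RSS. Each entry is then converted into an instance of a FeedEntry subclass (AtomFeedEntry or RSSFeedEntry), which contains the headline data.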
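
The same flow can be tried without a network connection. The sketch below is an illustration under stated assumptions, not the recipe's own code: the two inline documents and the headlines closure are hypothetical, and entries are collected into plain maps rather than FeedEntry objects. On Groovy 4 and later, XmlParser must be imported from groovy.xml.

```groovy
// On Groovy 4 and later, add: import groovy.xml.XmlParser

// Hypothetical sample documents; real feeds carry many more elements.
def atomXml = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Atom feed</title>
  <entry>
    <title>First Atom post</title>
    <link href="http://example.org/atom/1"/>
  </entry>
</feed>'''

def rssXml = '''<rss version="2.0">
  <channel>
    <title>Sample RSS feed</title>
    <item>
      <title>First RSS post</title>
      <link>http://example.org/rss/1</link>
    </item>
  </channel>
</rss>'''

// Returns a list of [title, link] maps for either format.
def headlines = { String xml ->
  def root = new XmlParser(false, true).parseText(xml)
  def name = root.name()
  def atom = name instanceof groovy.xml.QName &&
             name.localPart == 'feed' &&
             name.namespaceURI == 'http://www.w3.org/2005/Atom'
  if (atom) {
    // Atom: entries sit directly under <feed>; the link is an attribute.
    root.entry.collect {
      [title: it.title.text(), link: it.link[0]?.attribute('href')]
    }
  } else {
    // RSS: items sit under <rss><channel>; the link is element text.
    root.channel.item.collect {
      [title: it.title.text(), link: it.link.text()]
    }
  }
}

assert headlines(atomXml) ==
    [[title: 'First Atom post', link: 'http://example.org/atom/1']]
assert headlines(rssXml) ==
    [[title: 'First RSS post', link: 'http://example.org/rss/1']]
```

The two branches differ only in where the entries live and how the link is stored, which is why the format has to be detected before the entries are read.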

There's more...

Another way to consume feeds with Groovy is to use the Rome library. Rome is a venerable framework that has been around for a long time, and it is the de facto standard library for consuming and producing RSS or Atom feeds in Java. The following script fetches the same feed with Rome:

```
@Grab('org.rometools:rome-fetcher:1.2')
import org.rometools.fetcher.impl.HttpURLFeedFetcher

def feedFetcher = new HttpURLFeedFetcher()
def feedUrl = 'http://lambda-the-ultimate.org/rss.xml'
def feed = feedFetcher.retrieveFeed(feedUrl.toURL())

feed.entries.each {
  println it.title
}

println "Feed size: ${feed.entries?.size()}"
```

The code is far more compact than the custom solution presented at the beginning, because Rome handles the differences between feed types itself (it also supports less common formats).
