Parsing RSS and Atom feeds

Really Simple Syndication (RSS) and Atom feeds (refer to http://en.wikipedia.org/wiki/RSS) are often used for blogs and news. These types of feeds follow the publish/subscribe model. For instance, Packt Publishing has an RSS feed with article and book announcements. We can subscribe to the feed to get timely updates. The Python feedparser module allows us to parse RSS and Atom feeds easily without dealing with a lot of technical details. The feedparser module can be installed with pip as follows:

$ sudo pip install feedparser
$ pip freeze|grep feedparser
feedparser==5.1.3

After parsing an RSS feed, we can access the underlying data using dotted notation (for example, rss.entries). Parse the Packt Publishing RSS feed and print the number of entries:

import feedparser as fp

rss = fp.parse("http://www.packtpub.com/rss.xml")

print "# Entries", len(rss.entries)

The number of entries is printed (the number may vary for each program run):

# Entries 50
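To get a feel for the technical details that feedparser spares us, the same entry count can be sketched with only the standard library. This is a minimal sketch, not how feedparser works internally. The SAMPLE_RSS document and the read_items helper are made up for illustration, and the sketch only handles plain RSS 2.0. Atom feeds use different element names (entry and summary) inside an XML namespace, which is one of the differences feedparser smooths over.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <description>Plotting with Python</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Nothing to see here</description>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return (title, description) pairs for each <item> in an RSS 2.0 string."""
    root = ET.fromstring(xml_text)
    items = []
    # In RSS 2.0, entries are <item> elements nested inside <channel>.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((title, description))
    return items


items = read_items(SAMPLE_RSS)
print("# Entries %d" % len(items))
```

With feedparser, the description of each item is available as entry.summary, whether the feed is RSS or Atom.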

Print the title and summary of each entry whose summary contains the word Python with the following code:

for i, entry in enumerate(rss.entries):
   if "Python" in entry.summary:
      print i, entry.title
      print entry.summary

On this particular run, the following was printed (if you try it for yourself, you may get something else or nothing at all if the filter is too restrictive):

42 Create interactive plots with matplotlib using Pack't new book and eBook
About the author: Alexandre Devert is a scientist. He is an enthusiastic Python coder as well and never gets enough of it! He used to teach data mining, software engineering, and research in numerical optimization.
Matplotlib is part of the Scientific Python modules collection. It provides a large library of customizable plots and a comprehensive set of backends. It tries to make easy things easy and make hard things possible. It can help users generate plots, add dimensions to plots, and also make plots interactive with just a few lines of code. Also, matplotlib integrates well with all common GUI modules.
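The filter above is case-sensitive and assumes that every entry has a summary. A slightly more forgiving variant could look like the following sketch. The find_entries helper and the sample data are hypothetical. Plain dictionaries stand in for parsed entries, which works because feedparser entries are dictionary-like and support get(). The print calls are written so that they run on both Python 2 and Python 3.

```python
def find_entries(entries, keyword):
    """Return (index, entry) pairs whose summary mentions keyword.

    Hypothetical helper: matching is case-insensitive, and entries
    without a summary are skipped instead of raising an error.
    """
    keyword = keyword.lower()
    matches = []
    for i, entry in enumerate(entries):
        # Entries are dict-like, so get() supplies a default for missing keys.
        summary = entry.get("summary", "")
        if keyword in summary.lower():
            matches.append((i, entry))
    return matches


# Plain dictionaries stand in for parsed feed entries here (made-up data).
sample_entries = [
    {"title": "Plotting", "summary": "A python plotting library"},
    {"title": "No summary"},
    {"title": "Cooking", "summary": "Recipes for the weekend"},
]

for i, entry in find_entries(sample_entries, "Python"):
    print("%d %s" % (i, entry["title"]))
```

With a parsed feed, the call would be find_entries(rss.entries, "Python").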

The following code can be found in the rss.py file of this book's code bundle:

import feedparser as fp

rss = fp.parse("http://www.packtpub.com/rss.xml")

print "# Entries", len(rss.entries)

for i, entry in enumerate(rss.entries):
   if "Python" in entry.summary:
      print i, entry.title
      print entry.summary