Really Simple Syndication (RSS) and Atom feeds (refer to http://en.wikipedia.org/wiki/RSS) are often used for blogs and news. These types of feeds follow the publish/subscribe model. For instance, Packt Publishing has an RSS feed with article and book announcements. We can subscribe to the feed to get timely updates. The Python feedparser module allows us to parse RSS and Atom feeds easily without dealing with a lot of technical details. The feedparser module can be installed with pip as follows:
$ sudo pip install feedparser
$ pip freeze|grep feedparser
feedparser==5.1.3
After parsing an RSS file, we can access the underlying data using a dotted notation. Parse the Packt Publishing RSS feed and print the number of entries:
import feedparser as fp

rss = fp.parse("http://www.packtpub.com/rss.xml")
print "# Entries", len(rss.entries)
The number of entries is printed (the number may vary for each program run):
# Entries 50
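The dotted notation mentioned earlier works because the objects feedparser returns behave like dictionaries whose keys can also be read as attributes, so rss.entries and rss["entries"] refer to the same data. The following is a simplified sketch of that idea and not feedparser's actual implementation; the AttrDict class and the sample data are hypothetical and serve only as illustration:

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so ordinary
        # dict methods such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


# Hypothetical data standing in for a parsed feed.
feed = AttrDict(entries=[AttrDict(title="First post", summary="Hello")])

first = feed.entries[0]
# Attribute access and key access read the same value.
same_title = first.title == first["title"]
```

Because the parsed result is still a dictionary underneath, the usual dictionary methods such as keys() and get() should remain available alongside the attribute syntax.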
Print the titles and summaries of the entries whose summary contains the word Python with the following code:
for i, entry in enumerate(rss.entries):
    if "Python" in entry.summary:
        print i, entry.title
        print entry.summary
On this particular run, the following was printed (if you try it for yourself, you may get something else or nothing at all if the filter is too restrictive):
42 Create interactive plots with matplotlib using Pack't new book and eBook
About the author: Alexandre Devert is a scientist. He is an enthusiastic Python coder as well and never gets enough of it! He used to teach data mining, software engineering, and research in numerical optimization. Matplotlib is part of the Scientific Python modules collection. It provides a large library of customizable plots and a comprehensive set of backends. It tries to make easy things easy and make hard things possible. It can help users generate plots, add dimensions to plots, and also make plots interactive with just a few lines of code. Also, matplotlib integrates well with all common GUI modules.
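The filter shown earlier assumes that every entry has a summary, which a feed is not obliged to provide. Since feed entries are dictionary-like, one possible defensive variant looks the summary up with get() and falls back to an empty string. This is a minimal sketch; the matching_entries helper and the sample data are hypothetical names introduced only for illustration:

```python
def matching_entries(entries, keyword):
    """Yield (index, entry) pairs whose summary mentions keyword."""
    for i, entry in enumerate(entries):
        # get() returns "" instead of raising when "summary" is absent.
        if keyword in entry.get("summary", ""):
            yield i, entry


# Hypothetical entries standing in for rss.entries.
sample = [
    {"title": "A", "summary": "Learning Python"},
    {"title": "B"},  # no summary at all
    {"title": "C", "summary": "Java news"},
]

hits = [(i, e["title"]) for i, e in matching_entries(sample, "Python")]
```

With a real feed, rss.entries would be passed in place of sample. The match is case-sensitive, just like the original filter, so a summary that only mentions "python" in lowercase would be skipped.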
The following code can be found in the rss.py file of this book's code bundle:
import feedparser as fp

rss = fp.parse("http://www.packtpub.com/rss.xml")
print "# Entries", len(rss.entries)

for i, entry in enumerate(rss.entries):
    if "Python" in entry.summary:
        print i, entry.title
        print entry.summary