Chapter 3. APIs in Action

When we talk about APIs in relation to Python, we usually mean the classes and functions that a module presents for us to interact with. In this chapter, we'll be talking about something different: web APIs.

A web API is a type of API that you interact with through the HTTP protocol. Nowadays, many web services provide a set of HTTP calls that are designed to be used programmatically by clients; that is, they are meant to be used by machines rather than by humans. Through these interfaces, it's possible to automate interaction with the services and to perform tasks such as extracting data, configuring the service, and uploading your own content to it.

In this chapter, we'll look at:

  • Two popular data exchange formats used by web APIs: XML and JSON
  • How to interact with two major web APIs: Amazon S3 and Twitter
  • How to pull data from HTML pages when an API is not available
  • How to make life easier for the webmasters that provide these APIs and websites

There are hundreds of services that offer web APIs. A fairly comprehensive and ever-growing list of these services can be found at http://www.programmableweb.com.

We're going to start by introducing how XML is used in Python, and then we will explain an XML-based API called the Amazon S3 API.

Getting started with XML

The Extensible Markup Language (XML) is a way of representing hierarchical data in a standard text format. When working with XML-based web APIs, we'll be creating XML documents and sending them as the bodies of HTTP requests and receiving XML documents as the bodies of responses.

Here's the text representation of an XML document; perhaps it represents the stock at a cheese shop:

<?xml version='1.0'?>
<inventory>
    <cheese id="c01">
        <name>Caerphilly</name>
        <stock>0</stock>
    </cheese>
    <cheese id="c02">
        <name>Illchester</name>
        <stock>0</stock>
    </cheese>
</inventory>

If you've coded with HTML before, then this may look familiar. XML is a markup-based format from the same family of languages as HTML. The data is structured in a hierarchy formed by elements. Each element is represented by two tags: a start tag, for example, <name>, and a matching end tag, for example, </name>. Between these two tags, we can either put data, such as Caerphilly, or add more tags, which represent child elements.

Unlike HTML, XML is designed so that we can define our own tags and create our own data formats. Also unlike HTML, XML syntax is always strictly enforced. Whereas HTML tolerates small mistakes, such as tags being closed in the wrong order, closing tags missing altogether, or attribute values missing quotes, in XML these mistakes make the document unreadable: an XML parser will reject it rather than try to guess what was meant. A correctly formatted XML document is called well-formed.
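A quick way to see this strictness in practice is to hand a few snippets to ElementTree's parser, which the following sections introduce. This is a minimal sketch: is_well_formed() is a hypothetical helper written for this illustration, not part of the standard library, and it relies on ET.fromstring() raising ET.ParseError for input that isn't well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

good = '<cheese><name>Caerphilly</name></cheese>'
# Tags closed in the wrong order
bad_order = '<cheese><name>Caerphilly</cheese></name>'
# Closing tag for <cheese> missing altogether
missing_end = '<cheese><name>Caerphilly</name>'
# Attribute value without quotes
bad_attr = '<cheese id=c01><name>Caerphilly</name></cheese>'

print(is_well_formed(good))         # True
print(is_well_formed(bad_order))    # False
print(is_well_formed(missing_end))  # False
print(is_well_formed(bad_attr))     # False
```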

The XML APIs

There are two main approaches to working with XML data:

  • Reading in a whole document and creating an object-based representation of it, then manipulating it by using an object-oriented API
  • Processing the document from start to end, and performing actions as specific tags are encountered

For now, we're going to focus on the object-based approach by using a Python XML API called ElementTree. The second, event-based approach (often called SAX, after one of the most popular APIs in this category) is more complicated to set up and is generally only needed for processing very large XML files. We won't need it to work with Amazon S3.
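For comparison, here is a rough sketch of the start-to-end style. Rather than SAX itself, it uses ElementTree's own iterparse() function, which reports each element as the parser reaches its end tag. The cheese_names() helper and the DOC sample are assumptions made for this illustration.

```python
import io
import xml.etree.ElementTree as ET

DOC = b"""<?xml version='1.0'?>
<inventory>
    <cheese id="c01"><name>Caerphilly</name><stock>0</stock></cheese>
    <cheese id="c02"><name>Illchester</name><stock>0</stock></cheese>
</inventory>"""

def cheese_names(source):
    """Hypothetical helper: collect <name> text while streaming the input."""
    names = []
    # iterparse yields (event, element) pairs as the parser moves through the input
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'name':
            names.append(elem.text)
        elif elem.tag == 'cheese':
            # The <cheese> subtree is finished, so drop its children to save memory
            elem.clear()
    return names

print(cheese_names(io.BytesIO(DOC)))  # ['Caerphilly', 'Illchester']
```

Clearing finished subtrees is what keeps memory use roughly flat on large files; with a document this small it makes no practical difference.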

The basics of ElementTree

We'll be using the Python standard library implementation of the ElementTree API, which is in the xml.etree.ElementTree module.

Let's see how we may create the aforementioned example XML document by using ElementTree. Open a Python interpreter and run the following commands:

>>> import xml.etree.ElementTree as ET
>>> root = ET.Element('inventory')
>>> ET.dump(root)
<inventory />

We start by creating the root element, that is, the outermost element of the document. We create a root element <inventory> here, and then print its string representation to the screen. The <inventory /> representation is an XML shortcut for <inventory></inventory>. It denotes an empty element, that is, an element with no data and no child tags.

We create the <inventory> element by creating a new ElementTree.Element object. You'll notice that the argument we give to Element() is the name of the tag that is created.
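The following sketch, which assumes Python 3.4 or later for the short_empty_elements parameter of ET.tostring(), shows that the short form and the explicit start-and-end-tag form are two spellings of the same empty element:

```python
import xml.etree.ElementTree as ET

root = ET.Element('inventory')

# By default, an element with no text and no children is written in short form
short_form = ET.tostring(root)
# short_empty_elements=False writes an explicit start tag and end tag instead
long_form = ET.tostring(root, short_empty_elements=False)
print(short_form)  # b'<inventory />'
print(long_form)   # b'<inventory></inventory>'

# Parsing either form gives back an <inventory> element with no children
for text in (short_form, long_form):
    parsed = ET.fromstring(text)
    print(parsed.tag, len(parsed))  # inventory 0
```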

Our <inventory> element is empty at the moment, so let's put something in it. Do this:

>>> cheese = ET.Element('cheese')
>>> root.append(cheese)
>>> ET.dump(root)
<inventory><cheese /></inventory>

Now, we have an element called <cheese> in our <inventory> element. When an element is directly nested inside another, the nested element is called a child of the outer element, and the outer element is called the parent. Similarly, elements that share the same parent are called siblings.
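Because an Element acts like a list of its children, these relationships can be explored with ordinary sequence operations. A small sketch, in which first and second are just illustrative names:

```python
import xml.etree.ElementTree as ET

root = ET.Element('inventory')
first = ET.Element('cheese')
second = ET.Element('cheese')
root.append(first)
root.append(second)

# len() counts direct children, and indexing returns them in document order
print(len(root))          # 2
print(root[0] is first)   # True
# first and second are siblings: both are children of root
for child in root:
    print(child.tag)      # cheese (printed twice)
```

Note that ElementTree elements don't store a reference to their parent, so getting from a child back up to its parent means searching downward from the root.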

Let's add another element, and this time let's give it some content. Add the following commands:

>>> name = ET.SubElement(cheese, 'name')
>>> name.text = 'Caerphilly'
>>> ET.dump(root)
<inventory><cheese><name>Caerphilly</name></cheese></inventory>

Now, our document is starting to take shape. We do two new things here: first, we use the shortcut function ET.SubElement() to create the new <name> element and insert it into the tree as a child of <cheese> in a single operation. Second, we give it some content by assigning some text to the element's text attribute.
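To make the "single operation" concrete, this sketch builds the same fragment twice, once with Element() plus append() and once with SubElement(), and compares the serialized results:

```python
import xml.etree.ElementTree as ET

# Two steps: create the element, then attach it to its parent
a = ET.Element('cheese')
name_a = ET.Element('name')
a.append(name_a)
name_a.text = 'Caerphilly'

# One step: SubElement() creates the child and appends it to the parent
b = ET.Element('cheese')
name_b = ET.SubElement(b, 'name')
name_b.text = 'Caerphilly'

print(ET.tostring(a))                    # b'<cheese><name>Caerphilly</name></cheese>'
print(ET.tostring(a) == ET.tostring(b))  # True
```

SubElement() also accepts attributes, either as a dict or as keyword arguments, for example ET.SubElement(root, 'cheese', id='c01').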

We can remove elements by using the remove() method on the parent element, as shown in the following commands:

>>> temp = ET.SubElement(root, 'temp')
>>> ET.dump(root)
<inventory><cheese><name>Caerphilly</name></cheese><temp /></inventory>
>>> root.remove(temp)
>>> ET.dump(root)
<inventory><cheese><name>Caerphilly</name></cheese></inventory>
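One detail worth knowing is that remove() only searches the element's direct children. This sketch shows that trying to remove a grandchild from the root raises ValueError, whereas removing it from its actual parent works; removed_via_root is just a flag for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('inventory')
cheese = ET.SubElement(root, 'cheese')
name = ET.SubElement(cheese, 'name')

try:
    # remove() only searches direct children; name is a grandchild of root
    root.remove(name)
    removed_via_root = True
except ValueError:
    removed_via_root = False
print(removed_via_root)  # False

# Removing name from its actual parent works
cheese.remove(name)
print(ET.tostring(root))  # b'<inventory><cheese /></inventory>'
```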

Pretty printing

It would be useful to be able to produce output in a more legible format, such as the example shown at the beginning of this section. The ElementTree API in Python versions before 3.9 has no function for doing this (Python 3.9 added ET.indent()), but another XML API provided by the standard library, minidom, does, and it's simple to use. First, import minidom:

>>> import xml.dom.minidom as minidom

Second, use the following command to print some nicely formatted XML:

>>> print(minidom.parseString(ET.tostring(root)).toprettyxml())
<?xml version="1.0" ?>
<inventory>
    <cheese>
        <name>Caerphilly</name>
    </cheese>
</inventory>

These are not the easiest lines of code to read at first glance, so let's break them down. The minidom library can't work directly with ElementTree elements, so we use ElementTree's tostring() function to create a string representation of our XML. We load the string into the minidom API by using minidom.parseString(), and then we use the toprettyxml() method to output our formatted XML. By default, toprettyxml() indents each level with a tab character; pass, for example, indent='    ' to indent with spaces instead.

We can wrap this in a function to make it handier. Enter the following into your Python shell:

>>> def xml_pprint(element):
...     s = ET.tostring(element)
...     print(minidom.parseString(s).toprettyxml())

Now, just do the following to pretty print:

>>> xml_pprint(root)
<?xml version="1.0" ?>
<inventory>
    <cheese>
...
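If you're on Python 3.9 or later, ElementTree can do this without minidom: ET.indent() inserts indentation whitespace into the tree itself. A sketch, guarded by a version check because older versions lack the function:

```python
import sys
import xml.etree.ElementTree as ET

root = ET.Element('inventory')
cheese = ET.SubElement(root, 'cheese', id='c01')
ET.SubElement(cheese, 'name').text = 'Caerphilly'

if sys.version_info >= (3, 9):
    # indent() modifies the tree in place, setting whitespace text and tails
    ET.indent(root, space='    ')
    print(ET.tostring(root, encoding='unicode'))
    # <inventory>
    #     <cheese id="c01">
    #         <name>Caerphilly</name>
    #     </cheese>
    # </inventory>
```

Because indent() changes the elements' text and tail values in place, you may prefer to call it on a copy (for example, one made with copy.deepcopy()) if the same tree is also going to be sent over the network.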

Element attributes

In the example shown at the beginning of this section, you may have spotted something in the opening tag of the <cheese> element: the id="c01" text. This is called an attribute. We can use attributes to attach extra information to elements, and there's no limit to the number of attributes an element can have, although each attribute name may appear only once per element. An attribute always consists of a name, which in this case is id, and a value, which in this case is c01. The value can be any text, but it must be enclosed in quotes.

Now, add the id attribute to the <cheese> element, as shown here:

>>> cheese.attrib['id'] = 'c01'
>>> xml_pprint(cheese)
<?xml version="1.0" ?>
<cheese id="c01">
    <name>Caerphilly</name>
</cheese>

The attrib attribute of an element is a dict-like object which holds an element's attribute names and values. We can manipulate the XML attributes as we would a regular dict.
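To illustrate, the sketch below mixes dict operations on attrib with Element.get() and Element.set(), ElementTree's convenience methods for reading and writing attributes. The origin attribute and its value are made up for the example.

```python
import xml.etree.ElementTree as ET

cheese = ET.Element('cheese')
cheese.attrib['id'] = 'c01'    # dict-style assignment, as in the text
cheese.set('origin', 'Wales')  # hypothetical attribute; set() does the same thing

print(cheese.get('id'))                # c01
print(cheese.get('price', 'unknown'))  # unknown (attribute missing, default used)
print(sorted(cheese.attrib.items()))   # [('id', 'c01'), ('origin', 'Wales')]

del cheese.attrib['origin']  # attributes can be deleted like dict keys
print(ET.tostring(cheese))   # b'<cheese id="c01" />'
```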

By now, you should be able to fully recreate the example document shown at the beginning of this section. Go ahead and give it a try.
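If you'd like something to check your attempt against, here is one possible solution sketch. Building the elements in a loop is our own choice, and build_inventory() and CHEESES are hypothetical names:

```python
import xml.etree.ElementTree as ET

# (id, name, stock) for each cheese in the example document
CHEESES = [('c01', 'Caerphilly', 0), ('c02', 'Illchester', 0)]

def build_inventory(cheeses):
    """Hypothetical helper that builds the <inventory> tree from tuples."""
    root = ET.Element('inventory')
    for cheese_id, name, stock in cheeses:
        cheese = ET.SubElement(root, 'cheese', id=cheese_id)
        ET.SubElement(cheese, 'name').text = name
        # Element text must be a string, so convert the number explicitly
        ET.SubElement(cheese, 'stock').text = str(stock)
    return root

root = build_inventory(CHEESES)
ET.dump(root)
```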

Converting to text

Once we have an XML tree that we're happy with, we usually want to convert it into a string so that we can send it over the network. The ET.dump() function that we've been using isn't appropriate for this: all it does is print the element to the screen, and it returns None rather than a string that we could use. We need to use the ET.tostring() function instead, as shown in the following commands:

>>> text = ET.tostring(name)
>>> print(text)
b'<name>Caerphilly</name>'

Notice that it returns a bytes object: tostring() encodes the text for us. The default encoding is us-ascii, but it's better to use UTF-8 for transmitting over HTTP, since it can encode the full range of Unicode characters and is widely supported by web applications:

>>> text = ET.tostring(name, encoding='utf-8')
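The difference becomes visible once the text contains a character outside ASCII. In this sketch (the cheese name is just an example value), the default us-ascii encoding falls back to a numeric character reference, UTF-8 writes the character's bytes directly, and encoding='unicode' returns a str rather than bytes:

```python
import xml.etree.ElementTree as ET

name = ET.Element('name')
name.text = 'Gruy\u00e8re'  # "Gruyère": an example value with a non-ASCII letter

# Default us-ascii: non-ASCII characters become numeric character references
print(ET.tostring(name))                      # b'<name>Gruy&#232;re</name>'
# UTF-8: the character is written as its two UTF-8 bytes
print(ET.tostring(name, encoding='utf-8'))    # b'<name>Gruy\xc3\xa8re</name>'
# encoding='unicode' returns a str instead of bytes
print(ET.tostring(name, encoding='unicode'))  # <name>Gruyère</name>
```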

For now, this is all that we need to know about creating XML documents, so let's see how we can apply it to a web API.
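As a hedged preview of that step, and not the Amazon S3 API itself, the following sketch attaches an XML document to an HTTP request by using the standard library's urllib.request module. The URL is a placeholder, and the PUT method and Content-Type header are assumptions about what a typical XML-based API might expect. The request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

root = ET.Element('inventory')
ET.SubElement(root, 'cheese', id='c01')
body = ET.tostring(root, encoding='utf-8')

# Placeholder endpoint for illustration only, not a real API
url = 'http://api.example.com/inventory'
request = urllib.request.Request(
    url,
    data=body,
    headers={'Content-Type': 'application/xml; charset=utf-8'},  # assumed header
    method='PUT',  # assumed method
)
# Passing request to urllib.request.urlopen() would actually send it
print(request.get_method(), request.full_url)
print(request.data)  # b'<inventory><cheese id="c01" /></inventory>'
```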
