Importing data from a JSON resource

This recipe shows how to read data in the JSON format. We'll use a remote resource, which adds a little complexity but also makes the recipe much more useful, because in real life we encounter remote resources more often than local ones.

JavaScript Object Notation (JSON) is widely used as a platform-independent format to exchange data between systems or applications.

A resource, in this context, is anything we can read, be it a file or a URL endpoint (which can be the output of a remote process/program or just a remote static file). In short, we don't care who produced a resource and how they did it; we just need it to be in a known format like JSON.

Getting ready

To get started with this recipe, we need the requests module installed and importable (in PYTHONPATH) in our virtual environment. We installed this module in Chapter 1, Preparing Your Working Environment.

We also need Internet connectivity as we'll be reading a remote resource.

How to do it...

The following code sample reads and parses the details of a user profile from the GitHub (http://github.com) site. We will perform the following steps for this:

  1. Define the GitHub URL of a JSON file with the details of a GitHub profile.
  2. Get the contents from the URL using the requests module.
  3. Read the content as JSON.

Here is the code for this:

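    import requests
    from pprint import pprint

    # URL of the JSON resource describing a GitHub profile
    url = 'https://api.github.com/users/justglowing'
    r = requests.get(url)
    # Parse the response body as JSON, then pretty-print it
    json_obj = r.json()
    pprint(json_obj)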

How it works...

First, we use the "requests" module to fetch a remote resource. This is straightforward because the "requests" module offers a simple API with one function per HTTP verb, so we just need to issue one get() call. This call retrieves the data and the response metadata and wraps them in a "Response" object, so we can inspect them. For this recipe, we are only interested in the Response.json() method, which reads the content (available at Response.content), parses it as JSON, and returns the corresponding Python object.
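As a minimal sketch of the flow just described, the following wraps the request in a function that checks the HTTP status before parsing. The names fetch_json and parse_body and the 10-second timeout are assumptions made for illustration; parse_body only approximates what Response.json() does for a UTF-8 body.

```python
import json


def parse_body(raw_bytes, encoding='utf-8'):
    """Decode a raw response body and parse it as JSON.

    Roughly what Response.json() does for us; the real method also
    tries to detect the encoding, which this sketch does not.
    """
    return json.loads(raw_bytes.decode(encoding))


def fetch_json(url, timeout=10):
    """Fetch url with requests and return the parsed JSON body."""
    # Imported here so parse_body() stays usable without requests installed.
    import requests

    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page.
    r.raise_for_status()
    # r.content holds the raw bytes of the response body.
    return parse_body(r.content)
```

Calling fetch_json('https://api.github.com/users/justglowing') should then return the same kind of dictionary as json_obj in the recipe, provided the network is reachable.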

Now that we have the JSON object, we can process the data. To do that, we need to understand what the data looks like. We can find out by opening the JSON resource in our favorite web browser or with a command-line tool such as wget or curl.

Another way is to fetch data from IPython and inspect it interactively. We can achieve that by running our program from IPython (using %run program_name.py). After execution, we are left with all variables that the program produced. List them all using %who or %whos.

Whatever method we use, we gain knowledge about the structure of the JSON data and the ability to see what parts of that structure we are interested in.

The JSON object is basically just a Python dictionary (or, for nested data, a dictionary of dictionaries), and we can access parts of it using the familiar key-based notation. In our example, the JSON resource contains the details of a GitHub profile, and we can access the location of the user by referencing json_obj['location']. If we compare the structure of the dictionary json_obj with that of the JSON resource, we see that each entry in the resource corresponds to a key in the dictionary. This means that the entire content of the resource is now in the dictionary (keep in mind that on Python versions before 3.7, the order of the keys is not guaranteed to be preserved when you load JSON data!).
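To make the key-based access concrete, here is a small sketch that works on a hand-written JSON string instead of the live response. The sample values and the nested "plan" entry are made up for illustration and are not taken from the real GitHub profile.

```python
import json
from collections import OrderedDict

# Hypothetical stand-in for the body returned by the GitHub API.
sample = '{"login": "someuser", "location": "Somewhere", "plan": {"name": "free"}}'

json_obj = json.loads(sample)

# Top-level values are reached by key, nested ones by chaining keys.
location = json_obj['location']
plan_name = json_obj['plan']['name']

# dict.get() avoids a KeyError when a key may be absent.
company = json_obj.get('company', 'unknown')

# object_pairs_hook keeps the document order of keys on any Python version.
ordered = json.loads(sample, object_pairs_hook=OrderedDict)

print(location, plan_name, company)
print(list(ordered.keys()))
```

Passing object_pairs_hook=OrderedDict is one way to keep the key order explicit when it matters to later processing.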

There's more...

The JSON format (originally specified by RFC 4627; refer to http://tools.ietf.org/html/rfc4627.html, since superseded by RFC 8259) has become very popular because it is more human-readable than XML and less verbose, so it needs less syntax to transfer the same data. It is especially popular in the web application domain as it is native to JavaScript, the language used for most of today's rich Internet applications.

Python's json module has more capabilities than we have shown here; for example, we could subclass the basic JSONEncoder/JSONDecoder classes to control how our Python data is transformed to and from the JSON format. The classic example uses this approach to JSON-ify the Python built-in type for complex numbers.
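A minimal sketch of that complex-number example might look as follows. The tagged-dictionary layout ('__complex__', 'real', 'imag') is an assumption of this sketch, and for decoding it uses the object_hook parameter of json.loads() rather than a JSONDecoder subclass, which is usually enough for a case this small.

```python
import json


class ComplexEncoder(json.JSONEncoder):
    """Encode complex numbers as a tagged JSON object."""

    def default(self, obj):
        if isinstance(obj, complex):
            return {'__complex__': True, 'real': obj.real, 'imag': obj.imag}
        # Let the base class raise TypeError for other unsupported types.
        return super().default(obj)


def as_complex(dct):
    """object_hook that rebuilds complex numbers from the tagged form."""
    if dct.get('__complex__'):
        return complex(dct['real'], dct['imag'])
    return dct


encoded = json.dumps({'z': 2 + 1j}, cls=ComplexEncoder)
decoded = json.loads(encoded, object_hook=as_complex)

print(encoded)
print(decoded)
```

The encoder is selected with the cls argument of json.dumps(), so the rest of the calling code stays unchanged.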

For simple customization, we don't have to subclass the JSONDecoder/JSONEncoder classes, because some of the parameters of the json functions can solve our problem.

For example, json.loads() will parse a float as the Python type float, and most of the time that is the right choice. Sometimes, however, the float value in the JSON data represents a price, and this is better represented as a decimal. We can instruct the json parser to parse floats as Decimal objects. For example, we have the following JSON string:

jstring = '{"name":"prod1","price":12.50}'

This is followed by these lines of code:

import json
from decimal import Decimal
json.loads(jstring, parse_float=Decimal)

The preceding code will generate this output (shown as printed by Python 2; Python 3 omits the u prefixes):

{u'name': u'prod1', u'price': Decimal('12.50')}
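To illustrate why this matters for prices, the following sketch sums two values parsed both ways. The JSON string and its prices are hypothetical, chosen only because 0.1 and 0.2 expose binary floating-point rounding.

```python
import json
from decimal import Decimal

# Hypothetical order data; the prices are chosen to expose float rounding.
jstring = '{"items": [{"price": 0.10}, {"price": 0.20}]}'

as_float = json.loads(jstring)
as_decimal = json.loads(jstring, parse_float=Decimal)

float_total = sum(item['price'] for item in as_float['items'])
decimal_total = sum(item['price'] for item in as_decimal['items'])

# False: binary floats cannot store 0.1 or 0.2 exactly.
print(float_total == 0.3)
# True: the decimal values are added exactly.
print(decimal_total == Decimal('0.30'))
```

Note that parse_float receives the number as the string found in the JSON text, so the Decimal is built from '0.10' directly and never passes through a float.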