Now, let's learn how to parse financial data from Quandl's API to create insightful visualizations. Quandl is a financial and economic data warehouse that stores millions of datasets from hundreds of publishers. The best thing about Quandl is that these datasets are delivered via a unified API, so we don't have to worry about publisher-specific procedures for parsing the data correctly. Anonymous users can make up to 50 API calls per day, while registered users can make up to 500 free API calls. Readers can sign up for a free API key at https://www.quandl.com/?modal=register.
At Quandl, every dataset is identified by a unique ID, shown as the Quandl code on each search result web page. For example, the Quandl code GOOG/NASDAQ_SWTX identifies the historical NASDAQ index data published by Google Finance. Every dataset is available in three different formats: CSV, JSON, and XML.
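For instance, following the v3 API pattern used later in this section, the same dataset can be fetched in any of the three formats just by changing the extension at the end of the URL (the exact URLs below are shown for illustration only):
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.csv
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.json
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.xml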
Although an official Python client library is available from Quandl, we are not going to use it, for the sake of demonstrating the general procedure for importing JSON data from an API. According to Quandl's documentation, we can fetch JSON-formatted data tables through the following API call:
GET https://www.quandl.com/api/v3/datasets/{Quandl code}/data.json
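For comparison, the official client wraps the whole request-and-parse cycle into a single call that returns a pandas DataFrame. Here is a minimal sketch, assuming the quandl package is installed and behaves as documented (we will not use it in the rest of this section):

import quandl

# Set the API key once per session
quandl.ApiConfig.api_key = "INSERT-YOUR-KEY-HERE"

# quandl.get() fetches a dataset by its Quandl code and returns a pandas DataFrame
nasdaq_swtx = quandl.get("GOOG/NASDAQ_SWTX")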
To begin with, let's try to get the Big Mac index data from Quandl. The Big Mac index was invented by The Economist in 1986 as a lighthearted guide to whether currencies are at their correct level. It is based on the theory of purchasing power parity (PPP) and is considered an informal measure of currency exchange rates at PPP. It compares the value of currencies against the price of a comparable basket of goods and services, in this case a Big Mac. Differing prices at market exchange rates would imply that one currency is undervalued or overvalued:
from urllib.request import urlopen
import json
import time
import pandas as pd


def get_bigmac_codes():
    """Get a pandas DataFrame of all codes in the Big Mac index dataset

    The first column contains the code, while the second column
    contains the description of the code.

    E.g.
    ECONOMIST/BIGMAC_ARG,Big Mac Index - Argentina
    ECONOMIST/BIGMAC_AUS,Big Mac Index - Australia
    ECONOMIST/BIGMAC_BRA,Big Mac Index - Brazil

    Returns:
        codes: pandas DataFrame of Quandl dataset codes"""

    codes_url = "https://www.quandl.com/api/v3/databases/ECONOMIST/codes"

    # The codes endpoint returns a zipped CSV file without a header row
    codes = pd.read_csv(codes_url, header=None, names=['Code', 'Description'],
                        compression='zip', encoding='latin_1')

    return codes


def get_quandl_dataset(api_key, code):
    """Obtain and parse a Quandl dataset in pandas DataFrame format

    Quandl returns datasets in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    are stored in response['dataset']['column_names'].

    E.g. {'dataset': {...,
             'column_names': ['Date',
                              'local_price',
                              'dollar_ex',
                              'dollar_price',
                              'dollar_ppp',
                              'dollar_valuation',
                              'dollar_adj_valuation',
                              'euro_adj_valuation',
                              'sterling_adj_valuation',
                              'yen_adj_valuation',
                              'yuan_adj_valuation'],
             'data': [['2017-01-31',
                       55.0,
                       15.8575,
                       3.4683903515687,
                       10.869565217391,
                       -31.454736135007,
                       6.2671477203176,
                       8.2697553162259,
                       29.626894343348,
                       32.714616745128,
                       13.625825886047],
                      ['2016-07-31',
                       50.0,
                       14.935,
                       3.3478406427854,
                       9.9206349206349,
                       -33.574590420925,
                       2.0726096168216,
                       0.40224795003514,
                       17.56448458418,
                       19.76377270142,
                       11.643103380531]],
             'database_code': 'ECONOMIST',
             'dataset_code': 'BIGMAC_ARG',
             ... }}

    A custom column, country, is added to denote the 3-letter country code.

    Args:
        api_key: Quandl API key
        code: Quandl dataset code

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Fetch the JSON response
    u = urlopen(base_url + code + url_suffix + api_key)
    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Label the country code (the last three letters of the dataset code)
    df['country'] = code[-3:]

    return df


quandl_dfs = []
codes = get_bigmac_codes()

# Replace this with your own API key
api_key = "INSERT-YOUR-KEY-HERE"

for code in codes.Code:
    # Get the DataFrame of a Quandl dataset
    df = get_quandl_dataset(api_key, code)

    # Store in a list
    quandl_dfs.append(df)

    # Prevent exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of data frames into a single one
bigmac_df = pd.concat(quandl_dfs)

bigmac_df.head()
Here is the expected result, which shows the first five rows of the data frame (transposed here so that all of the columns fit on the page):
                       | 0          | 1          | 2          | 3          | 4
Date                   | 31-07-17   | 31-01-17   | 31-07-16   | 31-01-16   | 31-07-15
local_price            | 5.9        | 5.8        | 5.75       | 5.3        | 5.3
dollar_ex              | 1.303016   | 1.356668   | 1.335738   | 1.415729   | 1.35126
dollar_price           | 4.527955   | 4.27518    | 4.304737   | 3.743655   | 3.922265
dollar_ppp             | 1.113208   | 1.146245   | 1.140873   | 1.075051   | 1.106472
dollar_valuation       | -14.56689  | -15.510277 | -14.588542 | -24.06379  | -18.115553
dollar_adj_valuation   | -11.7012   | -11.9234   | -11.0236   | -28.1641   | -22.1691
euro_adj_valuation     | -13.0262   | -10.2636   | -12.4796   | -22.2864   | -18.573
sterling_adj_valuation | 2.58422    | 7.43771    | 2.48065    | -22.293    | -23.1926
yen_adj_valuation      | 19.9417    | 9.99688    | 4.39776    | -4.0042    | 6.93893
yuan_adj_valuation     | -2.35772   | -5.82434   | -2.681     | -20.6755   | -14.1711
country                | AUS        | AUS        | AUS        | AUS        | AUS
The code for parsing JSON from the Quandl API is a bit involved, and thus requires some extra explanation. The first function, get_bigmac_codes(), parses the list of all available dataset codes in the Quandl ECONOMIST database as a pandas DataFrame. Meanwhile, the second function, get_quandl_dataset(api_key, code), converts the JSON response of a Quandl dataset API query into a pandas DataFrame. All the datasets obtained are concatenated into a single data frame using pandas.concat().
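To make sense of the valuation columns, it helps to reproduce a few of them by hand from one row of the raw data. Take the Argentina record for 2017-01-31 shown in the docstring above, and assume a U.S. Big Mac price of roughly 5.06 USD on that date (this price is an assumption used purely for illustration); the usual Big Mac index arithmetic then matches the values returned by Quandl:

local_price = 55.0    # Big Mac price in Argentina (ARS) on 2017-01-31
dollar_ex = 15.8575   # Market exchange rate (ARS per USD)
us_price = 5.06       # Assumed U.S. Big Mac price in USD (illustration only)

dollar_price = local_price / dollar_ex                 # ~3.47, matches dollar_price
dollar_ppp = local_price / us_price                    # ~10.87, matches dollar_ppp
dollar_valuation = (dollar_ppp / dollar_ex - 1) * 100  # ~-31.5, matches dollar_valuation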
Next, we are going to acquire GDP data from Quandl's World Bank World Development Indicators (WWDI) database, so that we can later compare it against the Big Mac index. Based on the previous code example for acquiring JSON data from Quandl, can you try to adapt it to download the GDP per capita dataset?
For those who are impatient, here is the full code:
import json
import time
import urllib.error
import pandas as pd
from urllib.request import urlopen


def get_gdp_dataset(api_key, country_code):
    """Obtain and parse a Quandl GDP dataset in pandas DataFrame format

    Quandl returns datasets in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    are stored in response['dataset']['column_names'].

    Args:
        api_key: Quandl API key
        country_code: Three-letter code that represents a country

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Compose the Quandl API dataset code to get the GDP per capita
    # (constant 2000 US$) dataset
    gdp_code = "WWDI/" + country_code + "_NY_GDP_PCAP_KD"

    # Parse the JSON response from the Quandl API
    # Some countries might be missing, so we need error-handling code
    try:
        u = urlopen(base_url + gdp_code + url_suffix + api_key)
    except urllib.error.URLError as e:
        print(gdp_code, e)
        return None

    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Add a new country code column
    df['country'] = country_code

    return df


# Change this to your own API key
api_key = "INSERT-YOUR-KEY-HERE"

quandl_dfs = []

# Loop through all unique country code values in the Big Mac index DataFrame
for country_code in bigmac_df.country.unique():
    # Fetch the GDP dataset for the corresponding country
    df = get_gdp_dataset(api_key, country_code)

    # Skip if the response is empty
    if df is None:
        continue

    # Store in a list of DataFrames
    quandl_dfs.append(df)

    # Prevent exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of DataFrames into a single one
gdp_df = pd.concat(quandl_dfs)

gdp_df.head()
The GDP data for several geographical regions is missing, but this is handled gracefully by the try...except block in the get_gdp_dataset function. This is what you should expect to see after running the preceding code:
WWDI/EUR_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/ROC_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/SIN_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/UAE_NY_GDP_PCAP_KD HTTP Error 404: Not Found
  | Date       | Value        | country
0 | 2016-12-31 | 55478.577294 | AUS
1 | 2015-12-31 | 54800.366396 | AUS
2 | 2014-12-31 | 54293.794205 | AUS
3 | 2013-12-31 | 53732.003969 | AUS
4 | 2012-12-31 | 53315.029915 | AUS
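Before merging the GDP data with the Big Mac index, it can be useful to confirm which country codes appear in bigmac_df but not in gdp_df; these are exactly the rows that an inner join will drop. A quick check (the printed set should correspond to the 404 errors listed above):

# Country codes with Big Mac index data but no GDP per capita data
missing_countries = set(bigmac_df.country.unique()) - set(gdp_df.country.unique())
print(missing_countries)  # Expected to match the failed codes, such as EUR, ROC, SIN, and UAE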
Next, we will merge the two pandas DataFrames, one containing the Big Mac index and the other containing GDP per capita, using pandas.merge(). The most recent record in WWDI's GDP per capita dataset was collected at the end of 2016, so let's pair it up with the closest Big Mac index data, from January 2017.
For those who are familiar with SQL, pandas.merge() supports four modes, namely left, right, inner, and outer joins. Since we are only interested in rows that have matching countries in both pandas DataFrames, we are going to choose an inner join:
merged_df = pd.merge(bigmac_df[bigmac_df.Date == "2017-01-31"],
                     gdp_df[gdp_df.Date == "2016-12-31"],
                     how='inner', on='country')
merged_df.head()
Here is the merged data frame:
                       | 0          | 1          | 2          | 3          | 4
Date_x                 | 31-01-17   | 31-01-17   | 31-01-17   | 31-01-17   | 31-01-17
local_price            | 5.8        | 16.5       | 3.09       | 2450       | 55
dollar_ex              | 1.356668   | 3.22395    | 0.828775   | 672.805    | 15.8575
dollar_price           | 4.27518    | 5.117945   | 3.728394   | 3.641471   | 3.46839
dollar_ppp             | 1.146245   | 3.26087    | 0.610672   | 484.189723 | 10.869565
dollar_valuation       | -15.510277 | 1.145166   | -26.316324 | -28.034167 | -31.454736
dollar_adj_valuation   | -11.9234   | 67.5509    | -18.0208   | 11.9319    | 6.26715
euro_adj_valuation     | -10.2636   | 70.7084    | -16.4759   | 14.0413    | 8.26976
sterling_adj_valuation | 7.43771    | 104.382    | 0          | 36.5369    | 29.6269
yen_adj_valuation      | 9.99688    | 109.251    | 2.38201    | 39.7892    | 32.7146
yuan_adj_valuation     | -5.82434   | 79.1533    | -12.3439   | 19.6828    | 13.6258
country                | AUS        | BRA        | GBR        | CHL        | ARG
Date_y                 | 31-12-16   | 31-12-16   | 31-12-16   | 31-12-16   | 31-12-16
Value                  | 55478.5773 | 10826.2714 | 41981.3921 | 15019.633  | 10153.99791
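As a side note on the join modes mentioned before the merge, here is a tiny self-contained sketch contrasting inner and outer joins; the toy numbers loosely echo the datasets above but are made up for illustration:

import pandas as pd

left = pd.DataFrame({'country': ['AUS', 'ARG', 'EUR'],
                     'dollar_price': [4.28, 3.47, 4.06]})
right = pd.DataFrame({'country': ['AUS', 'ARG', 'BRA'],
                      'gdp_per_capita': [55478.58, 10154.00, 10826.27]})

# Inner join keeps only the countries present in both DataFrames (AUS and ARG)
inner = pd.merge(left, right, how='inner', on='country')

# Outer join keeps all four countries, filling missing values with NaN
outer = pd.merge(left, right, how='outer', on='country')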