Importing and visualizing data from a JSON API

Now, let's learn how to parse financial data from Quandl's API to create insightful visualizations. Quandl is a financial and economic data warehouse, storing millions of datasets from hundreds of publishers. The best thing about Quandl is that these datasets are delivered via a unified API, so users don't need to worry about the procedures for parsing the data correctly. Anonymous users can get up to 50 API calls per day, or up to 500 free API calls if registered. Readers can sign up for a free API key at https://www.quandl.com/?modal=register.

At Quandl, every dataset is identified by a unique ID, as defined by the Quandl code on each search result web page. For example, the Quandl code GOOG/NASDAQ_SWTX defines the historical NASDAQ index data published by Google Finance. Every dataset is available in three different formats—CSV, JSON, and XML.
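As a quick illustration that involves no network access, the endpoint URL for each of the three formats can be composed from a Quandl code simply by changing the file extension. The GOOG/NASDAQ_SWTX code and the URL pattern below follow the examples in this section:

```python
# Compose the Quandl v3 dataset endpoint URL for each of the three
# supported formats, using the GOOG/NASDAQ_SWTX code mentioned above
base_url = "https://www.quandl.com/api/v3/datasets/"
quandl_code = "GOOG/NASDAQ_SWTX"

urls = {fmt: base_url + quandl_code + "/data." + fmt
        for fmt in ("csv", "json", "xml")}

for fmt, url in urls.items():
    print(fmt, "->", url)
```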

Although an official Python client library is available from Quandl, we are not going to use that, for the sake of demonstrating the general procedures of importing JSON data from an API. According to Quandl's documentation, we can fetch JSON-formatted data tables through the following API call:

GET https://www.quandl.com/api/v3/datasets/{Quandl code}/data.json

To begin with, let's try to get the Big Mac index data from Quandl. The Big Mac index was invented by The Economist in 1986 as a lighthearted guide to whether currencies are at their correct level. It is based on the theory of purchasing power parity (PPP), and is considered an informal measure of currency exchange rates at PPP. It measures the value of currencies against the price of a similar basket of goods and services, in this case a Big Mac. Differing prices at market exchange rates imply that one currency is undervalued or overvalued:

from urllib.request import urlopen
import json
import time
import pandas as pd


def get_bigmac_codes():
    """Get a pandas DataFrame of all codes in the Big Mac index dataset

    The first column contains the code, while the second column
    contains the description of the code.

    E.g.
    ECONOMIST/BIGMAC_ARG,Big Mac Index - Argentina
    ECONOMIST/BIGMAC_AUS,Big Mac Index - Australia
    ECONOMIST/BIGMAC_BRA,Big Mac Index - Brazil

    Returns:
        codes: pandas DataFrame of Quandl dataset codes"""

    codes_url = "https://www.quandl.com/api/v3/databases/ECONOMIST/codes"
    codes = pd.read_csv(codes_url, header=None, names=['Code', 'Description'],
                        compression='zip', encoding='latin_1')

    return codes


def get_quandl_dataset(api_key, code):
    """Obtain and parse a quandl dataset in pandas DataFrame format

    Quandl returns dataset in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    stored in response['dataset']['column_names'].

    E.g. {'dataset': {...,
                      'column_names': ['Date',
                                       'local_price',
                                       'dollar_ex',
                                       'dollar_price',
                                       'dollar_ppp',
                                       'dollar_valuation',
                                       'dollar_adj_valuation',
                                       'euro_adj_valuation',
                                       'sterling_adj_valuation',
                                       'yen_adj_valuation',
                                       'yuan_adj_valuation'],
                      'data': [['2017-01-31',
                                55.0,
                                15.8575,
                                3.4683903515687,
                                10.869565217391,
                                -31.454736135007,
                                6.2671477203176,
                                8.2697553162259,
                                29.626894343348,
                                32.714616745128,
                                13.625825886047],
                               ['2016-07-31',
                                50.0,
                                14.935,
                                3.3478406427854,
                                9.9206349206349,
                                -33.574590420925,
                                2.0726096168216,
                                0.40224795003514,
                                17.56448458418,
                                19.76377270142,
                                11.643103380531]],
                      'database_code': 'ECONOMIST',
                      'dataset_code': 'BIGMAC_ARG',
                      ... }}

    A custom column, country, is added to denote the 3-letter country code.

    Args:
        api_key: Quandl API key
        code: Quandl dataset code

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Fetch the JSON response
    u = urlopen(base_url + code + url_suffix + api_key)
    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Label the country code
    df['country'] = code[-3:]

    return df


quandl_dfs = []
codes = get_bigmac_codes()

# Replace this with your own API key
api_key = "INSERT-YOUR-KEY-HERE"

for code in codes.Code:
    # Get the DataFrame of a Quandl dataset
    df = get_quandl_dataset(api_key, code)

    # Store in a list
    quandl_dfs.append(df)

    # Prevents exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of data frames into a single one
bigmac_df = pd.concat(quandl_dfs)
bigmac_df.head()

Here is the expected result, which shows the first five rows of the data frame (transposed for readability):

                                 0           1           2           3           4
Date                      31-07-17    31-01-17    31-07-16    31-01-16    31-07-15
local_price                    5.9         5.8        5.75         5.3         5.3
dollar_ex                 1.303016    1.356668    1.335738    1.415729     1.35126
dollar_price              4.527955     4.27518    4.304737    3.743655    3.922265
dollar_ppp                1.113208    1.146245    1.140873    1.075051    1.106472
dollar_valuation         -14.56689  -15.510277  -14.588542   -24.06379  -18.115553
dollar_adj_valuation      -11.7012    -11.9234    -11.0236    -28.1641    -22.1691
euro_adj_valuation        -13.0262    -10.2636    -12.4796    -22.2864     -18.573
sterling_adj_valuation     2.58422     7.43771     2.48065     -22.293    -23.1926
yen_adj_valuation          19.9417     9.99688     4.39776     -4.0042     6.93893
yuan_adj_valuation        -2.35772    -5.82434      -2.681    -20.6755    -14.1711
country                        AUS         AUS         AUS         AUS         AUS
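To make the arithmetic behind columns such as dollar_price, dollar_ppp, and dollar_valuation concrete, here is a quick sketch using the January 2017 figures for Argentina shown in the docstring earlier. Note that the US Big Mac price of 5.06 USD is my assumption for illustration; it is not part of the dataset:

```python
# Worked example of the Big Mac index arithmetic, using the
# January 2017 figures for Argentina from the Quandl response above.
local_price = 55.0   # Big Mac price in Argentine pesos
dollar_ex = 15.8575  # market exchange rate, pesos per USD
us_price = 5.06      # Big Mac price in the US (assumed for illustration)

# Price of an Argentine Big Mac converted at the market exchange rate
dollar_price = local_price / dollar_ex

# Implied PPP exchange rate: the rate at which a Big Mac would cost
# the same in both countries
dollar_ppp = local_price / us_price

# Valuation: a negative number means the peso is undervalued vs the USD
dollar_valuation = (dollar_ppp / dollar_ex - 1) * 100

print(dollar_price, dollar_ppp, dollar_valuation)
```

The three printed values reproduce the dollar_price, dollar_ppp, and dollar_valuation entries of the ARG row in the docstring example.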

 

The code for parsing JSON from Quandl API is a bit complicated, and thus requires extra explanation. The first function, get_bigmac_codes(), parses the list of all available dataset codes in the Quandl Economist database as a pandas DataFrame. Meanwhile, the second function, get_quandl_dataset(api_key, code), converts the JSON response of a Quandl dataset API query to a pandas DataFrame. All datasets obtained are concatenated into a single data frame using pandas.concat().
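As a minimal sketch of that last concatenation step, pandas.concat() stacks a list of DataFrames sharing the same columns into a single one. The toy frames below only mimic the structure returned by get_quandl_dataset():

```python
import pandas as pd

# Two toy per-country frames with the same columns, mimicking the
# structure returned by get_quandl_dataset()
df_arg = pd.DataFrame({'Date': ['2017-01-31'], 'local_price': [55.0],
                       'country': ['ARG']})
df_aus = pd.DataFrame({'Date': ['2017-01-31'], 'local_price': [5.8],
                       'country': ['AUS']})

# Stack them vertically; ignore_index=True renumbers the rows
# (the main example keeps the original per-dataset row labels instead)
combined = pd.concat([df_arg, df_aus], ignore_index=True)
print(combined)
```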

We should bear in mind that the Big Mac index is not directly comparable between countries. Normally, we would expect commodities in poor countries to be cheaper than those in rich ones. To represent a fairer picture of the index, it would be better to show the relationship between Big Mac pricing and gross domestic product (GDP) per capita.

To that end, we are going to acquire the GDP dataset from Quandl's World Bank World Development Indicators (WWDI) database. Based on the previous code example of acquiring JSON data from Quandl, can you try to adapt it to download the GDP per capita dataset?

For those who are impatient, here is the full code:

import json
import time
import urllib.error
import pandas as pd
from urllib.request import urlopen


def get_gdp_dataset(api_key, country_code):
    """Obtain and parse a quandl GDP dataset in pandas DataFrame format

    Quandl returns dataset in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    stored in response['dataset']['column_names'].

    Args:
        api_key: Quandl API key
        country_code: Three letter code to represent country

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Compose the Quandl API dataset code to get the
    # GDP per capita (constant 2000 US$) dataset
    gdp_code = "WWDI/" + country_code + "_NY_GDP_PCAP_KD"

    # Parse the JSON response from the Quandl API
    # Some countries might be missing, so we need error handling code
    try:
        u = urlopen(base_url + gdp_code + url_suffix + api_key)
    except urllib.error.URLError as e:
        print(gdp_code, e)
        return None

    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Add a new country code column
    df['country'] = country_code

    return df


# Change this to your own API key
api_key = "INSERT-YOUR-KEY-HERE"

quandl_dfs = []

# Loop through all unique country code values in the Big Mac index DataFrame
for country_code in bigmac_df.country.unique():
    # Fetch the GDP dataset for the corresponding country
    df = get_gdp_dataset(api_key, country_code)

    # Skip if the response is empty
    if df is None:
        continue

    # Store in a list of DataFrames
    quandl_dfs.append(df)

    # Prevents exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of DataFrames into a single one
gdp_df = pd.concat(quandl_dfs)
gdp_df.head()

The GDP data of several geographical regions is missing, but this is handled gracefully by the try...except block in the get_gdp_dataset function. This is what you should expect to see after running the preceding code:

WWDI/EUR_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/ROC_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/SIN_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/UAE_NY_GDP_PCAP_KD HTTP Error 404: Not Found

         Date         Value country
0  2016-12-31  55478.577294     AUS
1  2015-12-31  54800.366396     AUS
2  2014-12-31  54293.794205     AUS
3  2013-12-31  53732.003969     AUS
4  2012-12-31  53315.029915     AUS

Next, we will merge the two pandas DataFrames, which contain the Big Mac index and GDP per capita data respectively, using pandas.merge(). The most recent record in WWDI's GDP per capita dataset was collected at the end of 2016, so let's pair it up with the closest Big Mac index dataset, from January 2017.

For those who are familiar with SQL, pandas.merge() supports four modes, namely left, right, inner, and outer joins. Since we are interested only in rows that have matching countries in both pandas DataFrames, we are going to choose inner join:

merged_df = pd.merge(bigmac_df[bigmac_df.Date == "2017-01-31"],
                     gdp_df[gdp_df.Date == "2016-12-31"],
                     how='inner', on='country')
merged_df.head()

Here is the merged data frame:

                                 0           1           2           3            4
Date_x                    31-01-17    31-01-17    31-01-17    31-01-17     31-01-17
local_price                    5.8        16.5        3.09        2450           55
dollar_ex                 1.356668     3.22395    0.828775     672.805      15.8575
dollar_price               4.27518    5.117945    3.728394    3.641471      3.46839
dollar_ppp                1.146245     3.26087    0.610672  484.189723    10.869565
dollar_valuation        -15.510277    1.145166  -26.316324  -28.034167   -31.454736
dollar_adj_valuation      -11.9234     67.5509    -18.0208     11.9319      6.26715
euro_adj_valuation        -10.2636     70.7084    -16.4759     14.0413      8.26976
sterling_adj_valuation     7.43771     104.382           0     36.5369      29.6269
yen_adj_valuation          9.99688     109.251     2.38201     39.7892      32.7146
yuan_adj_valuation        -5.82434     79.1533    -12.3439     19.6828      13.6258
country                        AUS         BRA         GBR         CHL          ARG
Date_y                    31-12-16    31-12-16    31-12-16    31-12-16     31-12-16
Value                   55478.5773  10826.2714  41981.3921   15019.633  10153.99791
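As an aside, the difference between inner and outer joins is easy to see with toy DataFrames whose country columns only partially overlap. The figures below are rounded values from the tables above:

```python
import pandas as pd

# Two toy frames with partially overlapping country codes; 'GBR' is
# missing from the GDP frame, mimicking the 404 errors we saw earlier
bigmac = pd.DataFrame({'country': ['AUS', 'ARG', 'GBR'],
                       'dollar_price': [4.28, 3.47, 3.73]})
gdp = pd.DataFrame({'country': ['AUS', 'ARG', 'CHL'],
                    'gdp_per_capita': [55478.6, 10154.0, 15019.6]})

# inner join keeps only countries present in both frames
inner = pd.merge(bigmac, gdp, how='inner', on='country')

# outer join keeps all countries, filling the gaps with NaN
outer = pd.merge(bigmac, gdp, how='outer', on='country')

print(len(inner), len(outer))  # inner has 2 rows, outer has 4
```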