Now, let's learn how to parse financial data from Quandl's API to create insightful visualizations. Quandl is a financial and economic data warehouse that stores millions of datasets from hundreds of publishers. The best thing about Quandl is that these datasets are delivered via a unified API, so we don't have to worry about publisher-specific procedures for parsing the data correctly. Anonymous users can make up to 50 API calls per day, while registered users can make up to 500 free API calls. Readers can sign up for a free API key at https://www.quandl.com/?modal=register.
At Quandl, every dataset is identified by a unique ID, shown as the Quandl code on each search result web page. For example, the Quandl code GOOG/NASDAQ_SWTX identifies the historical NASDAQ index data published by Google Finance. Every dataset is available in three different formats: CSV, JSON, and XML.
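For instance, following the v3 API pattern used later in this section, the same dataset can be fetched in any of the three formats just by changing the extension at the end of the URL (the exact URLs below are shown for illustration only):
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.csv
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.json
https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_SWTX.xml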
Although an official Python client library is available from Quandl, we are not going to use it, for the sake of demonstrating the general procedure for importing JSON data from an API. According to Quandl's documentation, we can fetch JSON-formatted data tables through the following API call:
GET https://www.quandl.com/api/v3/datasets/{Quandl code}/data.json
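For comparison, the official client wraps the whole request-and-parse cycle into a single call that returns a pandas DataFrame. Here is a minimal sketch, assuming the quandl package is installed and behaves as documented (we will not use it in the rest of this section):

import quandl

# Set the API key once per session
quandl.ApiConfig.api_key = "INSERT-YOUR-KEY-HERE"

# quandl.get() fetches a dataset by its Quandl code and returns a pandas DataFrame
nasdaq_swtx = quandl.get("GOOG/NASDAQ_SWTX")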
To begin with, let's try to get the Big Mac index data from Quandl. The Big Mac index was invented by The Economist in 1986 as a lighthearted guide to whether currencies are at their correct level. It is based on the theory of purchasing power parity (PPP) and is considered an informal measure of currency exchange rates at PPP. It compares the value of currencies against the price of a comparable basket of goods and services, in this case a Big Mac. Differing prices at market exchange rates would imply that one currency is undervalued or overvalued:
from urllib.request import urlopen
import json
import time
import pandas as pd


def get_bigmac_codes():
    """Get a pandas DataFrame of all codes in the Big Mac index dataset

    The first column contains the code, while the second column
    contains the description of the code.

    E.g.
    ECONOMIST/BIGMAC_ARG,Big Mac Index - Argentina
    ECONOMIST/BIGMAC_AUS,Big Mac Index - Australia
    ECONOMIST/BIGMAC_BRA,Big Mac Index - Brazil

    Returns:
        codes: pandas DataFrame of Quandl dataset codes"""

    codes_url = "https://www.quandl.com/api/v3/databases/ECONOMIST/codes"

    # The codes endpoint returns a zipped CSV file without a header row
    codes = pd.read_csv(codes_url, header=None, names=['Code', 'Description'],
                        compression='zip', encoding='latin_1')

    return codes


def get_quandl_dataset(api_key, code):
    """Obtain and parse a Quandl dataset in pandas DataFrame format

    Quandl returns datasets in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    are stored in response['dataset']['column_names'].

    E.g. {'dataset': {...,
             'column_names': ['Date',
                              'local_price',
                              'dollar_ex',
                              'dollar_price',
                              'dollar_ppp',
                              'dollar_valuation',
                              'dollar_adj_valuation',
                              'euro_adj_valuation',
                              'sterling_adj_valuation',
                              'yen_adj_valuation',
                              'yuan_adj_valuation'],
             'data': [['2017-01-31',
                       55.0,
                       15.8575,
                       3.4683903515687,
                       10.869565217391,
                       -31.454736135007,
                       6.2671477203176,
                       8.2697553162259,
                       29.626894343348,
                       32.714616745128,
                       13.625825886047],
                      ['2016-07-31',
                       50.0,
                       14.935,
                       3.3478406427854,
                       9.9206349206349,
                       -33.574590420925,
                       2.0726096168216,
                       0.40224795003514,
                       17.56448458418,
                       19.76377270142,
                       11.643103380531]],
             'database_code': 'ECONOMIST',
             'dataset_code': 'BIGMAC_ARG',
             ... }}

    A custom column, country, is added to denote the 3-letter country code.

    Args:
        api_key: Quandl API key
        code: Quandl dataset code

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Fetch the JSON response
    u = urlopen(base_url + code + url_suffix + api_key)
    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Label the country code (the last three letters of the dataset code)
    df['country'] = code[-3:]

    return df


quandl_dfs = []
codes = get_bigmac_codes()

# Replace this with your own API key
api_key = "INSERT-YOUR-KEY-HERE"

for code in codes.Code:
    # Get the DataFrame of a Quandl dataset
    df = get_quandl_dataset(api_key, code)

    # Store in a list
    quandl_dfs.append(df)

    # Prevent exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of data frames into a single one
bigmac_df = pd.concat(quandl_dfs)

bigmac_df.head()
Here is the expected result, which shows the first five rows of the data frame (transposed here so that all of the columns fit on the page):
                       | 0          | 1          | 2          | 3          | 4
Date                   | 31-07-17   | 31-01-17   | 31-07-16   | 31-01-16   | 31-07-15
local_price            | 5.9        | 5.8        | 5.75       | 5.3        | 5.3
dollar_ex              | 1.303016   | 1.356668   | 1.335738   | 1.415729   | 1.35126
dollar_price           | 4.527955   | 4.27518    | 4.304737   | 3.743655   | 3.922265
dollar_ppp             | 1.113208   | 1.146245   | 1.140873   | 1.075051   | 1.106472
dollar_valuation       | -14.56689  | -15.510277 | -14.588542 | -24.06379  | -18.115553
dollar_adj_valuation   | -11.7012   | -11.9234   | -11.0236   | -28.1641   | -22.1691
euro_adj_valuation     | -13.0262   | -10.2636   | -12.4796   | -22.2864   | -18.573
sterling_adj_valuation | 2.58422    | 7.43771    | 2.48065    | -22.293    | -23.1926
yen_adj_valuation      | 19.9417    | 9.99688    | 4.39776    | -4.0042    | 6.93893
yuan_adj_valuation     | -2.35772   | -5.82434   | -2.681     | -20.6755   | -14.1711
country                | AUS        | AUS        | AUS        | AUS        | AUS
The code for parsing JSON from the Quandl API is a bit involved, and thus requires some extra explanation. The first function, get_bigmac_codes(), parses the list of all available dataset codes in the Quandl ECONOMIST database as a pandas DataFrame. Meanwhile, the second function, get_quandl_dataset(api_key, code), converts the JSON response of a Quandl dataset API query into a pandas DataFrame. All the datasets obtained are concatenated into a single data frame using pandas.concat().
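To make sense of the valuation columns, it helps to reproduce a few of them by hand from one row of the raw data. Take the Argentina record for 2017-01-31 shown in the docstring above, and assume a U.S. Big Mac price of roughly 5.06 USD on that date (this price is an assumption used purely for illustration); the usual Big Mac index arithmetic then matches the values returned by Quandl:

local_price = 55.0    # Big Mac price in Argentina (ARS) on 2017-01-31
dollar_ex = 15.8575   # Market exchange rate (ARS per USD)
us_price = 5.06       # Assumed U.S. Big Mac price in USD (illustration only)

dollar_price = local_price / dollar_ex                 # ~3.47, matches dollar_price
dollar_ppp = local_price / us_price                    # ~10.87, matches dollar_ppp
dollar_valuation = (dollar_ppp / dollar_ex - 1) * 100  # ~-31.5, matches dollar_valuation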
Next, we are going to acquire GDP data from Quandl's World Bank World Development Indicators (WWDI) database, so that we can later compare it against the Big Mac index. Based on the previous code example for acquiring JSON data from Quandl, can you try to adapt it to download the GDP per capita dataset?
For those who are impatient, here is the full code:
import json
import time
import urllib.error
import pandas as pd
from urllib.request import urlopen


def get_gdp_dataset(api_key, country_code):
    """Obtain and parse a Quandl GDP dataset in pandas DataFrame format

    Quandl returns datasets in JSON format, where data is stored as a
    list of lists in response['dataset']['data'], and column headers
    are stored in response['dataset']['column_names'].

    Args:
        api_key: Quandl API key
        country_code: Three-letter code that represents a country

    Returns:
        df: pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="

    # Compose the Quandl API dataset code to get the GDP per capita
    # (constant 2000 US$) dataset
    gdp_code = "WWDI/" + country_code + "_NY_GDP_PCAP_KD"

    # Parse the JSON response from the Quandl API
    # Some countries might be missing, so we need error-handling code
    try:
        u = urlopen(base_url + gdp_code + url_suffix + api_key)
    except urllib.error.URLError as e:
        print(gdp_code, e)
        return None

    response = json.loads(u.read().decode('utf-8'))

    # Format the response as a pandas DataFrame
    df = pd.DataFrame(response['dataset']['data'],
                      columns=response['dataset']['column_names'])

    # Add a new country code column
    df['country'] = country_code

    return df


# Change this to your own API key
api_key = "INSERT-YOUR-KEY-HERE"

quandl_dfs = []

# Loop through all unique country code values in the Big Mac index DataFrame
for country_code in bigmac_df.country.unique():
    # Fetch the GDP dataset for the corresponding country
    df = get_gdp_dataset(api_key, country_code)

    # Skip if the response is empty
    if df is None:
        continue

    # Store in a list of DataFrames
    quandl_dfs.append(df)

    # Prevent exceeding the API speed limit
    time.sleep(2)

# Concatenate the list of DataFrames into a single one
gdp_df = pd.concat(quandl_dfs)

gdp_df.head()
The GDP data for several geographical regions is missing, but this is handled gracefully by the try...except block in the get_gdp_dataset function. This is what you should expect to see after running the preceding code:
WWDI/EUR_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/ROC_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/SIN_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/UAE_NY_GDP_PCAP_KD HTTP Error 404: Not Found
  | Date       | Value        | country
0 | 2016-12-31 | 55478.577294 | AUS
1 | 2015-12-31 | 54800.366396 | AUS
2 | 2014-12-31 | 54293.794205 | AUS
3 | 2013-12-31 | 53732.003969 | AUS
4 | 2012-12-31 | 53315.029915 | AUS
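Before merging the GDP data with the Big Mac index, it can be useful to confirm which country codes appear in bigmac_df but not in gdp_df; these are exactly the rows that an inner join will drop. A quick check (the printed set should correspond to the 404 errors listed above):

# Country codes with Big Mac index data but no GDP per capita data
missing_countries = set(bigmac_df.country.unique()) - set(gdp_df.country.unique())
print(missing_countries)  # Expected to match the failed codes, such as EUR, ROC, SIN, and UAE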
Next, we will merge the two pandas DataFrames, one containing the Big Mac index and the other containing GDP per capita, using pandas.merge(). The most recent record in WWDI's GDP per capita dataset was collected at the end of 2016, so let's pair it up with the closest Big Mac index data, from January 2017.
For those who are familiar with SQL, pandas.merge() supports four modes, namely left, right, inner, and outer joins. Since we are only interested in rows that have matching countries in both pandas DataFrames, we are going to choose an inner join:
merged_df = pd.merge(bigmac_df[bigmac_df.Date == "2017-01-31"],
                     gdp_df[gdp_df.Date == "2016-12-31"],
                     how='inner', on='country')
merged_df.head()
Here is the merged data frame:
                       | 0          | 1          | 2          | 3          | 4
Date_x                 | 31-01-17   | 31-01-17   | 31-01-17   | 31-01-17   | 31-01-17
local_price            | 5.8        | 16.5       | 3.09       | 2450       | 55
dollar_ex              | 1.356668   | 3.22395    | 0.828775   | 672.805    | 15.8575
dollar_price           | 4.27518    | 5.117945   | 3.728394   | 3.641471   | 3.46839
dollar_ppp             | 1.146245   | 3.26087    | 0.610672   | 484.189723 | 10.869565
dollar_valuation       | -15.510277 | 1.145166   | -26.316324 | -28.034167 | -31.454736
dollar_adj_valuation   | -11.9234   | 67.5509    | -18.0208   | 11.9319    | 6.26715
euro_adj_valuation     | -10.2636   | 70.7084    | -16.4759   | 14.0413    | 8.26976
sterling_adj_valuation | 7.43771    | 104.382    | 0          | 36.5369    | 29.6269
yen_adj_valuation      | 9.99688    | 109.251    | 2.38201    | 39.7892    | 32.7146
yuan_adj_valuation     | -5.82434   | 79.1533    | -12.3439   | 19.6828    | 13.6258
country                | AUS        | BRA        | GBR        | CHL        | ARG
Date_y                 | 31-12-16   | 31-12-16   | 31-12-16   | 31-12-16   | 31-12-16
Value                  | 55478.5773 | 10826.2714 | 41981.3921 | 15019.633  | 10153.99791
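As a side note on the join modes mentioned before the merge, here is a tiny self-contained sketch contrasting inner and outer joins; the toy numbers loosely echo the datasets above but are made up for illustration:

import pandas as pd

left = pd.DataFrame({'country': ['AUS', 'ARG', 'EUR'],
                     'dollar_price': [4.28, 3.47, 4.06]})
right = pd.DataFrame({'country': ['AUS', 'ARG', 'BRA'],
                      'gdp_per_capita': [55478.58, 10154.00, 10826.27]})

# Inner join keeps only the countries present in both DataFrames (AUS and ARG)
inner = pd.merge(left, right, how='inner', on='country')

# Outer join keeps all four countries, filling missing values with NaN
outer = pd.merge(left, right, how='outer', on='country')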