A. R. Kulkarni et al., Time Series Algorithms Recipes, https://doi.org/10.1007/978-1-4842-8978-5_1

1. Getting Started with Time Series

Akshay R Kulkarni¹, Adarsha Shivananda², Anoosh Kulkarni³ and V Adithya Krishnan⁴

(1) Bangalore, Karnataka, India

(2) Hosanagara, Karnataka, India

(3) Bangalore, India

(4) Navi Mumbai, India

A time series is a sequence of time-dependent data points. For example, the demand (or sales) for a product on an e-commerce website can be recorded over time as a time series, in which the demand (or sales) values are ordered by time. This data can then be analyzed to find critical temporal insights and forecast future values, which helps businesses plan and increase revenue.

Time series data arises in almost every domain where real-time analytics matters, and analyzing it and forecasting its future values has become essential in those domains.

Time series analysis and forecasting were previously considered purely statistical problems. Today they are also tackled with machine learning and deep learning–based solutions, which in many cases perform as well as or better than classical statistical methods. This book uses various methods and approaches to analyze and forecast time series.

This chapter uses recipes to read/write time series data and perform simple preprocessing and Exploratory Data Analysis (EDA).

The following recipes are explored in this chapter.

Recipe 1-1. Reading Time Series Objects
Recipe 1-2. Saving Time Series Objects
Recipe 1-3. Exploring Types of Time Series Data
Recipe 1-4. Time Series Components
Recipe 1-5. Time Series Decomposition
Recipe 1-6. Visualization of Seasonality

Recipe 1-1A. Reading Time Series Objects (Air Passengers)

Problem

You want to read and load time series data into a dataframe.

Solution

Use pandas to load the data into a dataframe.

How It Works

The following steps read the data.

Step 1A-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

Step 1A-2. Write a parsing function for the datetime column.

Before reading the data, let’s write a parsing function.

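from datetime import datetime

# pd.datetime was removed in pandas 2.0; use the standard-library datetime class instead

date_parser_fn = lambda dates: datetime.strptime(dates, '%Y-%m')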

Step 1A-3. Read the data.

Read the air passenger data.

data = pd.read_csv('./data/AirPassenger.csv', parse_dates = ['Month'], index_col = 'Month', date_parser = date_parser_fn)

plt.plot(data)

plt.show()

Figure 1-1 shows the time series plot output.

The following are some of the important input arguments for read_csv.

parse_dates specifies the datetime column(s) in the dataset that should be parsed as dates.
index_col specifies the column to use as the dataframe index (the row labels). In most time series use cases, it's the datetime column.
date_parser is a function that parses the dates (i.e., converts each input string to a datetime object). pandas stores parsed dates as datetime64 timestamps (YYYY-MM-DD HH:MM:SS), so the parser should return datetime objects rather than formatted strings. In pandas 2.0 and later, date_parser is deprecated in favor of date_format, which takes a format string such as '%Y-%m'.
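The following is a version-independent sketch that assumes pandas is installed. The month column is read as a plain index and then converted with pd.to_datetime and an explicit format string, which avoids the deprecated date_parser argument. The CSV text and the Passengers column name are hypothetical. The values are the first three passenger counts printed in Recipe 1-5B.

```python
import io

import pandas as pd

# Hypothetical CSV text in the same 'YYYY-MM' layout as the Month column
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
"""

# Use the month column as the index, then parse it with an explicit format
df = pd.read_csv(io.StringIO(csv_text), index_col='Month')
df.index = pd.to_datetime(df.index, format='%Y-%m')

print(df)
```

Because every month string lacks a day, each parsed timestamp falls on the first day of its month.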

Recipe 1-1B. Reading Time Series Objects (India GDP Data)

Problem

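You want to read time series data that has no date column, attach a datetime column to it, and store and retrieve the resulting dataframe.

Solution

Generate a date range with pandas, attach it to the dataframe, and store and retrieve the dataframe as a pickle object.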

How It Works

The following steps read the data.

Step 1B-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

import pickle

Step 1B-2. Read India’s GDP time series data.

indian_gdp_data = pd.read_csv('./data/GDPIndia.csv', header=0)

# One year-end date per row, starting in 1960 ('A' is spelled 'YE' in pandas >= 2.2)

date_range = pd.date_range(start='1960-01-01', periods=len(indian_gdp_data), freq='A')

indian_gdp_data['TimeIndex'] = date_range

indian_gdp_data.head(5).T
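The following sketch, which assumes pandas and uses made-up GDP values, attaches one year-end timestamp per row. Generating the index with periods=len(...) guarantees exactly one entry per row, so no row is silently left without a date. Passing the offset object pd.offsets.YearEnd() instead of a string alias is a choice made here to avoid frequency-alias renames between pandas versions.

```python
import pandas as pd

# Hypothetical GDP-per-capita values with no date column, one row per year
gdp = pd.DataFrame({'GDPpercapita': [100.0, 104.0, 109.0]})

# One year-end timestamp per row, starting in 1960; the offset object avoids
# depending on string aliases that were renamed across pandas versions
time_index = pd.date_range(start='1960-01-01', periods=len(gdp), freq=pd.offsets.YearEnd())
gdp['TimeIndex'] = time_index

print(gdp)
```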

Step 1B-3. Plot the time series.

plt.plot(indian_gdp_data.TimeIndex, indian_gdp_data.GDPpercapita, label='GDP per capita')

plt.legend(loc='best')

plt.show()

Figure 1-2 shows the output time series.

Step 1B-4. Store and retrieve as a pickle.

### Store as a pickle object

import pickle

with open('gdp_india.obj', 'wb') as fp:

    pickle.dump(indian_gdp_data, fp)

### Retrieve the pickle object

with open('gdp_india.obj', 'rb') as fp:

    indian_gdp_data1 = pickle.load(fp)

indian_gdp_data1.head(5).T

Figure 1-3 shows the retrieved time series object transposed.
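pandas also provides DataFrame.to_pickle and pd.read_pickle, which wrap the same pickle mechanism. The sketch below uses hypothetical data and a temporary file to check that a round trip preserves both the values and the datetime dtype. Only load pickle files from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame with a datetime column, like the GDP data above
original = pd.DataFrame({
    'TimeIndex': pd.to_datetime(['1960-12-31', '1961-12-31']),
    'GDPpercapita': [100.0, 104.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'gdp_india.pkl')
    original.to_pickle(path)            # serialize with pickle
    restored = pd.read_pickle(path)     # deserialize (trusted files only)

print(restored.dtypes)
```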

Recipe 1-2. Saving Time Series Objects

Problem

You want to save a loaded time series dataframe into a file.

Solution

Save the dataframe as a CSV file.

How It Works

The following steps store the data.

Step 2-1. Save the previously loaded time series object.

### Saving the TS object as csv

data.to_csv('ts_data.csv', index=True, sep=',')

### Read the stored object back (DataFrame.from_csv was removed in pandas 1.0)

data1 = pd.read_csv('ts_data.csv', header=0, index_col=0, parse_dates=True)

### Check

print(data1.head(2).T)

The output is as follows.

1981-01-01

1981-01-02 17.9

1981-01-03 18.8

Name: 20.7, dtype: float64
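Unlike a pickle, a CSV file stores the index as plain text, so the datetime type must be restored when the file is read. The following minimal sketch assumes pandas and uses toy values in a temporary file. It shows that index_col=0 with parse_dates=True brings back a DatetimeIndex.

```python
import os
import tempfile

import pandas as pd

# Toy daily series with a named DatetimeIndex
ser = pd.Series([20.7, 17.9, 18.8],
                index=pd.date_range('1981-01-01', periods=3, freq='D', name='Date'),
                name='Temp')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ts_data.csv')
    ser.to_csv(path, index=True, sep=',')  # writes a "Date,Temp" header row
    # index_col=0 restores the first column as the index; parse_dates=True parses it
    restored = pd.read_csv(path, header=0, index_col=0, parse_dates=True)

print(restored.head())
```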

Recipe 1-3A. Exploring Types of Time Series Data: Univariate

Problem

You want to load and explore univariate time series data.

Solution

A univariate time series is data with a single time-dependent variable.

Let's look at a sample dataset of the daily minimum temperatures in the Southern Hemisphere (Melbourne, Australia) from 1981 to 1990. The temperature is the time-dependent target variable.

How It Works

The following steps read and plot the univariate data.

Step 3A-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

Step 3A-2. Read the time series data.

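# read_csv(squeeze=True) was removed in pandas 2.0; .squeeze('columns') turns the one-column frame into a Series

data = pd.read_csv('./data/daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')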

print(data.head())

The output is as follows.

Date

1981-01-01 20.7

1981-01-02 17.9

1981-01-03 18.8

1981-01-04 14.6

1981-01-05 15.8

Name: Temp, dtype: float64
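Because the file has a single value column, the loaded object can be handled as a pandas Series. The following small sketch assumes pandas, and its CSV text reuses the five rows printed above. It squeezes the one-column frame into a Series and then uses pd.infer_freq to confirm that the observations are daily.

```python
import io

import pandas as pd

csv_text = """Date,Temp
1981-01-01,20.7
1981-01-02,17.9
1981-01-03,18.8
1981-01-04,14.6
1981-01-05,15.8
"""

# squeeze('columns') turns the single-column frame into a Series
temps = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0,
                    parse_dates=True).squeeze('columns')

# infer_freq returns the frequency alias implied by a regular DatetimeIndex
print(type(temps).__name__, pd.infer_freq(temps.index))  # Series D
```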

Step 3A-3. Plot the time series.

Let’s now plot the time series data to detect patterns.

data.plot()

plt.ylabel('Minimum Temp')

plt.title('Min temp in Southern Hemisphere From 1981 to 1990')

plt.show()

Figure 1-4 shows the output time series plot.

This is called univariate time series analysis because only one variable is used: Temp, the daily minimum temperature over the 10 years from 1981 to 1990.

Recipe 1-3B. Exploring Types of Time Series Data: Multivariate

Problem

You want to load and explore multivariate time series data.

Solution

A multivariate time series contains more than one time-dependent variable: the target depends not only on its own past values but also on other time-dependent features. This relationship is used to forecast the target values.
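One common way to use that relationship is to turn the series into a supervised-learning table, where the target at time t is paired with lagged values of itself and of the other variables. The following sketch shows this approach, which is not necessarily the one used later in the book. It assumes pandas, and the column names and hourly values are toy data.

```python
import pandas as pd

# Toy hourly data: a target (pollution) and one extra feature (temp)
df = pd.DataFrame(
    {'pollution': [129.0, 148.0, 159.0, 181.0], 'temp': [-4.0, -4.0, -5.0, -5.0]},
    index=pd.date_range('2010-01-02', periods=4, freq=pd.offsets.Hour()),
)

# Lag-1 features: each row holds the previous hour's value of every variable
lagged = df.shift(1).add_suffix('(t-1)')

# Pair the lagged features with the current target and drop the incomplete first row
supervised = pd.concat([lagged, df['pollution'].rename('pollution(t)')], axis=1).dropna()
print(supervised)
```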

Let’s load and explore a Beijing pollution dataset, which is multivariate.

How It Works

The following steps read and plot the multivariate data.

Step 3B-1. Import the required libraries.

import pandas as pd

from datetime import datetime

import matplotlib.pyplot as plt

Step 3B-2. Write the parsing function.

Before loading the raw dataset and parsing the datetime information as the pandas dataframe index, let’s first write a parsing function.

def parse(x):

    return datetime.strptime(x, '%Y %m %d %H')

Step 3B-3. Load the dataset.

data1 = pd.read_csv('./data/raw.csv', parse_dates = [['year', 'month', 'day', 'hour']],

index_col=0, date_parser=parse)
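Combining columns with a nested list in parse_dates is deprecated as of pandas 2.2. As an alternative, pd.to_datetime accepts a dataframe whose columns are named year, month, day, and hour and assembles the timestamps directly. The following sketch assumes pandas, and its rows are a hypothetical stand-in for raw.csv, including the pm2.5 column name.

```python
import pandas as pd

# Hypothetical rows laid out like the raw file: separate date-part columns
raw = pd.DataFrame({
    'No': [1, 2],
    'year': [2010, 2010], 'month': [1, 1], 'day': [2, 2], 'hour': [0, 1],
    'pm2.5': [129.0, 148.0],
})

# to_datetime assembles timestamps from columns named year/month/day/hour
raw.index = pd.DatetimeIndex(pd.to_datetime(raw[['year', 'month', 'day', 'hour']]), name='date')
raw = raw.drop(columns=['No', 'year', 'month', 'day', 'hour'])
print(raw)
```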

Step 3B-4. Do basic preprocessing.

Drop the No column.

data1.drop('No', axis=1, inplace=True)

Manually specify each column name.

data1.columns = ['pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain']

data1.index.name = 'date'

Replace all NA values in the pollution column with 0.

data1['pollution'] = data1['pollution'].fillna(0)

Drop the first 24 hours.

data1 = data1[24:]

Summarize the first five rows.

print(data1.head(5))

The output is as follows.

pollution dew temp press wnd_dir wnd_spd snow rain

date

2010-01-02 00:00:00 129.0 -16 -4.0 1020.0 SE 1.79 0 0

2010-01-02 01:00:00 148.0 -15 -4.0 1020.0 SE 2.68 0 0

2010-01-02 02:00:00 159.0 -11 -5.0 1021.0 SE 3.57 0 0

2010-01-02 03:00:00 181.0 -7 -5.0 1022.0 SE 5.36 1 0

2010-01-02 04:00:00 138.0 -7 -5.0 1022.0 SE 6.25 2 0

This dataset records pollution and weather conditions in Beijing, measured hourly over five years. It includes the datetime column, the pollution metric (PM2.5 concentration), and critical weather information such as temperature, pressure, and wind speed.

Step 3B-5. Plot each series.

Now let's plot each series as a separate subplot, except wind direction, which is categorical.

vals = data1.values

# specify columns to plot

group_list = [0, 1, 2, 3, 5, 6, 7]

i = 1

# plot each column

plt.figure()

for group in group_list:

    plt.subplot(len(group_list), 1, i)

    plt.plot(vals[:, group])

    plt.title(data1.columns[group], y=0.5, loc='right')

    i += 1

plt.show()

Figure 1-5 shows the plot of all variables across time.

Figure 1-5
A plot of all variables across time

Recipe 1-4A. Time Series Components: Trends

Problem

You want to find the components of the time series, starting with trends.

Solution

A trend is the overall movement of data in a particular direction—that is, the values going upward (increasing) or downward (decreasing) over a period of time.

Let’s use a shampoo sales dataset, which has a monthly sales count for three years.

How It Works

The following steps read and plot the data.

Step 4A-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

from datetime import datetime

Step 4A-2. Write the parsing function.

def parsing_fn(x):

    return datetime.strptime('190'+x, '%Y-%m')
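The '190' prefix suggests that the raw month values look like '1-01' (year 1, month 01), so prepending '190' yields a parseable string such as '1901-01'. This value format is an assumption, because the CSV itself is not shown. The following sketch applies the same parsing function to a few strings in that assumed format.

```python
from datetime import datetime


def parsing_fn(x):
    # Assumed raw format 'Y-MM', where Y is a single-digit year offset
    return datetime.strptime('190' + x, '%Y-%m')


parsed = [parsing_fn(s) for s in ['1-01', '1-02', '3-12']]
print(parsed[0])  # 1901-01-01 00:00:00
```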

Step 4A-3. Load the dataset.

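# read_csv(squeeze=True) was removed in pandas 2.0; .squeeze('columns') returns a Series instead

data = pd.read_csv('./data/shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, date_parser=parsing_fn).squeeze('columns')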

Step 4A-4. Plot the time series.

data.plot()

plt.show()

Figure 1-6 shows the time series plot.

This data has a rising trend, as seen in Figure 1-6. The output time series plot shows that, on average, the values increase with time.
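One simple way to quantify "on average, the values increase with time" is the slope of an ordinary least-squares line fitted against the time step. The following pure-Python sketch, which is not part of the original recipe, uses hypothetical monthly sales values. A positive slope indicates a rising trend.

```python
def ols_slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


# Hypothetical sales that rise by roughly 10 units per month, with noise
sales = [100, 115, 108, 131, 140, 139, 162, 158, 181]
print(round(ols_slope(sales), 2))  # 9.48
```

A straight-line slope summarizes only the average direction. A curved or changing trend would need a smoother, such as a rolling mean.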

Recipe 1-4B. Time Series Components: Seasonality

Problem

You want to find the components of time series data based on seasonality.

Solution

Seasonality is a pattern or change in time series data that recurs at regular intervals (for example, every year).
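The autocorrelation at the seasonal lag provides a numeric check that complements the plots. A series with a repeating pattern of period m is strongly correlated with itself shifted by m steps, and negatively correlated at half that lag when the pattern is sinusoidal. The following pure-Python sketch uses a synthetic monthly series with a 12-step cycle. Both the data and the period are assumptions for illustration.

```python
import math


def autocorr(values, lag):
    """Sample autocorrelation of values at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Synthetic series: a 12-month sine cycle repeated for 5 years
series = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(60)]

print(round(autocorr(series, 12), 2), round(autocorr(series, 6), 2))  # 0.8 -0.9
```

The value at lag 12 is below 1 even for a perfectly periodic series. This is because the standard estimator divides by the full-sample variance while summing over fewer overlapping pairs.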

Let’s use a Melbourne, Australia, minimum daily temperature dataset from 1981–1990. The focus is on seasonality.

How It Works

The following steps read and plot the data.

Step 4B-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

Step 4B-2. Read the data.

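# read_csv(squeeze=True) was removed in pandas 2.0; .squeeze('columns') turns the one-column frame into a Series

data = pd.read_csv('./data/daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')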

Step 4B-3. Plot the time series.

data.plot()

plt.ylabel('Minimum Temp')

plt.title('Min temp in Southern Hemisphere from 1981 to 1990')

plt.show()

Figure 1-7 shows the time series plot.

Figure 1-7 shows that this data has a strong seasonality component (i.e., a repeating pattern in the data over time).

Step 4B-4. Plot a box plot by month.

Let’s visualize a box plot to check monthly variation in 1990.

one_year_ser = data.loc['1990']

# Group the 1990 values by calendar month ('M' is spelled 'ME' in pandas >= 2.2)

grouped_ser = one_year_ser.groupby(pd.Grouper(freq='M'))

# One column per month; shorter months are padded with NaN

month_df = pd.concat([pd.DataFrame(x[1].values) for x in grouped_ser], axis=1)

month_df.columns = range(1, 13)

month_df.boxplot()

plt.show()

Figure 1-8 shows the box plot output by month.

Figure 1-8
Monthly level box plot output

The box plot in Figure 1-8 shows the distribution of the minimum temperature for each month of 1990. The monthly distributions swing from summer to winter and back. This indicates a seasonal cycle that repeats every year and is visible at the monthly level.
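The same monthly grouping can be summarized numerically instead of graphically. The following sketch assumes pandas and uses a synthetic daily series in place of the 1990 temperatures. It groups by the calendar month of the index and takes the median, which corresponds to the middle line of each box.

```python
import math

import pandas as pd

# Synthetic daily minimum temperatures for one year: warm in January, cool mid-year
days = pd.date_range('1990-01-01', '1990-12-31', freq='D')
temps = pd.Series(
    [11 + 5 * math.cos(2 * math.pi * (d.dayofyear - 1) / 365) for d in days],
    index=days,
)

# Median per calendar month (1 = January, ..., 12 = December)
monthly_median = temps.groupby(temps.index.month).median()
print(monthly_median.round(1))
```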

Step 4B-5. Plot a box plot by year.

Let's group by year to see how the distribution changes across years. Together with the monthly view, this lets you check for seasonality and trends at both levels of aggregation.

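# Group the full series by calendar year ('A' is spelled 'YE' in pandas >= 2.2)

grouped_ser = data.groupby(pd.Grouper(freq='A'))

# One column per year; concat pads years of unequal length (e.g., leap years) with NaN

year_df = pd.concat({name.year: pd.Series(group.values) for name, group in grouped_ser}, axis=1)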

year_df.boxplot()

plt.show()

Figure 1-9 shows the box plot output by year.

Figure 1-9 shows that the distribution changes little from year to year, so the data has no strong trend or multiyear pattern.

Recipe 1-4C. Time Series Components: Seasonality (cont’d.)

Problem

You want to find time series components using another example of seasonality.

Solution

Let’s explore tractor sales data to understand seasonality.

How It Works

The following steps read and plot the data.

Step 4C-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

Step 4C-2. Read tractor sales data.

tractor_sales_data = pd.read_csv("./data/tractor_salesSales.csv")

tractor_sales_data.head(5)

Step 4C-3. Set a datetime series to use as an index.

date_ser = pd.date_range(start='2003-01-01', freq='MS', periods=len(tractor_sales_data))

Step 4C-4. Format the data.

tractor_sales_data.rename(columns={'Number of Tractor Sold':'Tractor-Sales'}, inplace=True)

tractor_sales_data.set_index(date_ser, inplace=True)

tractor_sales_data = tractor_sales_data[['Tractor-Sales']]

tractor_sales_data.head(5)

Step 4C-5. Plot the time series.

tractor_sales_data.plot()

plt.ylabel('Tractor Sales')

plt.title("Tractor Sales from 2003 to 2014")

plt.show()

Figure 1-10 shows the time series plot output.

Figure 1-10 shows that the data has strong seasonality along with an increasing trend.

Step 4C-6. Plot a box plot by month.

Let’s check the box plot by month to better understand the seasonality.

# Select the 2011 rows by date; df['2011'] would look for a column named '2011'

one_year_ser = tractor_sales_data.loc['2011', 'Tractor-Sales']

grouped_ser = one_year_ser.groupby(pd.Grouper(freq='M'))

month_df = pd.concat([pd.DataFrame(x[1].values) for x in grouped_ser], axis=1)

month_df.columns = range(1, 13)

month_df.boxplot()

plt.show()

Figure 1-11 shows the box plot output by month.

The box plot shows a seasonal component each year, with a swing from May to August.

Recipe 1-5A. Time Series Decomposition: Additive Model

Problem

You want to learn how to decompose a time series using additive model decomposition.

Solution

The additive model assumes that the components add up: observed value = trend + seasonality + residual.
It is linear: the changes over time are of a constant amount.
The seasonality has a constant frequency and amplitude over time. Frequency describes how often the cycle repeats (equivalently, the width of each cycle), and amplitude is the height of each cycle.
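The following pure-Python sketch makes the additive idea concrete with classical additive decomposition for an even period. The trend is a centered 2 x m moving average. The seasonal component is the average detrended value at each position in the cycle, adjusted so that the seasonal indices sum to zero. The residual is whatever remains. This follows the same classical recipe that seasonal_decompose implements, but it is not statsmodels' code, and the synthetic quarterly series is an assumption for illustration.

```python
def additive_decompose(y, period):
    """Classical additive decomposition; assumes an even period (e.g. 4 or 12)."""
    n, half = len(y), period // 2
    # 2 x m centered moving average: half weight on the two end points
    weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(w * y[t - half + k] for k, w in enumerate(weights))

    # Average the detrended values at each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    averages = [sum(b) / len(b) for b in buckets]
    offset = sum(averages) / period
    seasonal_index = [a - offset for a in averages]  # indices sum to zero

    seasonal = [seasonal_index[t % period] for t in range(n)]
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Synthetic quarterly series: linear trend plus a fixed seasonal pattern
pattern = [3.0, -1.0, -4.0, 2.0]
y = [10 + 2 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, resid = additive_decompose(y, period=4)
print([round(s, 6) for s in seasonal[:4]])  # [3.0, -1.0, -4.0, 2.0]
```

The first and last half-period of the trend are undefined because the centered window does not fit there. statsmodels reports these points as NaN by default.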

The statsmodels library implements the classical decomposition method in a function called seasonal_decompose, but the user has to specify whether the model is additive or multiplicative.

How It Works

The following steps load and decompose the time series.

Step 5A-1. Load the required libraries.

### Load required libraries

import pandas as pd

import matplotlib.pyplot as plt

from statsmodels.tsa.seasonal import seasonal_decompose

import statsmodels.api as sm

Step 5A-2. Read and process retail turnover data.

turn_over_data = pd.read_csv('./data/RetailTurnover.csv')

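# ISO dates avoid day/month ambiguity: one quarter-end date per row, starting from July 1982 ('Q' is spelled 'QE' in pandas >= 2.2)

date_range = pd.date_range(start='1982-07-01', periods=len(turn_over_data), freq='Q')

turn_over_data['TimeIndex'] = date_range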

Step 5A-3. Plot the time series.

plt.plot(turn_over_data.TimeIndex, turn_over_data.Turnover, label='Turnover')

plt.legend(loc='best')

plt.show()

Figure 1-12 shows the time series plot output.

Figure 1-12 shows a linearly increasing trend and seasonality with a roughly constant amplitude.

Step 5A-4. Decompose the time series.

Let’s decompose the time series by trends, seasonality, and residuals.

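# period=4: quarterly data repeats every 4 observations (older statsmodels releases named this argument freq)

decomp_turn_over = sm.tsa.seasonal_decompose(turn_over_data.Turnover, model="additive", period=4)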

decomp_turn_over.plot()

plt.show()

Figure 1-13 shows the time series decomposition output.

Figure 1-13
Time series decomposition output

Step 5A-5. Separate the components.

You can get the trends, seasonality, and residuals as separate series with the following.

trend = decomp_turn_over.trend

seasonal = decomp_turn_over.seasonal

residual = decomp_turn_over.resid

Recipe 1-5B. Time Series Decomposition: Multiplicative Model

Problem

You want to learn how to decompose a time series using multiplicative model decomposition.

Solution

A multiplicative model assumes that the components multiply: observed value = trend × seasonality × residual.
It is non-linear, such as quadratic or exponential, which means that the changes grow or shrink over time.
The seasonal fluctuations grow or shrink in amplitude as the level of the series changes.
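The difference between the two models shows up in how the seasonal swing behaves as the level changes. In the following pure-Python sketch, the series is built multiplicatively from synthetic quarterly data, and the trend and seasonal factors are assumptions. The absolute swing within each year grows with the trend, while the ratio of each value to the trend stays fixed. Taking logarithms turns such a product into a sum, which is one common way to apply additive tools to multiplicative data.

```python
import math

trend = [100 + 10 * t for t in range(12)]   # rising level
factors = [1.2, 0.9, 0.8, 1.1]              # seasonal factors, one per quarter
y = [trend[t] * factors[t % 4] for t in range(12)]

# Absolute peak-to-trough swing within each year grows with the level
swings = [max(y[i:i + 4]) - min(y[i:i + 4]) for i in range(0, 12, 4)]
print([round(s, 1) for s in swings])  # [47.0, 59.0, 71.0]

# ...but the ratio of each value to the trend repeats exactly every year
ratios = [y[t] / trend[t] for t in range(12)]

# In log space the components add: log y = log trend + log factor
log_gap = [math.log(y[t]) - math.log(trend[t]) - math.log(factors[t % 4])
           for t in range(12)]
```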

How It Works

The following steps load and decompose the time series.

Step 5B-1. Load the required libraries.

### Load required libraries

import pandas as pd

import matplotlib.pyplot as plt

from statsmodels.tsa.seasonal import seasonal_decompose

import statsmodels.api as sm

Step 5B-2. Load air passenger data.

air_passengers_data = pd.read_csv('./data/AirPax.csv')

Step 5B-3. Process the data.

# One month-end date per row, starting from January 1949 ('M' is spelled 'ME' in pandas >= 2.2)

date_range = pd.date_range(start='1949-01-01', periods=len(air_passengers_data), freq='M')

air_passengers_data['TimeIndex'] = date_range

print(air_passengers_data.head())

The output is as follows.

Year Month Pax TimeIndex

0 1949 Jan 112 1949-01-31

1 1949 Feb 118 1949-02-28

2 1949 Mar 132 1949-03-31

3 1949 Apr 129 1949-04-30

4 1949 May 121 1949-05-31

Figure 1-14 shows the time series output plot.

Step 5B-4. Decompose the time series.

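# period=12: monthly data with a yearly cycle (older statsmodels releases named this argument freq)

decomp_air_passengers_data = sm.tsa.seasonal_decompose(air_passengers_data.Pax, model="multiplicative", period=12)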

decomp_air_passengers_data.plot()

plt.show()

Figure 1-15 shows the time series decomposition output.

Figure 1-15
Time series decomposition output

Step 5B-5. Get the seasonal component.

seasonal_comp = decomp_air_passengers_data.seasonal

seasonal_comp.head(4)

The output is as follows.

0 0.910230

1 0.883625

2 1.007366

3 0.975906

Name: Pax, dtype: float64

Recipe 1-6. Visualization of Seasonality

Problem

You want to learn how to visualize the seasonality component.

Solution

Let's look at a few additional methods to visualize and detect seasonality, using the retail turnover data, whose seasonal pattern is visible at the quarterly level.

How It Works

The following steps load and visualize the time series (i.e., the seasonality component).

Step 6-1. Import the required libraries.

import pandas as pd

import matplotlib.pyplot as plt

Step 6-2. Load the data.

turn_over_data = pd.read_csv('./data/RetailTurnover.csv')

Step 6-3. Process the data.

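# ISO dates avoid day/month ambiguity: one quarter-end date per row, starting from July 1982 ('Q' is spelled 'QE' in pandas >= 2.2)

date_range = pd.date_range(start='1982-07-01', periods=len(turn_over_data), freq='Q')

turn_over_data['TimeIndex'] = date_range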

Step 6-4. Pivot the table.

Now let’s pivot the table such that quarterly information is in the columns, yearly information is in the rows, and the values consist of turnover information.

quarterly_turn_over_data = pd.pivot_table(turn_over_data, values = "Turnover", columns = "Quarter", index = "Year")

quarterly_turn_over_data

Figure 1-16 shows the output by quarterly turnover.
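The following minimal sketch shows what the pivot does. It assumes pandas and a hypothetical long-format table with Year, Quarter, and Turnover columns; the column names follow the pivot_table call above, and the values are made up. Each Year becomes a row and each Quarter becomes a column. Duplicate Year/Quarter pairs would be averaged, because mean is pivot_table's default aggregation.

```python
import pandas as pd

# Hypothetical long-format turnover data: one row per (Year, Quarter)
long_df = pd.DataFrame({
    'Year': [1983, 1983, 1983, 1983, 1984, 1984, 1984, 1984],
    'Quarter': ['Q1', 'Q2', 'Q3', 'Q4'] * 2,
    'Turnover': [110.0, 95.0, 100.0, 105.0, 120.0, 104.0, 109.0, 115.0],
})

# Years become rows, quarters become columns, turnover fills the cells
wide = pd.pivot_table(long_df, values='Turnover', columns='Quarter', index='Year')
print(wide)
```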

Step 6-5. Plot the line charts.

Let's draw a line plot for each of the four quarters.

quarterly_turn_over_data.plot()

plt.show()

Figure 1-17 shows the quarter-level line plots.

Figure 1-17
Quarterly turnover line chart

Step 6-6. Plot the box plots.

Let’s also plot the box plot at the quarterly level.

quarterly_turn_over_data.boxplot()

plt.show()

Figure 1-18 shows the output of the box plot by quarter.

Looking at both the box plot and the line plot, you can conclude that turnover is consistently high in the first quarter and relatively low in the second quarter.
