7 Forecasting non-stationary time series

This chapter covers

  • Examining the autoregressive integrated moving average model, or ARIMA(p,d,q)
  • Applying the general modeling procedure for non-stationary time series
  • Forecasting using the ARIMA(p,d,q) model

In chapters 4, 5, and 6 we covered the moving average model, MA(q); the autoregressive model, AR(p); and the ARMA model, ARMA(p,q). We saw how these models can only be used on stationary time series, which required us to apply transformations, mainly differencing, and to test for stationarity using the ADF test. In the examples we covered, the forecasts from each model returned differenced values, which required us to reverse the transformation to bring the values back to the scale of the original data.

Now we’ll add another component to the ARMA(p,q) model so we can forecast non-stationary time series. This component is the integration order, which is denoted by the variable d. This leads us to the autoregressive integrated moving average (ARIMA) model, or ARIMA(p,d,q). Using this model, we can take into account non-stationary time series and avoid the steps of modeling on differenced data and having to inverse transform the forecasts.

In this chapter, we’ll define the ARIMA(p,d,q) model and the order of integration d. Then we’ll add a step to our general modeling procedure. Figure 7.1 shows the general modeling procedure as defined in chapter 6. We must add a step to determine the order of integration in order to use this procedure with the ARIMA(p,d,q) model.

Figure 7.1 General modeling procedure using an ARMA(p, q) model. In this chapter, we will add another step to this procedure in order to accommodate the ARIMA(p,d,q) model.

Then we’ll apply our modified procedure to forecast a non-stationary time series, meaning that the series has a trend, or its variance is not constant over time. Specifically, we’ll revisit the dataset of Johnson & Johnson’s quarterly earnings per share (EPS) between 1960 and 1980, which we first studied in chapters 1 and 2. The series is shown in figure 7.2. We’ll apply the ARIMA(p,d,q) model to forecast the quarterly EPS for 1 year.

Figure 7.2 Quarterly earnings per share (EPS) of Johnson & Johnson from 1960 to 1980. We worked with the same dataset in chapters 1 and 2.

7.1 Defining the autoregressive integrated moving average model

An autoregressive integrated moving average process is the combination of an autoregressive process AR(p), integration I(d), and the moving average process MA(q).

Just like the ARMA process, the ARIMA process states that the present value is dependent on past values, coming from the AR(p) portion, and past errors, coming from the MA(q) portion. However, instead of using the original series, denoted as y_t, the ARIMA process uses the differenced series, denoted as y'_t. Note that y'_t can represent a series that has been differenced more than once.

Therefore, the mathematical expression of the ARIMA(p,d,q) process states that the present value of the differenced series y'_t is equal to the sum of a constant C, past values of the differenced series φ_p y'_{t–p}, past error terms θ_q ε_{t–q}, and a current error term ε_t, as shown in equation 7.1.

y'_t = C + φ_1 y'_{t–1} + ⋯ + φ_p y'_{t–p} + θ_1 ε_{t–1} + ⋯ + θ_q ε_{t–q} + ε_t

Equation 7.1

Just like in the ARMA process, the order p determines how many lagged values of the series are included in the model, while the order q determines how many lagged error terms are included in the model. However, in equation 7.1 you’ll notice that there is no order d explicitly displayed.

Here, the order d is defined as the order of integration. Integration is simply the reverse of differencing. The order of integration is thus equal to the number of times a series has been differenced to become stationary.

If we difference a series once and it becomes stationary, then d = 1. If a series is differenced twice to become stationary, then d = 2.
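Because integration is just a cumulative sum, we can verify this relationship directly. The following is a minimal sketch (the values and variable names are illustrative, not from the book's listings) showing that integrating a differenced series recovers the original values:

import numpy as np

y = np.array([1.0, 3.0, 6.0, 10.0])    # original series
y_diff = np.diff(y, n=1)                # differenced once: [2., 3., 4.]

# Integration reverses the differencing: prepend the first value
# and take the cumulative sum.
y_restored = np.concatenate(([y[0]], y[0] + np.cumsum(y_diff)))
print(y_restored)                       # [ 1.  3.  6. 10.]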

Autoregressive integrated moving average model

An autoregressive integrated moving average (ARIMA) process is the combination of the AR(p) and MA(q) processes, but in terms of the differenced series.

It is denoted as ARIMA(p,d,q), where p is the order of the AR(p) process, d is the order of integration, and q is the order of the MA(q) process.

Integration is the reverse of differencing, and the order of integration d is equal to the number of times the series has been differenced to be rendered stationary.

The general equation of the ARIMA(p,d,q) process is

y'_t = C + φ_1 y'_{t–1} + ⋯ + φ_p y'_{t–p} + θ_1 ε_{t–1} + ⋯ + θ_q ε_{t–q} + ε_t

Note that y'_t represents the differenced series, and it may have been differenced more than once.

A time series that can be rendered stationary by applying differencing is said to be an integrated series. In the presence of a non-stationary integrated time series, we can use the ARIMA(p,d,q) model to produce forecasts.

Thus, in simple terms, the ARIMA model is simply an ARMA model that can be applied to non-stationary time series. Whereas the ARMA(p,q) model requires the series to be stationary before it can be fit, the ARIMA(p,d,q) model can be used directly on a non-stationary series. We must simply find the order of integration d, which corresponds to the minimum number of times the series must be differenced to become stationary.

Therefore, we must add the step of finding the order of integration to our general modeling procedure before we apply it to forecast the quarterly EPS of Johnson & Johnson.

7.2 Modifying the general modeling procedure to account for non-stationary series

In chapter 6 we built a general modeling procedure that allowed us to model more complex time series, meaning that the series has both an autoregressive and a moving average component. This procedure involves fitting many ARMA(p,q) models and selecting the one with the lowest AIC. Then we study the model’s residuals to verify that they resemble white noise. If that is the case, the model can be used for forecasting. We can visualize the general modeling procedure in its present state in figure 7.3.

Figure 7.3 General modeling procedure using an ARMA(p, q) model. Now we must adapt it to apply to an ARIMA(p,d,q) model, allowing us to work with non-stationary time series.

The next iteration of the general modeling procedure will include a step to determine the order of integration d. That way, we can apply the same procedure but using an ARIMA(p,d,q) model, which will allow us to forecast non-stationary time series.

From the previous section, we know that the order of integration d is simply the minimum number of times a series must be differenced to become stationary. Therefore, if a series is stationary after being differenced once, then d = 1. If it is stationary after being differenced twice, then d = 2. In my experience, a time series rarely needs to be differenced more than twice to become stationary.

We can add a step such that when transformations are applied to the series, we set the value of d to the number of times the series was differenced. Then, instead of fitting many ARMA(p,q) models, we fit many ARIMA(p,d,q) models. The rest of the procedure remains the same, as we still use the AIC to select the best model and study its residuals. The resulting procedure is shown in figure 7.4.
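As a rough sketch of what this new step can look like in code, the hypothetical helper below (it is not one of the book's listings, and the name order_of_integration is ours) repeatedly differences a series and applies the ADF test until stationarity is reached, returning the number of differencings as d:

import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(series, max_diff=3, alpha=0.05):
    """Return the minimum number of differencings needed to render
    the series stationary according to the ADF test."""
    values = np.asarray(series, dtype=float)
    for d in range(max_diff + 1):
        p_value = adfuller(values)[1]    # the second element is the p-value
        if p_value < alpha:
            return d
        values = np.diff(values, n=1)    # difference once more and retest
    raise ValueError(f'Series is not stationary after {max_diff} differencings')

Applied to the Johnson & Johnson EPS series, such a helper would return 2, matching the result we will obtain manually in section 7.3.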

Note that the case where d = 0 is equivalent to an ARMA(p,q) model. This also means that the series did not need to be differenced to be stationary. Keep in mind that the ARMA(p,q) model can only be applied to a stationary series, whereas the ARIMA(p,d,q) model can be applied directly to a non-stationary series, without differencing it beforehand.

Let’s apply our new general modeling procedure to forecast the quarterly earnings per share of Johnson & Johnson.

7.3 Forecasting a non-stationary time series

We are now going to apply the general modeling procedure displayed in figure 7.4 to forecast the quarterly earnings per share (EPS) of Johnson & Johnson. We’ll use the same dataset that was introduced in chapters 1 and 2. We will forecast 1 year’s quarterly EPS, meaning that we must forecast four timesteps into the future, since there are four quarters in a year. The dataset covers the period between 1960 and 1980.

As always, the first step is to collect our data. Here it is done for us, so we can simply load it and display the series. The result is shown in figure 7.5.

Figure 7.4 General modeling procedure for using the ARIMA(p,d,q) model. Notice the addition of a step where we specify the parameter d for the ARIMA(p,d,q) model. Here, d is simply the minimum number of times a series must be differenced to become stationary.

Note At any time, feel free to refer to the source for this chapter on GitHub: https://github.com/marcopeix/TimeSeriesForecastingInPython/tree/master/CH07.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('../data/jj.csv')

fig, ax = plt.subplots()

ax.plot(df.date, df.data)
ax.set_xlabel('Date')
ax.set_ylabel('Earnings per share (USD)')

plt.xticks(np.arange(0, 81, 8), [1960, 1962, 1964, 1966, 1968, 1970, 1972, 1974, 1976, 1978, 1980])

fig.autofmt_xdate()
plt.tight_layout()

Figure 7.5 Quarterly earnings per share (EPS) of Johnson & Johnson between 1960 and 1980

Following our procedure, we must check if the data is stationary. Figure 7.5 shows a positive trend, as the quarterly EPS tends to increase over time. Nevertheless, we can apply the augmented Dickey-Fuller (ADF) test to determine if it is stationary or not. By now you should be very comfortable with these steps, so they will be accompanied by minimal comments.

from statsmodels.tsa.stattools import adfuller

ad_fuller_result = adfuller(df['data'])

print(f'ADF Statistic: {ad_fuller_result[0]}')
print(f'p-value: {ad_fuller_result[1]}')

This block of code returns an ADF statistic of 2.74 with a p-value of 1.0. Since the ADF statistic is not a large negative number, and the p-value is larger than 0.05, we cannot reject the null hypothesis, meaning that our series is not stationary.

We need to determine how many times the series must be differenced to become stationary. This will then set the order of integration d. We can apply a first-order differencing and test for stationarity.

eps_diff = np.diff(df['data'], n=1)       
 
ad_fuller_result = adfuller(eps_diff)     
 
print(f'ADF Statistic: {ad_fuller_result[0]}')
print(f'p-value: {ad_fuller_result[1]}')

Apply first-order differencing.

Test for stationarity.

This results in an ADF statistic of –0.41 and a p-value of 0.9. Again, the ADF statistic is not a large negative number, and the p-value is larger than 0.05. Therefore, we cannot reject the null hypothesis and we must conclude that after a first-order differencing, the series is not stationary.

Let’s try differencing again to see if the series becomes stationary:

eps_diff2 = np.diff(eps_diff, n=1)        
 
ad_fuller_result = adfuller(eps_diff2)    
 
print(f'ADF Statistic: {ad_fuller_result[0]}')
print(f'p-value: {ad_fuller_result[1]}')

Take the differenced series and difference it again.

Test for stationarity.

This results in an ADF statistic of –3.59 and a p-value of 0.006. Now that we have a p-value smaller than 0.05 and a large negative ADF statistic, we can reject the null hypothesis and conclude that our series is stationary. It took two rounds of differencing to make our data stationary, which means that our order of integration is 2, so d = 2.

Before we move on to fitting different combinations of ARIMA(p,d,q) models, we must separate our data into train and test sets. We will hold out the last year of data for testing. This means that we will fit the model with data from 1960 to 1979 and predict the quarterly EPS in 1980 to evaluate the quality of our model against the observed values in 1980. In figure 7.6 the testing period is the shaded area.
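The shaded plot can be produced with a short sketch like the following, which reuses the plotting conventions from figure 7.5 (the axvspan coordinates assume the 84 data points of this dataset, with the last four being the test set):

fig, ax = plt.subplots()

ax.plot(df.date, df.data)
ax.set_xlabel('Date')
ax.set_ylabel('Earnings per share (USD)')
ax.axvspan(80, 83, color='#808080', alpha=0.2)    # shade the 1980 test period

plt.xticks(np.arange(0, 81, 8), [1960, 1962, 1964, 1966, 1968, 1970, 1972, 1974, 1976, 1978, 1980])
fig.autofmt_xdate()
plt.tight_layout()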

Figure 7.6 The train and test sets. The training period spans the years 1960 to 1979 inclusively, while the test set is the quarterly EPS reported in 1980. This test set corresponds to the last four data points of the dataset.

To fit the many ARIMA(p,d,q) models, we’ll define the optimize_ARIMA function. It is almost identical to the optimize_ARMA function that we defined in chapter 6, only this time we’ll add the order of integration d as an input to the function. The remainder of the function stays the same, as we fit the different models and order them by ascending AIC in order to select the model with the lowest AIC. The optimize_ARIMA function is shown in the following listing.

Listing 7.1 Function to fit all unique ARIMA(p,d,q) models

from typing import Union

import pandas as pd
from tqdm import tqdm_notebook
from statsmodels.tsa.statespace.sarimax import SARIMAX

def optimize_ARIMA(endog: Union[pd.Series, list], order_list: list, d: int) -> pd.DataFrame:

    results = []

    for order in tqdm_notebook(order_list):
        try:
            model = SARIMAX(endog, order=(order[0], d, order[1]),
                            simple_differencing=False).fit(disp=False)
        except:
            continue

        aic = model.aic
        results.append([order, aic])

    result_df = pd.DataFrame(results)
    result_df.columns = ['(p,q)', 'AIC']

    # Sort in ascending order, lower AIC is better
    result_df = result_df.sort_values(by='AIC', ascending=True).reset_index(drop=True)

    return result_df

The function takes as inputs the time series data, the list of unique (p,q) combinations, and the order of integration d.

Initialize an empty list to store each order (p,q) and its corresponding AIC as a tuple.

Iterate over each unique (p,q) combination. The use of tqdm_notebook will display a progress bar.

Fit an ARIMA(p,d,q) model using the SARIMAX function. We specify simple_differencing=False so that the differencing is handled inside the state-space model, keeping the forecasts on the scale of the original series. We also specify disp=False to avoid printing convergence messages to the console.

Calculate the model’s AIC.

Append the (p,q) combination and AIC as a tuple to the results list.

Store the (p,q) combination and AIC in a DataFrame.

Label the columns of your DataFrame.

Sort the DataFrame in ascending order of AIC values. The lower the AIC, the better the model.

With the function in place, we can define a list of possible values for the orders p and q. In this case, we’ll try the values 0, 1, 2, and 3 for both orders and generate the list of unique (p,q) combinations.

from itertools import product
 
ps = range(0, 4, 1)                   
qs = range(0, 4, 1)                   
d = 2                                 
 
order_list = list(product(ps, qs))    

Create a list of possible values for p from 0 inclusively to 4 exclusively, with steps of 1.

Create a list of possible values for q from 0 inclusively to 4 exclusively, with steps of 1.

Set d to 2, as the series needed to be differenced twice to become stationary.

Generate a list containing all unique combinations of (p,q).

Note that we do not give a range of values for the parameter d because it has a very specific definition: it is the number of times a series must be differenced to become stationary. Hence, it must be set to a specific value, which in this case is 2.

Furthermore, d must be constant in order to compare models using the AIC. Varying d would change the likelihood function used in the calculation of the AIC value, so comparing models using the AIC as a criterion would not be valid anymore.

We can now run the optimize_ARIMA function using the training set. The function returns a DataFrame with the model that has the lowest AIC at the top.

train = df.data[:-4]                               
 
result_df = optimize_ARIMA(train, order_list, d)   
result_df                                          

The training set consists of all data points except the last four.

Run the optimize_ARIMA function to obtain the model with the lowest AIC.

Display the resulting DataFrame.

The returned DataFrame shows that a value of 3 for both p and q results in the lowest AIC. Therefore, an ARIMA(3,2,3) model seems to be the most suitable for this situation. Now let’s assess the validity of the model by studying its residuals.

To do so, we’ll fit an ARIMA(3,2,3) model on the training set and display the residuals’ diagnostics using the plot_diagnostics method. The result is shown in figure 7.7.

model = SARIMAX(train, order=(3,2,3), simple_differencing=False)   
model_fit = model.fit(disp=False)
 
model_fit.plot_diagnostics(figsize=(10,8));                         

Fit an ARIMA(3,2,3) model on the training set, since this model has the lowest AIC.

Display the residuals’ diagnostics.

Figure 7.7 Diagnostics of the ARIMA(3,2,3) residuals. The Q-Q plot at the bottom left displays a fairly straight line with some deviation at the extremities.

In figure 7.7, the top-left plot shows the residuals over time. While there is no trend in the residuals, their variance does not seem to be constant, which is a deviation from white noise. At the top right is the distribution of the residuals, which is fairly close to a normal distribution. The Q-Q plot leads us to the same conclusion, as it displays a fairly straight line, meaning that the residuals' distribution is close to normal. Finally, the correlogram at the bottom right shows a coefficient that seems to be significant at lag 3. However, since it is not preceded by any significant autocorrelation coefficients, we can assume this is due to chance. Therefore, we can say that the correlogram shows no significant coefficients after lag 0, just like white noise.

Thus, from a qualitative standpoint, it seems that our residuals are close to white noise, which is a good sign, as it means that the model’s errors are random.

The last step is to evaluate the residuals from a quantitative standpoint. We’ll thus apply the Ljung-Box test to determine whether the residuals are correlated. We’ll apply the test on the first 10 lags and study the p-values. If all p-values are greater than 0.05, we cannot reject the null hypothesis and we’ll conclude that the residuals are not correlated, just like white noise.

from statsmodels.stats.diagnostic import acorr_ljungbox

residuals = model_fit.resid

# In recent versions of statsmodels, acorr_ljungbox returns a DataFrame
# with the test statistic and p-value for each lag.
ljung_box_result = acorr_ljungbox(residuals, lags=np.arange(1, 11, 1))

print(ljung_box_result['lb_pvalue'])

Store the model’s residuals in a variable.

Apply the Ljung-Box test on the first 10 lags.

Running the Ljung-Box test on the first 10 lags of the model’s residuals returns a list of p-values that are all larger than 0.05. Therefore, we do not reject the null hypothesis, and we conclude that the residuals are not correlated, just like white noise.

Our ARIMA(3,2,3) model has passed all the checks, and it can now be used for forecasting. Remember that our test set is the last four data points, corresponding to the four quarterly EPS reported in 1980. As a benchmark for our model, we will use the naive seasonal method. This means that we’ll take the EPS of the first quarter of 1979 and use it as a forecast for the EPS of the first quarter of 1980. Then the EPS of the second quarter of 1979 will be used as a forecast for the EPS of the second quarter of 1980, and so on. Remember that we need a benchmark, or a baseline model, when modeling to determine whether the model we develop is better than a naive method. The performance of a model must always be assessed relative to a baseline model.

test = df.iloc[-4:].copy()    # copy so we can add new columns without warnings

test['naive_seasonal'] = df['data'].iloc[76:80].values

The test set corresponds to the last four data points.

The naive seasonal forecast is implemented by selecting the quarterly EPS reported in 1979 and using the same values as a forecast for the year 1980.

With our baseline in place, we can now make forecasts using the ARIMA(3,2,3) model and store the results in the ARIMA_pred column.

ARIMA_pred = model_fit.get_prediction(80, 83).predicted_mean    
 
test['ARIMA_pred'] = ARIMA_pred                                  

Get the predicted values for the year 1980.

Assign the forecasts to the ARIMA_pred column.

Let’s visualize our forecasts to see how close the predictions from each method are to the observed values. The resulting plot is shown in figure 7.8.
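The figure can be produced with a sketch like the one below, which assumes the df and test variables defined earlier and the same matplotlib conventions as figure 7.5:

fig, ax = plt.subplots()

ax.plot(df.date, df.data, label='actual')
ax.plot(test.date, test['naive_seasonal'], 'r:', label='naive seasonal')
ax.plot(test.date, test['ARIMA_pred'], 'k--', label='ARIMA(3,2,3)')

ax.set_xlabel('Date')
ax.set_ylabel('Earnings per share (USD)')
ax.axvspan(80, 83, color='#808080', alpha=0.2)    # shade the forecast horizon
ax.legend(loc=2)

plt.xticks(np.arange(0, 81, 8), [1960, 1962, 1964, 1966, 1968, 1970, 1972, 1974, 1976, 1978, 1980])
fig.autofmt_xdate()
plt.tight_layout()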

Figure 7.8 Forecasts of the quarterly EPS of Johnson & Johnson in 1980. We can see that the predictions coming from the ARIMA(3,2,3) model, shown as a dashed line, almost perfectly overlap the observed data in 1980.

In figure 7.8 we can see the naive seasonal forecast as a dotted line and the ARIMA(3,2,3) forecasts as a dashed line. The ARIMA(3,2,3) model predicted the quarterly EPS with a very small error.

We can quantify that error by measuring the mean absolute percentage error (MAPE) and display the metric for each forecasting method in a bar plot, as shown in figure 7.9.

def mape(y_true, y_pred):                                          
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
 
mape_naive_seasonal = mape(test['data'], test['naive_seasonal'])   
mape_ARIMA = mape(test['data'], test['ARIMA_pred'])                
 
fig, ax = plt.subplots()
 
x = ['naive seasonal', 'ARIMA(3,2,3)']
y = [mape_naive_seasonal, mape_ARIMA]
 
ax.bar(x, y, width=0.4)
ax.set_xlabel('Models')
ax.set_ylabel('MAPE (%)')
ax.set_ylim(0, 15)
 
for index, value in enumerate(y):
    plt.text(x=index, y=value + 1, s=str(round(value,2)), ha='center')
 
plt.tight_layout()

Define a function to compute the MAPE.

Compute the MAPE for the naive seasonal method.

Compute the MAPE for the ARIMA(3,2,3) model.

Figure 7.9 The MAPE for both forecasting methods. You can see that the ARIMA model has an error metric that is one fifth of the baseline.

In figure 7.9, you can see that the MAPE for the naive seasonal forecast is 11.56%, while the MAPE for the ARIMA(3,2,3) model is 2.19%, which is roughly one fifth of the benchmark value. This means that our predictions are on average 2.19% off from the actual values. The ARIMA(3,2,3) model is clearly a better model than the naive seasonal method.

7.4 Next steps

In this chapter, we covered the ARIMA(p,d,q) model, which allows us to model and forecast non-stationary time series.

The order of integration d defines how many times a series must be differenced to become stationary. This parameter then allows us to fit the model on the original series and get a forecast in the same scale, unlike the ARMA(p,q) model, which required the series to be stationary for the model to be applied and required us to reverse the transformations on the forecasts.

To apply the ARIMA(p,d,q) model, we added an extra step to our general modeling procedure, which simply involves finding the value for the order of integration. This corresponds to the minimum number of times a series must be differenced to become stationary.

Now we can add another layer to the ARIMA(p,d,q) model that allows us to consider yet another property of time series: seasonality. We have studied the Johnson & Johnson dataset enough times to realize that there are clear cyclical patterns in the series. To integrate the seasonality of a series in a model, we must use the seasonal autoregressive integrated moving average (SARIMA) model, or SARIMA(p,d,q)(P,D,Q)m. This will be the subject of the next chapter.

7.5 Exercises

Now is the time to apply the ARIMA model on previous datasets that we have explored. The full solution to this exercise is available on GitHub: https://github.com/marcopeix/TimeSeriesForecastingInPython/tree/master/CH07.

7.5.1 Apply the ARIMA(p,d,q) model on the datasets from chapters 4, 5, and 6

In chapters 4, 5, and 6, non-stationary time series were introduced to show you how to apply the MA(q), AR(p), and ARMA(p,q) models. In each chapter, we transformed the series to make it stationary, fit the model, made forecasts, and had to reverse the transformation on the forecasts to bring them back to the original scale of the data.

Now that you know how to account for non-stationary time series, revisit each dataset and apply the ARIMA(p,d,q) model. For each dataset, do the following:

  1. Apply the general modeling procedure.

  2. Is an ARIMA(0,1,2) model suitable for the dataset in chapter 4?

  3. Is an ARIMA(3,1,0) model suitable for the dataset in chapter 5?

  4. Is an ARIMA(2,1,2) model suitable for the dataset in chapter 6?

Summary

  • The autoregressive integrated moving average model, denoted as ARIMA(p,d,q), is the combination of the autoregressive model AR(p), the order of integration d, and the moving average model MA(q).

  • The ARIMA(p,d,q) model can be applied on non-stationary time series and has the added advantage of returning forecasts in the same scale as the original series.

  • The order of integration d is equal to the minimum number of times a series must be differenced to become stationary.

  • An ARIMA(p,0,q) model is equivalent to an ARMA(p,q) model.
