© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
T. C. Nokeri, Data Science Revealed, https://doi.org/10.1007/978-1-4842-6870-4_4

4. High-Quality Time-Series Analysis

Tshepo Chris Nokeri (1)
(1) Pretoria, South Africa

The preceding chapter covered seasonal ARIMA. Despite considerable effort in data preprocessing and hyperparameter optimization, that model still generated sizable errors when forecasting future instances of the series. For a fast and automated forecasting procedure, use Facebook’s Prophet; it forecasts time-series data based on nonlinear trends with seasonality and holiday effects. This chapter introduces Prophet and presents a way of developing and testing an additive model. First, it discusses the crucial difference between the Statsmodels package and the Prophet package.

The Difference Between Statsmodels and Prophet

Time-series analysis typically requires diagnosing and treating missing values and outliers, running multiple test statistics to verify key assumptions, optimizing hyperparameters, and controlling for seasonal effects. A slight mistake anywhere in this workflow can cause the model to make significant errors when forecasting future instances. Consequently, building a model using Statsmodels requires a certain level of mastery.

Recall that machine learning aims to give a computer intelligence using minimal code. Statsmodels does not offer that level of automation; Prophet fills the gap. It was developed by Facebook’s Core Data Science team. It is robust to missing values and outliers, ships with sensible default hyperparameters, and controls seasonality and holiday effects. To install Prophet in a Python environment, use pip install fbprophet. To install it in a Conda environment, use conda install -c conda-forge fbprophet. From release 1.0 onward the package is published as prophet, so newer environments use pip install prophet and import from prophet instead of fbprophet.
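Because the package name differs between older and newer releases, an import that tolerates both can save some trouble. The following is a minimal sketch, not part of the original listings; the fallback to None is an assumption made here so that the snippet runs even when neither package is installed.

```python
# Import Prophet under whichever package name is installed.
# Releases from 1.0 onward ship as "prophet"; older ones as "fbprophet".
try:
    from prophet import Prophet
except ImportError:
    try:
        from fbprophet import Prophet
    except ImportError:
        Prophet = None  # assumption for this sketch: neither package is installed

if Prophet is None:
    print("Prophet is not installed; install it with pip or conda first")
else:
    print("Prophet imported from", Prophet.__module__)
```

The listings in this chapter keep the fbprophet name; with a newer release, only the import lines change.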

Understand Prophet Components

In the previous chapter, we painstakingly built a time-series model with three components, namely, trend, seasonality, and irregular components.

Prophet Components

The Prophet package takes into account the trend, seasonality, and holidays and events. In this section, we will briefly discuss these key components.

Trend

A trend is a single and consistent directional movement (upward or downward). We fit a time-series model to discover a trend. Table 4-1 highlights tunable trend parameters.
Table 4-1

Tunable Trend Parameters

Parameter                Description
growth                   Specifies a piecewise linear or logistic trend
n_changepoints           The number of changepoints to place automatically, if changepoints are not specified
changepoint_prior_scale  The flexibility of automatic changepoint selection

Changepoints

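A changepoint is a point in time at which the trend changes its rate of growth, so it alters the behavior of a time-series model. The changepoint_prior_scale parameter acts as a penalty term on how much the rate may change at each changepoint. If you do not specify changepoints, Prophet selects them automatically.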
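To make the effect of a changepoint concrete, the following sketch builds a piecewise linear trend by hand. It assumes the formulation described in the Prophet paper, where the slope shifts by a fixed amount at each changepoint and the offset is adjusted so that the line stays continuous. The function name and the numbers are illustrative only and are not part of the Prophet API.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k and offset m; the slope changes by
    deltas[j] at time changepoints[j], and the line stays continuous."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # active[i, j] is 1 when time t[i] is at or past changepoint j
    active = (t[:, None] >= s[None, :]).astype(float)
    slope = k + active @ d
    # the offset correction joins the segments at each changepoint
    offset = m + active @ (-s * d)
    return slope * t + offset

t = np.arange(0, 10)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[-1.5])
print(trend)
```

Here the trend rises with slope 1 until t = 5 and then falls with slope -0.5. A larger changepoint_prior_scale lets the fitted slope changes grow larger, which makes the trend more flexible.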

Seasonality

Seasonality represents movements that repeat over a fixed period, such as a week or a year. The Fourier series in Equation 4-1 approximates this.
$$ S(t)=\sum_{n=1}^{N}\left(\alpha_n\cos\left(\frac{2\pi nt}{P}\right)+b_n\sin\left(\frac{2\pi nt}{P}\right)\right) $$
(Equation 4-1)

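Here, P is the period with t measured in days (7 for weekly data, 30 for monthly data, 90 for quarterly data, and 365 for yearly data), N is the number of Fourier terms, and α_n and b_n are coefficients estimated from the data.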
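The seasonal term can be evaluated directly from its definition. The sketch below is an illustration with made-up coefficients, not Prophet's internal code; the function names fourier_features and seasonal_component are hypothetical.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return a (len(t), 2 * order) matrix: cosine columns, then sine columns."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.cos(angles), np.sin(angles)])

def seasonal_component(t, period, alpha, b):
    """Evaluate S(t) for cosine coefficients alpha and sine coefficients b."""
    alpha = np.asarray(alpha, dtype=float)
    b = np.asarray(b, dtype=float)
    X = fourier_features(t, period, order=len(alpha))
    return X @ np.concatenate([alpha, b])

# Two weeks of daily time steps with a weekly period (P = 7) and N = 2 terms
t = np.arange(0, 14)
s = seasonal_component(t, period=7, alpha=[0.5, 0.1], b=[1.0, -0.2])
print(np.round(s, 3))
```

The second week repeats the first exactly, which is what a period of 7 means. A larger N allows the seasonal shape to change more quickly within a period, at the risk of overfitting.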

Holidays and Events Parameters

Holidays and events affect the time series’ conditional mean. Table 4-2 outlines key tunable holiday and event parameters.
Table 4-2

Holiday and Event Parameters

Parameter                Description
daily_seasonality        Fit daily seasonality
weekly_seasonality       Fit weekly seasonality
yearly_seasonality       Fit yearly seasonality
holidays                 Include holiday name and date
seasonality_prior_scale  Determine the strength of the seasonal components (holidays_prior_scale does the same for holidays)

The Additive Model

An additive model assumes that the components of the series add up, with the trend and cycle treated as one term. Its components are similar to those of the Holt-Winters technique. We express the equation as shown in Equation 4-2.
$$ y=S(t)+T(t)+I(t) $$
(Equation 4-2)

Here, S(t) is the seasonal component, T(t) is the trend, and I(t) is the irregular component. Prophet writes the same idea as shown in Equation 4-3.
$$ y(t)=g(t)+s(t)+h(t)+\varepsilon_t $$
(Equation 4-3)

Here, g(t) represents the linear or logistic growth curve for modeling changes that are not periodic, s(t) represents the periodic changes (daily, weekly, yearly seasonality), h(t) represents the effects of holidays, and ε_t represents the error term that captures unusual changes.
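A small synthetic series shows how the four terms combine. All numbers below are invented for illustration; the point is only that the observed value is the plain sum of the components.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(60)                                  # 60 daily time steps

g = 10.0 + 0.05 * t                                # growth: a linear trend
s = 0.8 * np.sin(2.0 * np.pi * t / 7.0)            # seasonality: weekly cycle
h = np.where(np.isin(t, [24, 25]), 1.5, 0.0)       # holiday effect on two days
eps = rng.normal(scale=0.1, size=t.size)           # irregular error term

y = g + s + h + eps                                # additive model

# Removing the structured components leaves only the error term
residual = y - (g + s + h)
print(np.round(y[:7], 3))
```

Fitting Prophet works in the opposite direction: given y, it estimates g, s, and h, and whatever remains is treated as error.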

Data Preprocessing

We obtained the example data from Yahoo Finance.1 Before training the model, repurpose the data as follows:
  • df["ds"], which holds the date and time

  • df["y"], which holds the dependent variable (the value to forecast)

  • df.set_index("Date"), which sets the date and time as the index column

Listing 4-1 sets column names to the right format.
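import pandas as pd
# Prophet expects a datestamp column named "ds" and a numeric column named "y"
df["ds"] = pd.to_datetime(df["Date"])
df["y"] = df["Close"]
# set_index() returns a new dataframe, so assign the result back
df = df.set_index("Date")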
Listing 4-1

Process Data
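The same steps can be tried on a tiny dataframe without downloading anything. The sketch below reuses the first five closing prices shown in Table 4-3; the variable name raw is hypothetical, and the real data would normally be loaded from a file.

```python
import pandas as pd

# First five rows of the example data (values as shown in Table 4-3)
raw = pd.DataFrame({
    "Date": ["2019-07-03", "2019-07-04", "2019-07-05", "2019-07-08", "2019-07-09"],
    "Close": [14.0743, 14.0529, 14.0385, 14.1952, 14.1795],
})

df = raw.copy()
df["ds"] = pd.to_datetime(df["Date"])   # datestamp column Prophet looks for
df["y"] = df["Close"].astype(float)     # value to forecast
df = df.set_index("Date")               # keep the date as the index

print(df)
print(df.dtypes)
```

Checking df.dtypes before fitting is a cheap way to catch a ds column that is still stored as text.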

Table 4-3 shows what a dataframe should look like before developing a time-series model using Prophet.
Table 4-3

Dataset

Date        Close      ds          y
2019-07-03  14.074300  2019-07-03  14.074300
2019-07-04  14.052900  2019-07-04  14.052900
2019-07-05  14.038500  2019-07-05  14.038500
2019-07-08  14.195200  2019-07-08  14.195200
2019-07-09  14.179500  2019-07-09  14.179500
...         ...        ...         ...
2020-06-29  17.298901  2020-06-29  17.298901
2020-06-30  17.219200  2020-06-30  17.219200
2020-07-01  17.341900  2020-07-01  17.341900
2020-07-02  17.039301  2020-07-02  17.039301
2020-07-03  17.037100  2020-07-03  17.037100

Develop the Prophet Model

Listing 4-2 specifies the official holidays. Listing 4-3 configures the model with a 95% uncertainty interval (interval_width=0.95) and enables yearly, weekly, and daily seasonality.
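import pandas as pd

# One row per holiday date; the scalar values are broadcast to every row
holidays = pd.DataFrame({
    "holiday": "playoff",  # label shared by all the dates below
    "ds": pd.to_datetime(["2020-12-25", "2020-12-24", "2020-12-23",
                          "2019-12-25", "2021-01-01", "2021-01-20"]),
    "lower_window": 0,     # no extra days before each date
    "upper_window": 1,     # also include the day after each date
})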
Listing 4-2

Specify Holidays

Listing 4-3 completes the model.
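from fbprophet import Prophet  # from prophet import Prophet on release 1.0 or later

m = Prophet(holidays=holidays,
            interval_width=0.95,            # 95% uncertainty interval
            yearly_seasonality=True,
            weekly_seasonality=True,
            daily_seasonality=True,
            changepoint_prior_scale=0.095)  # flexibility of the trend
m.add_country_holidays(country_name='US')   # add built-in US holidays
m.fit(df)                                   # df must contain the ds and y columns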
Listing 4-3

Develop Prophet Model

Create the Future Data Frame

Listing 4-4 applies the make_future_dataframe() method to create the future dataframe.
future = m.make_future_dataframe(periods=365)
Listing 4-4

Create a Future Data Frame Constrained to 365 Days
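It helps to know what the future dataframe contains. The sketch below imitates the method with plain pandas, assuming its default behavior of daily frequency and including the historical dates; the function name future_dates is hypothetical and is not part of Prophet.

```python
import pandas as pd

def future_dates(history_ds, periods, freq="D", include_history=True):
    """Historical dates followed by `periods` new dates after the last one."""
    history_ds = pd.Series(pd.to_datetime(history_ds))
    last = history_ds.max()
    # Build periods + 1 dates starting at the last date, then drop the first
    new = pd.Series(pd.date_range(start=last, periods=periods + 1, freq=freq)[1:])
    ds = pd.concat([history_ds, new]) if include_history else new
    return pd.DataFrame({"ds": ds.reset_index(drop=True)})

history = ["2020-07-01", "2020-07-02", "2020-07-03"]
future = future_dates(history, periods=365)
print(future.head())
print(future.tail())
```

Note that daily frequency produces weekend dates as well, whereas the example data in Table 4-3 appears to contain weekdays only. Forecasts for days with no historical counterpart deserve extra caution.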

Forecast

Listing 4-5 applies the predict() method to forecast future instances.
forecast = m.predict(future)
Listing 4-5

Forecast Time Series

Listing 4-6 plots previous values and forecasted values (see Figure 4-1).
import matplotlib.pyplot as plt

m.plot(forecast)  # observed values, forecast, and uncertainty interval
plt.xlabel("Date")
plt.ylabel("Price")
plt.xticks(rotation=45)
plt.show()
Listing 4-6

Forecast

Figure 4-1

Forecast

Figure 4-1 broadly agrees with the SARIMAX forecast in the previous chapter; however, it provides more detail. It forecasts a bullish trend into the first quarters of 2021.

Seasonal Components

Listing 4-7 decomposes the series (see Figure 4-2).
m.plot_components(forecast)  # m is the model fitted in Listing 4-3
plt.show()
Listing 4-7

Seasonal Components

Figure 4-2

Seasonal components

Cross-Validate the Model

Cross-validation tests model performance. In the background, Prophet measures forecast errors on historical data: it sets a cutoff point, fits the model using only the data up to that cutoff, and compares the forecast with the actual values that follow. Listing 4-8 applies the cross_validation() method with a specified initial training period (initial), forecast horizon (horizon), and spacing between cutoff points (period). See Table 4-4.
from fbprophet.diagnostics import cross_validation
# m is the model fitted in Listing 4-3
df_cv = cross_validation(m, initial="210 days", period="15 days", horizon="70 days")
df_cv
Listing 4-8

Cross Validation

Table 4-4

Cross-Validation Table

     ds          yhat       yhat_lower  yhat_upper  y          cutoff
0    2020-02-10  14.706377  14.527866   14.862567   15.071000  2020-02-09
1    2020-02-11  14.594084  14.429592   14.753000   14.953500  2020-02-09
2    2020-02-12  14.448900  14.283988   14.625561   14.791600  2020-02-09
3    2020-02-13  14.258331  14.094124   14.428341   14.865500  2020-02-09
4    2020-02-14  14.028495  13.858400   14.204660   14.905000  2020-02-09
...  ...         ...        ...         ...         ...        ...
295  2020-06-29  19.919252  19.620802   20.193089   17.298901  2020-04-24
296  2020-06-30  19.951939  19.660207   20.233733   17.219200  2020-04-24
297  2020-07-01  19.966822  19.665303   20.250541   17.341900  2020-04-24
298  2020-07-02  20.012227  19.725297   20.301380   17.039301  2020-04-24
299  2020-07-03  20.049481  19.752089   20.347799   17.037100  2020-04-24
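The cutoff column follows from the three arguments. The sketch below assumes that cutoffs are chosen by stepping backward from the end of the history in steps of period, keeping only those that leave at least initial of training data. The function name is hypothetical, and Prophet's own routine handles additional edge cases; with the date range of the example data, though, this simple rule reproduces the first and last cutoffs shown in Table 4-4.

```python
from datetime import date, timedelta

def simulated_cutoffs(first_day, last_day, initial, period, horizon):
    """Walk back from the last possible cutoff in steps of `period`,
    keeping cutoffs that leave at least `initial` of training data."""
    cutoffs = []
    cutoff = last_day - horizon
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

cutoffs = simulated_cutoffs(
    first_day=date(2019, 7, 3),    # first date in Table 4-3
    last_day=date(2020, 7, 3),     # last date in Table 4-3
    initial=timedelta(days=210),
    period=timedelta(days=15),
    horizon=timedelta(days=70),
)
for c in cutoffs:
    print(c)
```

Six cutoffs result, from 2020-02-09 to 2020-04-24. Each one yields a separate model fit, so a shorter period means more fits and a longer run time.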

Evaluate the Model

Listing 4-9 applies the performance_metrics() method to return key evaluation metrics (see Table 4-5).
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p
Listing 4-9

Performance

Table 4-5

Performance Metrics

     horizon  mse         rmse       mae       mape      mdape     coverage
0    7 days   0.682286    0.826006   0.672591  0.038865  0.033701  0.166667
1    8 days   1.145452    1.070258   0.888658  0.051701  0.047487  0.100000
2    9 days   1.557723    1.248088   1.077183  0.063369  0.056721  0.033333
3    10 days  2.141915    1.463528   1.299301  0.077282  0.066508  0.033333
4    11 days  3.547450    1.883468   1.648495  0.097855  0.086243  0.000000
...  ...      ...         ...        ...       ...       ...       ...
59   66 days  187.260756  13.684325  9.480800  0.549271  0.244816  0.000000
60   67 days  157.856975  12.564115  8.915581  0.515198  0.244816  0.000000
61   68 days  137.029889  11.705977  8.692623  0.499211  0.253436  0.000000
62   69 days  116.105651  10.775233  8.146737  0.466255  0.252308  0.000000
63   70 days  96.738282   9.835562   7.483452  0.427025  0.227503  0.000000
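The meaning of each metric is easiest to see by computing it by hand. The sketch below uses the first five rows of Table 4-4 and standard definitions of the metrics. Its results are not expected to match Table 4-5, because performance_metrics() aggregates the errors of all cutoffs over a rolling window of horizons rather than over five rows.

```python
import math

# (y, yhat, yhat_lower, yhat_upper) from the first five rows of Table 4-4
rows = [
    (15.071000, 14.706377, 14.527866, 14.862567),
    (14.953500, 14.594084, 14.429592, 14.753000),
    (14.791600, 14.448900, 14.283988, 14.625561),
    (14.865500, 14.258331, 14.094124, 14.428341),
    (14.905000, 14.028495, 13.858400, 14.204660),
]
n = len(rows)
errors = [y - yhat for y, yhat, _, _ in rows]

mse = sum(e * e for e in errors) / n                       # mean squared error
rmse = math.sqrt(mse)                                      # root mean squared error
mae = sum(abs(e) for e in errors) / n                      # mean absolute error
mape = sum(abs(e) / abs(r[0]) for e, r in zip(errors, rows)) / n
# coverage: share of actual values inside the uncertainty interval
coverage = sum(lo <= y <= hi for y, _, lo, hi in rows) / n

print(f"mse={mse:.6f} rmse={rmse:.6f} mae={mae:.6f} "
      f"mape={mape:.6f} coverage={coverage:.2f}")
```

In these five rows every actual value lies above the upper bound, so the coverage is zero even though the interval was set to 95%. Low coverage of this kind suggests the uncertainty interval is too narrow for the data.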

Conclusion

This chapter covered the generalized additive model. The model takes seasonality into account and uses time as a regressor. In this example, its performance surpasses that of the seasonal ARIMA model. The model commits minor errors at short horizons (a MAPE of about 4% at 7 days in Table 4-5), although the errors grow as the horizon lengthens. We can rely on the Prophet model to forecast a time series.

The first four chapters of this book properly introduce the parametric method. This method makes bold assumptions about the underlying structure of the data. It assumes the underlying structure of the data is linear.

The subsequent chapter introduces the nonparametric method. This method supports flexible assumptions about the underlying structure of the data. It assumes the underlying structure of the data is nonlinear.
