4. High-Quality Time-Series Analysis

Tshepo Chris Nokeri¹

(1) Pretoria, South Africa

The preceding chapter covered seasonal ARIMA. Even after considerable effort on data preprocessing and hyperparameter optimization, that model produced sizable errors when forecasting future instances of the series. For a fast and automated forecasting procedure, use Facebook’s Prophet, which forecasts time-series data using an additive model that fits nonlinear trends together with seasonality and holiday effects. This chapter introduces Prophet and presents a way of developing and testing an additive model. First, it discusses the crucial difference between the Statsmodels package and the Prophet package.

The Difference Between Statsmodels and Prophet

Time-series analysis typically requires diagnosing and treating missing values and outliers, running multiple test statistics to verify key assumptions, optimizing hyperparameters, and controlling for seasonal effects. A slight mistake anywhere in this workflow can lead to significant errors when forecasting future instances. Consequently, building a reliable model using Statsmodels requires a certain level of mastery.

Remember that a practical aim of machine learning is to give a computer intelligence using minimal code. Statsmodels does not offer us that; Prophet fills the gap. It was developed by Facebook’s Core Data Science team. It tolerates missing values and outliers, selects trend changepoints automatically, and controls for seasonality and holiday effects. To install Prophet in a Python environment, use pip install fbprophet. To install it in a Conda environment, use conda install -c conda-forge fbprophet. (From version 1.0 onward the package is distributed under the name prophet; this chapter uses the earlier fbprophet name.)

Understand Prophet Components

In the previous chapter, we painstakingly built a time-series model with three components, namely, trend, seasonality, and irregular components.

Prophet Components

The Prophet package takes into account the trend, seasonality, and holidays and events. This section briefly discusses these key components.

Trend

A trend is a single and consistent directional movement (upward or downward). We fit a time-series model to discover a trend. Table 4-1 highlights tunable trend parameters.

Table 4-1

Tunable Trend Parameters

Parameters	Description
growth	Specifies a piecewise linear or logistic trend
n_changepoints	The number of changepoints to be placed automatically, if changepoints are not specified
changepoint_prior_scale	The flexibility of automatic changepoint selection

Changepoints

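A changepoint is a point in time at which the growth rate of the trend is allowed to change. Changepoints alter the behavior of a time-series model by letting the trend bend, and changepoint_prior_scale penalizes the size of those changes. If you do not specify changepoints, Prophet selects them automatically.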
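To make the role of changepoints concrete, the following is a minimal sketch of a continuous piecewise linear trend written in plain Python. It is not Prophet's implementation; the function name and the parameters k, m, changepoints, and deltas are illustrative assumptions that follow the usual formulation, in which each changepoint adds an adjustment to the slope and an offset correction keeps the line continuous.

```python
from bisect import bisect_right


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k            -- base growth rate before the first changepoint
    m            -- base offset (value of the trend at t = 0)
    changepoints -- sorted times s_j at which the slope may change
    deltas       -- slope adjustment delta_j applied from s_j onward
    """
    # Number of changepoints at or before t.
    active = bisect_right(changepoints, t)
    # The slope is the base rate plus every adjustment already in effect.
    slope = k + sum(deltas[:active])
    # The correction -s_j * delta_j keeps the line continuous at each s_j.
    offset = m - sum(s * d for s, d in zip(changepoints[:active], deltas[:active]))
    return slope * t + offset


# Hypothetical values: the slope rises at t = 10 and falls at t = 20.
changepoints = [10.0, 20.0]
deltas = [0.5, -1.0]
trend = [
    piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=changepoints, deltas=deltas)
    for t in range(0, 31, 5)
]
print(trend)
```

A larger changepoint_prior_scale would allow larger values in deltas, giving a more flexible trend; a smaller value pulls them toward zero.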

Seasonality

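Seasonality represents movements that repeat over a fixed period, such as consistent week-to-week or year-to-year upward or downward movements. Equation 4-1 approximates this with a Fourier series.

$s(t)=\sum_{n=1}^{N}\left(\alpha_n\cos\left(\frac{2\pi nt}{P}\right)+b_n\sin\left(\frac{2\pi nt}{P}\right)\right)$

(Equation 4-1)

Here, P is the period (7 for weekly data, 30 for monthly data, 90 for quarterly data, and 365 for yearly data), N is the number of Fourier terms, and $\alpha_n$ and $b_n$ are the fitted coefficients.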
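The Fourier approximation of seasonality can be evaluated directly. The sketch below is an illustration, not Prophet's code; the coefficient lists a and b are made-up values standing in for the fitted coefficients, and the function name is an assumption.

```python
import math


def fourier_seasonality(t, period, a, b):
    """Evaluate s(t) as a sum of cosine and sine terms with the given period."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total


# Hypothetical coefficients for N = 3 Fourier terms.
a = [0.8, 0.1, -0.05]
b = [0.3, -0.2, 0.02]

# Two weeks of a weekly pattern (P = 7); the second week repeats the first.
weekly = [fourier_seasonality(day, period=7.0, a=a, b=b) for day in range(14)]
print([round(value, 3) for value in weekly])
```

Increasing N lets the seasonal curve follow faster changes within the period, at the risk of overfitting.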

Holidays and Events Parameters

Holidays and events affect the time series’ conditional mean. Table 4-2 outlines key tunable holiday and event parameters.

Table 4-2

Holiday and Event Parameters

Parameters	Description
daily_seasonality	Fit daily seasonality
weekly_seasonality	Fit weekly seasonality
yearly_seasonality	Fit yearly seasonality
holidays	Include holiday name and date
seasonality_prior_scale	Determines the strength of the seasonality components (holidays_prior_scale plays the same role for the holiday components)

The Additive Model

The additive model assumes that the components of a series add together, with the trend and cycle treated as one term. Its components are similar to those of the Holt-Winters technique. We express the equation as shown in Equation 4-2.

(Equation 4-2)

The formula is written mathematically as shown in Equation 4-3.

$y=g(t)+s(t)+h(t)+\varepsilon_i$

(Equation 4-3)

Here, g(t) represents the linear or logistic growth curve for modeling changes that are not periodic, s(t) represents the periodic changes (daily, weekly, and yearly seasonality), h(t) represents the effects of holidays, and $\varepsilon_i$ represents the error term that considers unusual changes.
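As a rough illustration of how the three components add up, the sketch below builds a toy series from a trend, a weekly seasonal term, and a holiday effect. All function names, slopes, amplitudes, and the holiday day are hypothetical, and the error term is left out so that the output is deterministic.

```python
import math


def trend(t):
    # g(t): a simple linear growth curve (hypothetical intercept and slope).
    return 14.0 + 0.01 * t


def weekly_seasonality(t):
    # s(t): a single sine term with a 7-day period.
    return 0.2 * math.sin(2.0 * math.pi * t / 7.0)


def holiday_effect(t, holiday_days, effect=-0.5):
    # h(t): a constant shift applied only on holiday days.
    return effect if t in holiday_days else 0.0


def additive_model(t, holiday_days):
    # y(t) = g(t) + s(t) + h(t), with the error term omitted.
    return trend(t) + weekly_seasonality(t) + holiday_effect(t, holiday_days)


holiday_days = {25}
series = [additive_model(t, holiday_days) for t in range(28)]
print(round(series[24], 4), round(series[25], 4), round(series[26], 4))
```

Because the components are summed, each one can be inspected on its own, which is what a component plot relies on.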

Data Preprocessing

We obtained the example data from Yahoo Finance.¹ Before training the model, repurpose the data as follows:

df["ds"], which holds the time stamps
df["y"], which holds the dependent variable (the value to forecast)
df.set_index("Date"), which sets the date and time as the index column

Listing 4-1 sets column names to the right format.

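# Prophet expects the time stamps in "ds" and the target values in "y"
df["ds"] = df["Date"]

df["y"] = df["Close"]

# set_index() returns a new dataframe, so assign the result back
df = df.set_index("Date")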

Listing 4-1

Process Data

Table 4-3 shows how the dataframe should look before developing a time-series model using Prophet.

Table 4-3

Dataset

	Close	ds	y
Date
2019-07-03	14.074300	2019-07-03	14.074300
2019-07-04	14.052900	2019-07-04	14.052900
2019-07-05	14.038500	2019-07-05	14.038500
2019-07-08	14.195200	2019-07-08	14.195200
2019-07-09	14.179500	2019-07-09	14.179500
...	...	...	...
2020-06-29	17.298901	2020-06-29	17.298901
2020-06-30	17.219200	2020-06-30	17.219200
2020-07-01	17.341900	2020-07-01	17.341900
2020-07-02	17.039301	2020-07-02	17.039301
2020-07-03	17.037100	2020-07-03	17.037100

Develop the Prophet Model

Listing 4-2 specifies the official holidays. Listing 4-3 configures the model with a 95% uncertainty interval and with yearly, weekly, and daily seasonality.

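import pandas as pd

# Every listed date shares one holiday label; the window also covers the following day
holidays = pd.DataFrame({

"holiday": "playoff",

"ds": pd.to_datetime(["2020-12-25", "2020-12-24", "2020-12-23", "2019-12-25", "2021-01-01", "2021-01-20"]),

"lower_window": 0,

"upper_window": 1,

})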

Listing 4-2

Specify Holidays
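The lower_window and upper_window columns widen each holiday to the days around it. The sketch below uses only the standard library to show which calendar days are covered when lower_window is 0 and upper_window is 1; the helper name expand_holiday_windows is hypothetical and is not part of Prophet.

```python
from datetime import date, timedelta


def expand_holiday_windows(holiday_dates, lower_window=0, upper_window=1):
    """Return every date covered by a holiday and its surrounding window.

    lower_window -- days before the holiday to include (zero or negative)
    upper_window -- days after the holiday to include (zero or positive)
    """
    covered = set()
    for day in holiday_dates:
        for offset in range(lower_window, upper_window + 1):
            covered.add(day + timedelta(days=offset))
    return sorted(covered)


holiday_dates = [date(2020, 12, 23), date(2020, 12, 24), date(2020, 12, 25)]
affected = expand_holiday_windows(holiday_dates, lower_window=0, upper_window=1)
print([d.isoformat() for d in affected])
```

With these settings, December 26 is treated as affected by the holiday even though it is not listed explicitly.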

Listing 4-3 completes the model.

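from fbprophet import Prophet

# interval_width=0.95 requests a 95% uncertainty interval
m = Prophet(holidays=holidays,

interval_width=0.95,

yearly_seasonality=True,

weekly_seasonality=True,

daily_seasonality=True,

changepoint_prior_scale=0.095)

# Add the built-in US public holidays to the custom ones
m.add_country_holidays(country_name="US")

# Prophet reads the "ds" and "y" columns of df
m.fit(df)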

Listing 4-3

Develop Prophet Model

Create the Future Data Frame

Listing 4-4 applies the make_future_dataframe() method to create the future dataframe.

future = m.make_future_dataframe(periods=365)

Listing 4-4

Create a Future Data Frame Constrained to 365 Days
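Conceptually, the future dataframe is the historical dates followed by new dates at a fixed frequency. The sketch below imitates that idea with plain Python dates, assuming a daily frequency and that the history is kept; make_future_dates is a hypothetical helper, not a Prophet function.

```python
from datetime import date, timedelta


def make_future_dates(history, periods, include_history=True):
    """Extend a list of dates by `periods` consecutive calendar days."""
    last = max(history)
    future = [last + timedelta(days=i) for i in range(1, periods + 1)]
    return (list(history) + future) if include_history else future


# Last three trading days from Table 4-3.
history = [date(2020, 7, 1), date(2020, 7, 2), date(2020, 7, 3)]
future = make_future_dates(history, periods=365)
print(len(future), future[3].isoformat(), future[-1].isoformat())
```

Note that a daily frequency produces weekend dates such as 2020-07-04, whereas the example price series contains trading days only, so forecasts on those dates have no comparable observations.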

Forecast

Listing 4-5 applies the predict() method to forecast future instances.

forecast = m.predict(future)

Listing 4-5

Forecast Time Series

Listing 4-6 plots previous values and forecasted values (see Figure 4-1).

import matplotlib.pyplot as plt

m.plot(forecast)

plt.xlabel("Date")

plt.ylabel("Price")

plt.xticks(rotation=45)

plt.show()

Listing 4-6

Forecast

Figure 4-1

Forecast

Figure 4-1 broadly agrees with the SARIMAX forecast in the previous chapter; however, it provides more detail. It forecasts a bullish trend through the first quarters of 2021.

Seasonal Components

Listing 4-7 decomposes the series (see Figure 4-2).

m.plot_components(forecast)

plt.show()

Listing 4-7

Seasonal Components

Figure 4-2

Seasonal components

Cross-Validate the Model

Cross validation tests model performance. In the background, Prophet measures forecast errors on historical data: it sets a cutoff point, fits the model using only data up to that cutoff, and compares the forecast with the actual values that follow. Listing 4-8 applies the cross_validation() function with a specified size of the initial training period (initial), forecast horizon (horizon), and spacing between cutoff points (period). See Table 4-4.

from fbprophet.diagnostics import cross_validation

df_cv = cross_validation(m, initial="210 days", period="15 days", horizon="70 days")

df_cv

Listing 4-8

Cross Validation

Table 4-4

Cross-Validation Table

	ds	yhat	yhat_lower	yhat_upper	y	cutoff
0	2020-02-10	14.706377	14.527866	14.862567	15.071000	2020-02-09
1	2020-02-11	14.594084	14.429592	14.753000	14.953500	2020-02-09
2	2020-02-12	14.448900	14.283988	14.625561	14.791600	2020-02-09
3	2020-02-13	14.258331	14.094124	14.428341	14.865500	2020-02-09
4	2020-02-14	14.028495	13.858400	14.204660	14.905000	2020-02-09
...	...	...	...	...	...	...
295	2020-06-29	19.919252	19.620802	20.193089	17.298901	2020-04-24
296	2020-06-30	19.951939	19.660207	20.233733	17.219200	2020-04-24
297	2020-07-01	19.966822	19.665303	20.250541	17.341900	2020-04-24
298	2020-07-02	20.012227	19.725297	20.301380	17.039301	2020-04-24
299	2020-07-03	20.049481	19.752089	20.347799	17.037100	2020-04-24
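The cutoff column in Table 4-4 follows from the three settings. The sketch below is a simplified reconstruction of how such cutoffs can be derived, assuming the series spans 2019-07-03 to 2020-07-03 as in Table 4-3; generate_cutoffs is a hypothetical helper written for illustration, and it ignores edge cases such as gaps in the data.

```python
from datetime import date, timedelta


def generate_cutoffs(first_date, last_date, initial, period, horizon):
    """Return cutoff dates in chronological order.

    The latest cutoff leaves exactly one horizon of data after it. Earlier
    cutoffs are spaced `period` apart and must leave at least `initial`
    of training history before them.
    """
    cutoffs = []
    cutoff = last_date - horizon
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


cutoffs = generate_cutoffs(
    first_date=date(2019, 7, 3),
    last_date=date(2020, 7, 3),
    initial=timedelta(days=210),
    period=timedelta(days=15),
    horizon=timedelta(days=70),
)
print([c.isoformat() for c in cutoffs])
```

Under these assumptions the sketch yields six cutoffs, from 2020-02-09 to 2020-04-24, which agrees with the first and last cutoff values shown in Table 4-4.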

Evaluate the Model

Listing 4-9 applies the performance_metrics() method to return key evaluation metrics (see Table 4-5).

from fbprophet.diagnostics import performance_metrics

df_p = performance_metrics(df_cv)

df_p

Listing 4-9

Performance

Table 4-5

Performance Metrics

	horizon	mse	rmse	mae	mape	mdape	coverage
0	7 days	0.682286	0.826006	0.672591	0.038865	0.033701	0.166667
1	8 days	1.145452	1.070258	0.888658	0.051701	0.047487	0.100000
2	9 days	1.557723	1.248088	1.077183	0.063369	0.056721	0.033333
3	10 days	2.141915	1.463528	1.299301	0.077282	0.066508	0.033333
4	11 days	3.547450	1.883468	1.648495	0.097855	0.086243	0.000000
...	...	...	...	...	...	...	...
59	66 days	187.260756	13.684325	9.480800	0.549271	0.244816	0.000000
60	67 days	157.856975	12.564115	8.915581	0.515198	0.244816	0.000000
61	68 days	137.029889	11.705977	8.692623	0.499211	0.253436	0.000000
62	69 days	116.105651	10.775233	8.146737	0.466255	0.252308	0.000000
63	70 days	96.738282	9.835562	7.483452	0.427025	0.227503	0.000000
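To clarify what the columns in Table 4-5 measure, the sketch below computes the same kinds of metrics by hand on the first five rows of Table 4-4. It is an illustration only: performance_metrics() aggregates errors over a rolling window across all cutoffs, so the numbers here are not expected to reproduce Table 4-5. The function name forecast_metrics is an assumption.

```python
import math

# First five rows of Table 4-4: (yhat, yhat_lower, yhat_upper, y).
rows = [
    (14.706377, 14.527866, 14.862567, 15.071000),
    (14.594084, 14.429592, 14.753000, 14.953500),
    (14.448900, 14.283988, 14.625561, 14.791600),
    (14.258331, 14.094124, 14.428341, 14.865500),
    (14.028495, 13.858400, 14.204660, 14.905000),
]


def forecast_metrics(rows):
    """Compute mse, rmse, mae, mape, and interval coverage for forecast rows."""
    n = len(rows)
    errors = [y - yhat for yhat, _, _, y in rows]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Absolute error relative to the actual value.
    mape = sum(abs(e / row[3]) for e, row in zip(errors, rows)) / n
    # Share of actual values that fall inside the uncertainty interval.
    covered = sum(1 for _, lower, upper, y in rows if lower <= y <= upper)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "mape": mape,
        "coverage": covered / n,
    }


metrics = forecast_metrics(rows)
print({name: round(value, 4) for name, value in metrics.items()})
```

In these five rows every actual value lies above yhat_upper, so the coverage is zero: the interval is too narrow or the forecast is biased low for this window.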

Conclusion

This chapter covered the generalized additive model. The model takes seasonality into account and uses time as a regressor. Its performance surpasses that of the seasonal ARIMA model. At short horizons the model commits minor errors when forecasting future instances of the series (Table 4-5 reports a MAPE of about 3.9% at a 7-day horizon), although the error grows as the horizon lengthens. We can rely on the Prophet model to forecast a time series over short horizons.

The first four chapters of this book introduced the parametric method. This method makes strong assumptions about the underlying structure of the data. It assumes the underlying structure of the data is linear.

The subsequent chapter introduces the nonparametric method. This method supports flexible assumptions about the underlying structure of the data. It assumes the underlying structure of the data is nonlinear.
