The preceding chapter covered seasonal ARIMA. Despite considerable effort in data preprocessing and hyperparameter optimization, that model still generated considerable errors when forecasting future instances of the series. For a fast and automated forecasting procedure, use Facebook’s Prophet; it forecasts time-series data with an additive model in which nonlinear trends are fit together with seasonality and holiday effects. This chapter introduces Prophet and presents a way of developing and testing an additive model. First, it discusses the crucial difference between the Statsmodels package and the Prophet package.
The Difference Between Statsmodels and Prophet
Time-series analysis typically requires diagnosing and treating missing values and outliers, running multiple test statistics to verify key assumptions, optimizing hyperparameters, and controlling for seasonal effects. A slight mistake in this workflow can cause the model to make significant errors when forecasting future instances. Consequently, building a reliable model using Statsmodels requires a certain level of mastery.
Recall that one aim of machine learning is to give a computer intelligence using minimal code. Statsmodels does not offer us that convenience; Prophet fills the gap. It was developed by Facebook’s Core Data Science team. It automates much of the workflow: it tolerates missing values and outliers, ships with default hyperparameters that often work without tuning, and controls for seasonality and holiday effects. To install FB Prophet in the Python environment, use pip install fbprophet. To install it in the Conda environment, use conda install -c conda-forge fbprophet. From version 1.0 onward, the package is distributed under the name prophet, so pip install prophet installs current releases.
Understand Prophet Components
In the previous chapter, we painstakingly built a time-series model with three components, namely, trend, seasonality, and irregular components.
Prophet Components
The Prophet package takes into account the trend, seasonality, and holidays and events. In this section, we briefly discuss these key components.
Trend
Tunable Trend Parameters
Parameters | Description |
---|---|
growth | Specifies a piecewise linear or logistic trend |
n_changepoints | The number of potential changepoints to place automatically when changepoints are not specified |
changepoint_prior_scale | The flexibility of the automatic changepoint selection; larger values allow more changepoints |
Changepoints
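A changepoint is a point in time at which the growth rate of the trend is allowed to change. Prophet penalizes the size of these rate changes, so a changepoint alters the behavior of the model only where the data support it. If you do not specify changepoints, Prophet selects potential changepoints automatically.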
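To make the role of changepoints concrete, here is a minimal sketch of a piecewise linear trend in plain Python. It assumes the formulation in which a base rate k is adjusted by a value delta at each changepoint and the offset is corrected so that the line stays continuous. The function name and the numbers are illustrative and are not part of Prophet's API.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k is the base growth rate, m is the base offset, and deltas[j] is the
    rate adjustment that takes effect at changepoints[j].
    """
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta
            offset -= s * delta  # keeps the two segments joined at s
    return rate * t + offset


# Illustrative values: slope 1.0 until t = 10, then slope 0.5.
print(piecewise_linear_trend(5, 1.0, 0.0, [10], [-0.5]))   # 5.0
print(piecewise_linear_trend(10, 1.0, 0.0, [10], [-0.5]))  # 10.0
print(piecewise_linear_trend(20, 1.0, 0.0, [10], [-0.5]))  # 15.0
```

The trend bends at t = 10 without jumping, which is the behavior a changepoint is meant to produce.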
Seasonality
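Prophet models periodic effects with a Fourier series, $s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)$. Here, P is the period (7 for weekly data, 30 for monthly data, 90 for quarterly data, and 365 for yearly data; Prophet itself uses 365.25 for yearly seasonality).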
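A truncated Fourier series with period P can be evaluated in a few lines. The sketch below uses only the standard library; the function name and the coefficient values are hypothetical and chosen only to show that the pattern repeats every P days.

```python
import math


def fourier_seasonality(t, period, coefficients):
    """Evaluate a truncated Fourier series at time t (in days).

    coefficients is a list of (a_n, b_n) pairs for n = 1, 2, ...
    """
    total = 0.0
    for n, (a_n, b_n) in enumerate(coefficients, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total


# Hypothetical coefficients for a weekly pattern (P = 7).
weekly = [(0.5, 0.2), (0.1, -0.3)]

# The series takes the same value one full period later.
same = math.isclose(
    fourier_seasonality(3, 7, weekly),
    fourier_seasonality(10, 7, weekly),
    abs_tol=1e-9,
)
print(same)  # True
```

Adding more coefficient pairs lets the seasonal curve change more quickly, at the cost of a higher risk of overfitting.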
Holidays and Events Parameters
Holiday and Event Parameters
Parameters | Description |
---|---|
daily_seasonality | Fit daily seasonality |
weekly_seasonality | Fit weekly seasonality |
yearly_seasonality | Fit yearly seasonality |
holidays | A data frame of holiday names and dates to include |
seasonality_prior_scale | Determines the strength of the seasonal components (holidays_prior_scale does the same for holiday components) |
The Additive Model
The additive model combines these components as $y(t) = g(t) + s(t) + h(t) + \varepsilon_t$. Here, $g(t)$ represents the linear or logistic growth curve for modeling changes that are not periodic, $s(t)$ represents the periodic changes (daily, weekly, yearly seasonality), $h(t)$ represents the effects of holidays, and $\varepsilon_t$ represents the error term that accounts for unusual changes.
Data Preprocessing
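Prophet expects a data frame with two specifically named columns, so preprocessing involves the following:
df["ds"], which holds the datestamp of each observation
df["y"], which holds the dependent variable, that is, the series to forecast
df.set_index(""), which sets the date and time as the index column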
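These steps might look as follows in pandas. This is a sketch rather than the chapter's original listing: the three rows are copied from the dataset table in this chapter, and the column names Date and Close are taken from that table.

```python
import pandas as pd

# A few closing prices copied from the chapter's dataset table.
df = pd.DataFrame(
    {
        "Date": ["2019-07-03", "2019-07-04", "2019-07-05"],
        "Close": [14.0743, 14.0529, 14.0385],
    }
)
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")  # the date becomes the index column

df["ds"] = df.index        # datestamp column that Prophet reads
df["y"] = df["Close"]      # the series to forecast
print(df)
```

Prophet reads only the ds and y columns, so the original Close column can stay in the frame for reference.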
Process Data
Dataset
Date | Close | ds | y |
---|---|---|---|
2019-07-03 | 14.074300 | 2019-07-03 | 14.074300 |
2019-07-04 | 14.052900 | 2019-07-04 | 14.052900 |
2019-07-05 | 14.038500 | 2019-07-05 | 14.038500 |
2019-07-08 | 14.195200 | 2019-07-08 | 14.195200 |
2019-07-09 | 14.179500 | 2019-07-09 | 14.179500 |
... | ... | ... | ... |
2020-06-29 | 17.298901 | 2020-06-29 | 17.298901 |
2020-06-30 | 17.219200 | 2020-06-30 | 17.219200 |
2020-07-01 | 17.341900 | 2020-07-01 | 17.341900 |
2020-07-02 | 17.039301 | 2020-07-02 | 17.039301 |
2020-07-03 | 17.037100 | 2020-07-03 | 17.037100 |
Develop the Prophet Model
Specify Holidays
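Prophet accepts holidays as a data frame with a holiday column and a ds column, plus optional lower_window and upper_window columns that extend the effect to neighboring days. The entries below are illustrative assumptions only, because the holidays used for this series are not listed here.

```python
import pandas as pd

# Illustrative entries only; replace them with the holidays that matter
# for the series being forecast.
holidays = pd.DataFrame(
    {
        "holiday": "new_year",
        "ds": pd.to_datetime(["2020-01-01", "2021-01-01"]),
        "lower_window": -1,  # also flag the day before
        "upper_window": 1,   # and the day after
    }
)
print(holidays)
```

Including future dates, such as the 2021 entry, lets the holiday effect carry into the forecast period.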
Develop Prophet Model
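A model could be developed along the following lines. This is a hedged sketch, not the chapter's original listing: fit_prophet is a hypothetical helper, the seasonality settings are assumptions, and the import falls back to fbprophet for releases older than version 1.0.

```python
def fit_prophet(df, holidays=None):
    """Fit a Prophet model to a frame with 'ds' and 'y' columns."""
    # Imported here so the sketch loads even where Prophet is absent.
    try:
        from prophet import Prophet    # package name from version 1.0
    except ImportError:
        from fbprophet import Prophet  # earlier releases

    model = Prophet(
        growth="linear",           # piecewise linear trend
        holidays=holidays,         # optional holiday data frame
        daily_seasonality=False,   # assumed settings for daily data
        weekly_seasonality=True,
        yearly_seasonality=True,
    )
    model.fit(df)
    return model
```

Calling fit_prophet(df, holidays) returns a fitted model that can then be used to build a future data frame and to forecast.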
Create the Future Data Frame
Create a Future Data Frame Constrained to 365 Days
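With a fitted model, model.make_future_dataframe(periods=365) builds a frame of datestamps that covers the history and the next 365 days. To show what the future part of such a frame contains, the sketch below reproduces it with pandas alone; future_dates is a hypothetical helper, and the last training date is taken from the dataset table.

```python
import pandas as pd


def future_dates(last_date, days=365):
    """Daily datestamps for the `days` days after last_date, in a 'ds' column."""
    start = pd.Timestamp(last_date) + pd.Timedelta(days=1)
    return pd.DataFrame({"ds": pd.date_range(start=start, periods=days, freq="D")})


future = future_dates("2020-07-03")
print(future["ds"].iloc[0].date(), future["ds"].iloc[-1].date())
# 2020-07-04 2021-07-03
```

Note that a daily frequency also produces weekend dates, which the training data in this chapter does not contain.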
Forecast
Forecast Time Series
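Forecasting amounts to passing the future data frame to the fitted model. The sketch below assumes a fitted Prophet model and a frame with a ds column; forecast is a hypothetical helper that keeps only the point forecast and its uncertainty interval.

```python
def forecast(model, future):
    """Return the point forecast and its uncertainty interval per datestamp."""
    prediction = model.predict(future)
    return prediction[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Passing the full output of model.predict to model.plot draws the forecast, and passing it to model.plot_components draws the trend and seasonal components.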
Figure 4-1. Forecast
Figure 4-1 broadly agrees with the SARIMAX forecast in the previous chapter; however, it provides more detail. It forecasts a long-run upward (bullish) trend into the first quarters of the year 2021.
Seasonal Components
Cross-Validate the Model
Cross Validation
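Prophet provides a cross_validation function in its diagnostics module that refits the model at a series of cutoff dates and forecasts a fixed horizon from each one. In the sketch below, cross_validate is a hypothetical wrapper. The 70-day horizon matches the longest horizon in the performance table, while the initial and period values are assumptions chosen to be consistent with the cutoffs shown in the cross-validation table.

```python
def cross_validate(model, initial="210 days", period="15 days", horizon="70 days"):
    """Run Prophet's rolling-origin cross-validation on a fitted model."""
    # Imported here so the sketch loads even where Prophet is absent.
    try:
        from prophet.diagnostics import cross_validation
    except ImportError:
        from fbprophet.diagnostics import cross_validation

    # initial: size of the first training window
    # period:  spacing between successive cutoff dates
    # horizon: how far ahead each refit forecasts
    return cross_validation(model, initial=initial, period=period, horizon=horizon)
```

The returned frame holds one row per forecast date and cutoff, with the actual value y next to yhat and its interval.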
Cross-Validation Table
Row | ds | yhat | yhat_lower | yhat_upper | y | cutoff |
---|---|---|---|---|---|---|
0 | 2020-02-10 | 14.706377 | 14.527866 | 14.862567 | 15.071000 | 2020-02-09 |
1 | 2020-02-11 | 14.594084 | 14.429592 | 14.753000 | 14.953500 | 2020-02-09 |
2 | 2020-02-12 | 14.448900 | 14.283988 | 14.625561 | 14.791600 | 2020-02-09 |
3 | 2020-02-13 | 14.258331 | 14.094124 | 14.428341 | 14.865500 | 2020-02-09 |
4 | 2020-02-14 | 14.028495 | 13.858400 | 14.204660 | 14.905000 | 2020-02-09 |
... | ... | ... | ... | ... | ... | ... |
295 | 2020-06-29 | 19.919252 | 19.620802 | 20.193089 | 17.298901 | 2020-04-24 |
296 | 2020-06-30 | 19.951939 | 19.660207 | 20.233733 | 17.219200 | 2020-04-24 |
297 | 2020-07-01 | 19.966822 | 19.665303 | 20.250541 | 17.341900 | 2020-04-24 |
298 | 2020-07-02 | 20.012227 | 19.725297 | 20.301380 | 17.039301 | 2020-04-24 |
299 | 2020-07-03 | 20.049481 | 19.752089 | 20.347799 | 17.037100 | 2020-04-24 |
Evaluate the Model
Performance
Performance Metrics
Row | horizon | mse | rmse | mae | mape | mdape | coverage |
---|---|---|---|---|---|---|---|
0 | 7 days | 0.682286 | 0.826006 | 0.672591 | 0.038865 | 0.033701 | 0.166667 |
1 | 8 days | 1.145452 | 1.070258 | 0.888658 | 0.051701 | 0.047487 | 0.100000 |
2 | 9 days | 1.557723 | 1.248088 | 1.077183 | 0.063369 | 0.056721 | 0.033333 |
3 | 10 days | 2.141915 | 1.463528 | 1.299301 | 0.077282 | 0.066508 | 0.033333 |
4 | 11 days | 3.547450 | 1.883468 | 1.648495 | 0.097855 | 0.086243 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... |
59 | 66 days | 187.260756 | 13.684325 | 9.480800 | 0.549271 | 0.244816 | 0.000000 |
60 | 67 days | 157.856975 | 12.564115 | 8.915581 | 0.515198 | 0.244816 | 0.000000 |
61 | 68 days | 137.029889 | 11.705977 | 8.692623 | 0.499211 | 0.253436 | 0.000000 |
62 | 69 days | 116.105651 | 10.775233 | 8.146737 | 0.466255 | 0.252308 | 0.000000 |
63 | 70 days | 96.738282 | 9.835562 | 7.483452 | 0.427025 | 0.227503 | 0.000000 |
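To see what these columns measure, the metrics can be computed by hand. The sketch below applies the standard definitions to rows 0 to 4 of the cross-validation table. Prophet's performance_metrics function averages over a rolling window of horizons, so the results here are not expected to match any row of the table above.

```python
import math

# (y, yhat, yhat_lower, yhat_upper) for rows 0-4 of the cross-validation table.
rows = [
    (15.071000, 14.706377, 14.527866, 14.862567),
    (14.953500, 14.594084, 14.429592, 14.753000),
    (14.791600, 14.448900, 14.283988, 14.625561),
    (14.865500, 14.258331, 14.094124, 14.428341),
    (14.905000, 14.028495, 13.858400, 14.204660),
]

n = len(rows)
mse = sum((y - yhat) ** 2 for y, yhat, _, _ in rows) / n
rmse = math.sqrt(mse)
mae = sum(abs(y - yhat) for y, yhat, _, _ in rows) / n
mape = sum(abs(y - yhat) / abs(y) for y, yhat, _, _ in rows) / n
# Share of actual values that fall inside the forecast interval.
coverage = sum(lower <= y <= upper for y, _, lower, upper in rows) / n

print(f"mse={mse:.4f} rmse={rmse:.4f} mae={mae:.4f} mape={mape:.4f} coverage={coverage}")
```

In all five rows the actual value lies above yhat_upper, so the coverage for this small sample is zero.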
Conclusion
This chapter covered the generalized additive model as implemented in Prophet. The model takes seasonality into account and uses time as a regressor. Its performance surpasses that of the seasonal ARIMA model. In the cross-validation above, the model commits minor errors at short horizons (a MAPE of about 3.9 percent at 7 days), although the error grows as the horizon lengthens. We can rely on the Prophet model to forecast a time series, provided that we check its accuracy at the horizon of interest.
The first four chapters of this book introduce the parametric method. This method makes strong assumptions about the underlying structure of the data; in particular, it assumes that the structure is linear.
The subsequent chapter introduces the nonparametric method. This method makes flexible assumptions about the underlying structure of the data and allows that structure to be nonlinear.