Using STL

To recap, two parameters are fundamental to the STL algorithm:

  • The width used for smoothing
  • The periods in the dataset

When we look at the CO2 dataset, we can count the periods by counting the number of peaks in the chart. I counted 60 peaks. This corresponds to the fact that the observatory has been collecting data for the past 60 years.

From here, we move from the hard science of statistics into the softer realm of interpretation. This is often the case in data science and machine learning: we have to use our intuition to guide us.

In this case, we have a hard starting point: there have been 60 years of data, so we expect at least 60 periods. Another starting point can be found in the notes of the dataset itself: NOAA uses a seven-year window to calculate the seasonal component. I don't see any reason not to use those values. So, let's decompose our time series into the trend, seasonal, and residual components.

But before we begin, there is an additional note to make: we want to decompose the time series into three components, but how do these three components recompose to become whole again? In general, there are two methods: additive or multiplicative. Simply put, we can decompose the data as either one of the following equations:
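
In the usual textbook notation (the symbols are an assumption here, not necessarily the ones this book uses), write Y_t for the observed value at time t, and T_t, S_t, and R_t for the trend, seasonal, and residual components. The additive and multiplicative models are then typically written as:

```latex
\begin{align*}
Y_t &= T_t + S_t + R_t && \text{(additive)} \\
Y_t &= T_t \times S_t \times R_t && \text{(multiplicative)}
\end{align*}
```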

This can also be stated as follows:

The github.com/chewxy/stl package supports both models, and even supports custom models that fall "in-between" additive and multiplicative models.


When to use an additive model: Use an additive model when the seasonality does not vary with the level of the time series. Most standard business case time series fall in this category.

When to use a multiplicative model: Use a multiplicative model when the seasonality or trend does vary with the level of the time series. Most econometric models fall in this category.
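
To make the distinction concrete, here is a minimal sketch on synthetic data. It does not use the stl package; every function name and number in it is hypothetical. It builds one series whose seasonal swing is fixed and one whose swing scales with the level, then compares the within-year range in the first and last year.

```go
package main

import (
	"fmt"
	"math"
)

// seasonal returns a unit-amplitude yearly cycle for monthly data.
func seasonal(t int) float64 {
	return math.Sin(2 * math.Pi * float64(t) / 12)
}

// level is a hypothetical trend that steps up by 10 every year.
func level(t int) float64 {
	return 100 + 10*float64(t/12)
}

// additiveSeries builds level + seasonal: the swing is the same at any level.
func additiveSeries(n int) []float64 {
	ys := make([]float64, n)
	for t := range ys {
		ys[t] = level(t) + 5*seasonal(t)
	}
	return ys
}

// multiplicativeSeries builds level * seasonal factor: the swing grows with the level.
func multiplicativeSeries(n int) []float64 {
	ys := make([]float64, n)
	for t := range ys {
		ys[t] = level(t) * (1 + 0.05*seasonal(t))
	}
	return ys
}

// yearRange returns max - min over the 12 months of the given year (0-based).
func yearRange(ys []float64, year int) float64 {
	chunk := ys[year*12 : (year+1)*12]
	lo, hi := chunk[0], chunk[0]
	for _, y := range chunk {
		lo = math.Min(lo, y)
		hi = math.Max(hi, y)
	}
	return hi - lo
}

// approxEqual compares two floats with a small absolute tolerance.
func approxEqual(a, b float64) bool {
	return math.Abs(a-b) < 1e-9
}

func main() {
	add := additiveSeries(60)
	mul := multiplicativeSeries(60)
	fmt.Printf("additive swing:       first year %.2f, last year %.2f\n", yearRange(add, 0), yearRange(add, 4))
	fmt.Printf("multiplicative swing: first year %.2f, last year %.2f\n", yearRange(mul, 0), yearRange(mul, 4))
}
```

If the swing in your own data stays roughly constant as the level rises, that points toward an additive model; if it grows with the level, a multiplicative model is likely the better fit.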

For the purpose of this project, we will be using an additive model. Here's the main function again:

func main() {
	dateStrings, co2s := parse(readFromFile)
	dates := parseDates(dateStrings)
	plt := newTSPlot(dates, co2s, "CO2 Level")
	plt.X.Label.Text = "Time"
	plt.Y.Label.Text = "CO2 in the atmosphere (ppm)"
	plt.Title.Text = "CO2 in the atmosphere (ppm) over time Taken over the Mauna-Loa observatory"
	dieIfErr(plt.Save(25*vg.Centimeter, 25*vg.Centimeter, "Moana-Loa.png"))

	decomposed := stl.Decompose(co2s, 12, 84, stl.Additive(), stl.WithIter(1))
	dieIfErr(decomposed.Err)
	plts := plotDecomposed(dates, decomposed)
	writeToPng(plts, "decomposed.png", 25, 25)
}

Let's break this down; in particular, the parameters:

decomposed := stl.Decompose(co2s, 12, 84, stl.Additive(), stl.WithIter(1))

Take a look at the following terms from the preceding code:

  • 12: We counted 60 periods over 60 years of monthly data, so each period is 12 months long, that is, one year.
  • 84: We use the smoothing window as specified by NOAA. Seven years is 84 months.
  • stl.Additive(): We want to use an additive model.
  • stl.WithIter(1): STL is fairly sensitive to the number of iterations run. The default is 2. But if you run it too many times, everything gets iteratively "smoothed" out. So, instead, we stick with 1.
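
The idea that repeated passes smooth everything away can be illustrated with a toy analogy. The sketch below is not STL's iteration loop and does not call github.com/chewxy/stl; it simply applies a 3-point moving average over and over to a synthetic yearly cycle (all names are hypothetical) and reports how much of the cycle survives.

```go
package main

import (
	"fmt"
	"math"
)

// cycle returns n months of a synthetic yearly cycle with amplitude 1.
func cycle(n int) []float64 {
	xs := make([]float64, n)
	for t := range xs {
		xs[t] = math.Sin(2 * math.Pi * float64(t) / 12)
	}
	return xs
}

// smoothOnce applies a centered 3-point moving average, wrapping around at the ends.
func smoothOnce(xs []float64) []float64 {
	n := len(xs)
	out := make([]float64, n)
	for i := range xs {
		prev := xs[(i-1+n)%n]
		next := xs[(i+1)%n]
		out[i] = (prev + xs[i] + next) / 3
	}
	return out
}

// smooth applies smoothOnce the given number of times to a copy of xs.
func smooth(xs []float64, passes int) []float64 {
	out := append([]float64(nil), xs...)
	for p := 0; p < passes; p++ {
		out = smoothOnce(out)
	}
	return out
}

// amplitude returns half the distance between the largest and smallest value.
func amplitude(xs []float64) float64 {
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		lo = math.Min(lo, x)
		hi = math.Max(hi, x)
	}
	return (hi - lo) / 2
}

func main() {
	xs := cycle(60) // five years of monthly values
	for _, passes := range []int{0, 1, 2, 10, 50} {
		fmt.Printf("passes=%2d amplitude=%.3f\n", passes, amplitude(smooth(xs, passes)))
	}
}
```

Each extra pass shrinks the cycle a little more, which is the intuition behind keeping the iteration count low.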

In the following sections, I'll show examples of misuse and explain why, despite everything, 1 and 2 are still pretty good iteration counts.

You may note that instead of specifying the number of periods, we specified the length of a period. The package expects the data to be evenly spaced—the distance between any two rows should be the same.
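
As a rough sketch of what this implies before calling the decomposition, the following checks that a slice of dates advances by exactly one calendar month per row and converts a period length into a number of complete periods. The helper names and the start date are hypothetical and are not part of the stl package.

```go
package main

import (
	"fmt"
	"time"
)

// monthlyDates returns n first-of-month dates starting at the given year and month.
func monthlyDates(startYear, startMonth, n int) []time.Time {
	start := time.Date(startYear, time.Month(startMonth), 1, 0, 0, 0, 0, time.UTC)
	dates := make([]time.Time, n)
	for i := range dates {
		dates[i] = start.AddDate(0, i, 0)
	}
	return dates
}

// evenlySpacedMonthly reports whether each date is exactly one calendar month
// after the previous one. Spacing is measured in calendar months, not in days.
func evenlySpacedMonthly(dates []time.Time) bool {
	for i := 1; i < len(dates); i++ {
		if !dates[i-1].AddDate(0, 1, 0).Equal(dates[i]) {
			return false
		}
	}
	return true
}

// fullPeriods returns how many complete periods of length periodLen fit in n rows.
func fullPeriods(n, periodLen int) int {
	return n / periodLen
}

func main() {
	dates := monthlyDates(1960, 1, 720) // hypothetical: 60 years of monthly rows
	fmt.Println("evenly spaced:", evenlySpacedMonthly(dates))
	fmt.Println("full periods: ", fullPeriods(len(dates), 12))

	// Dropping one row leaves a gap, which the check detects.
	gappy := append(dates[:5:5], dates[6:]...)
	fmt.Println("evenly spaced after dropping a row:", evenlySpacedMonthly(gappy))
}
```

If the check fails on real data, the missing rows would need to be filled in or interpolated before a fixed period length of 12 means "one year".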

Running this yields the following plot:

The first chart is the original data, followed by the extracted trend and seasonality, and, finally, the residuals. Some odd behavior remains at the beginning of the graph, but that artifact arises because the github.com/chewxy/stl library does not "backcast". Hence, it's always a good idea to start with at least one extra period.

How do we interpret the plot? Since this is an additive model, interpretation is a lot simpler: the Y values indicate the ppm of carbon dioxide in the air that each component contributes to the actual data, so the first chart is literally the result of adding the other three charts together.
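
The additive relationship can be checked numerically. The sketch below uses plain slices with made-up ppm values in place of the package's result type (whose fields are not shown here), defines the residual as whatever is left after removing the trend and seasonal parts, and confirms that the three components add back up to the data.

```go
package main

import (
	"fmt"
	"math"
)

// residual returns data - trend - seasonal, point by point.
func residual(data, trend, seasonal []float64) []float64 {
	out := make([]float64, len(data))
	for i := range out {
		out[i] = data[i] - trend[i] - seasonal[i]
	}
	return out
}

// recompose adds the three components back together, point by point.
func recompose(trend, seasonal, resid []float64) []float64 {
	out := make([]float64, len(trend))
	for i := range out {
		out[i] = trend[i] + seasonal[i] + resid[i]
	}
	return out
}

// maxAbsDiff returns the largest absolute pointwise difference between a and b.
func maxAbsDiff(a, b []float64) float64 {
	worst := 0.0
	for i := range a {
		worst = math.Max(worst, math.Abs(a[i]-b[i]))
	}
	return worst
}

func main() {
	// Hypothetical values in ppm, made up purely for illustration.
	data := []float64{315.7, 317.5, 317.5, 315.9}
	trend := []float64{315.0, 315.1, 315.2, 315.3}
	seasonal := []float64{0.5, 2.0, 2.5, 0.4}

	resid := residual(data, trend, seasonal)
	rebuilt := recompose(trend, seasonal, resid)
	fmt.Printf("largest difference after recomposing: %.2e\n", maxAbsDiff(rebuilt, data))
}
```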
