Decomposition

There are two things to note about the previous screenshot:

  • CO2 levels in the air are steadily rising over time.
  • The levels of CO2 dip and then bump up again, but the overall result is still a rise. These dips and bumps follow a regular pattern.

The first point is what is known to statisticians as a trend. You may already be familiar with the notion of a Trend Line from Microsoft Excel. A trend is a kind of pattern that describes gradual change over time. In our case, it is quite clear that the trend is upward.

The second point is called seasonality, and the name turns out to be apt. Seasonality describes variation that recurs at regular intervals. If you look carefully at the chart, the CO2 levels drop to their lowest point of the year at around August to October. They then rise steadily until around May, when they peak. Here's a good hint as to why this happens: plants take CO2 out of the air through a process called photosynthesis. Photosynthesis requires an organelle in the plant's cells called a chloroplast, which contains a green pigment called chlorophyll. If you live in the Northern Hemisphere, you will be well aware that trees are greenest from spring until autumn, which largely coincides with the period from May to October. The changing of the seasons causes a change in atmospheric carbon dioxide levels, so you can see why the term "seasonality" fits.

A good question to ask might be this: can we separate the trend from the seasonality so that we can work on each component individually? The answer is yes. In the remaining parts of this section, I'll show how to do so.
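To make the idea concrete, here is a minimal sketch in Python of a classical additive decomposition, assuming the series is simply trend plus seasonality. The synthetic series, the function names, and the 12-month period are assumptions made for illustration; none of them comes from the CO2 data file.

```python
import math

def centered_moving_average(values, period=12):
    """Centered moving average for an even period (a 2 x period average).

    Returns None wherever the window does not fit inside the series.
    """
    half = period // 2
    out = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        # The two end points get half weight so the window spans exactly one period.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        out[i] = total / period
    return out

def seasonal_means(values, trend, period=12):
    """Average the detrended values for each position in the cycle."""
    buckets = [[] for _ in range(period)]
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(v - t)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal component so that it sums to zero over one cycle.
    offset = sum(means) / period
    return [m - offset for m in means]

# Synthetic stand-in for monthly readings: a straight-line trend plus a yearly sine wave.
series = [300 + 0.1 * m + 3 * math.sin(2 * math.pi * m / 12) for m in range(120)]

trend = centered_moving_average(series)
seasonal = seasonal_means(series, trend)

print("trend at month 60:", round(trend[60], 3))
print("seasonal component by month:", [round(s, 2) for s in seasonal])
```

Because the synthetic series is built from a straight line and a sine wave, the recovered trend in the middle of the series matches the straight line, and the seasonal component at the fourth position of the cycle comes out at the sine wave's peak of 3. Real data will not separate this cleanly.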

Why would you want to do that? In our project so far, we've seen seasonality that is driven by real-life calendar seasons. Imagine you were doing statistical analysis for a toy company in a Western country. You'd see a yearly spike around Christmas. Seasonality often adds noise to our analysis: it's hard to tell whether a bump in sales was due to Christmas or to an underlying increase in sales. Furthermore, some cycles don't follow the calendar year. If you are dealing with sales in a largely Chinese or Vietnamese community, you'd see spikes in sales before Chinese New Year or Tet, whose dates are set by a lunisolar calendar and so shift from year to year on the Gregorian calendar. Likewise, if you were in the dates industry, you'd see spikes around Ramadan, as demand for dates increases sharply during the Muslim fasting period.

While it's true that most time series have some kind of trend and seasonality component, it would be remiss of me not to mention that not all trends and seasonalities are particularly useful. You might be tempted to take what you learn in this chapter and apply it to the stock markets, but buyer beware! Analyzing complex marketplaces is quite different from analyzing trends of CO2 in the air or sales from a business. The fundamental properties of time series in markets are somewhat different: prices are often modeled as a process with the Markov property, meaning that the future depends only on the current state and not on the path taken to reach it. This is loosely summed up as "past performance does not indicate future performance." By contrast, we shall see, for this project, that the past is quite well correlated with the present and the future.
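One rough way to see the difference is to measure how strongly a series correlates with a lagged copy of itself. The sketch below is in Python and uses two hypothetical stand-ins: a CO2-like series built from a trend plus a yearly cycle, and independent random noise standing in for period-to-period market changes. Treating market changes as independent noise is a deliberate simplification for illustration, not a claim about any real market.

```python
import math
import random

def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom

# Hypothetical CO2-like series: upward trend plus a 12-month cycle.
co2_like = [300 + 0.1 * m + 3 * math.sin(2 * math.pi * m / 12) for m in range(120)]

# Hypothetical market-like changes: independent draws, seeded for repeatability.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(2000)]

co2_r12 = autocorrelation(co2_like, 12)
noise_r12 = autocorrelation(noise, 12)

print(f"CO2-like series, lag 12: {co2_r12:.2f}")
print(f"Independent noise, lag 12: {noise_r12:.2f}")
```

The CO2-like series correlates strongly with its own value a year earlier, while the noise series shows a correlation close to zero.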

But back to the topic at hand: decomposition. If you read the comments in the data file (the lines we skipped when importing), the following is mentioned:

"First, we compute for each month the average seasonal cycle in a 7-year window around each monthly value. In this way, the seasonal cycle is allowed to change slowly over time. We then determine the "trend" value for each month by removing the seasonal cycle; this result is shown in the "trend" column."