Chapter 12. TrendCalculus

Long before "what's trending" became a popular topic of study for data scientists, there was an older concept that is still not well served by data science: that of trends. At present, the analysis of trends, if it can be called that, is carried out mainly by people "eyeballing" time series charts and offering interpretations. But what are people's eyes actually doing?

This chapter describes an implementation in Apache Spark of a new algorithm for studying trends numerically, called TrendCalculus, invented by Andrew Morgan. The original reference implementation is written in the Lua language and was open-sourced in 2015; the code can be viewed at https://bitbucket.org/bytesumo/trendcalculus-public.

This chapter explains the core method, which quickly extracts trend change points from a time series: the moments when a trend changes direction. We will describe our TrendCalculus algorithm in detail while implementing it in Apache Spark. The result is a set of scalable functions to quickly compare trends across time series, make inferences about trends, and examine correlations across timeframes. Using these new methods, we demonstrate how to construct a causal ranking technique to extract potential causal models from thousands of time series inputs.

In this chapter we will learn:

  • How to construct time windowed summary data efficiently
  • How to effectively summarize time series data to reduce noise, for further trend studies
  • How to extract trend reversal change points from the summary data using the new TrendCalculus algorithm
  • How to create User Defined Aggregate Functions (UDAFs) that operate on partitions created by complex window functionality as well as more common group by methods
  • How to return multiple values from UDAFs
  • How to use lag functions to compare current and previous records

When presented with a problem, data scientists often consider hypotheses related to trends among the first; trends are an excellent way to visualize data and lend themselves particularly well to large datasets, where the general direction of change can often be seen. In Chapter 5, Spark for Geographic Analysis, we produced a simple algorithm that attempted to predict the price of crude oil. In that study, we concentrated on the direction of change in the price, which is, by definition, the trend of the price. Trends, then, are a natural way to think about, explain, and forecast change.

To explain and demonstrate our new trend methods, this chapter is organized into two sections. The first is technical, delivering the code we need to execute our new algorithm. The second is about applying that method to real data. We hope it demonstrates that trends, apparently simple as a concept, can often be more complicated to calculate than we might first think, particularly in the presence of noise. Noise results in many local highs and lows (referred to as jitter in this chapter), which can make it difficult to find trend turning points and to discover the general direction of change over time. Ignoring noise in time series and extracting interpretable trend signals are the central challenges we demonstrate how to overcome.
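To make the jitter problem concrete, here is a toy sketch in Python. It is an illustration under our own assumptions, not the chapter's Spark code and not the TrendCalculus algorithm. It builds a hypothetical series (a steady rise with an alternating wobble standing in for noise), counts its local highs and lows, then summarizes it into non-overlapping windows of four points, keeping each window's high and low. The helper names `count_turning_points` and `window_high_low`, and the window size of 4, are illustrative only.

```python
def count_turning_points(series):
    """Count interior points that are strict local highs or strict local lows."""
    return sum(
        1
        for prev, cur, nxt in zip(series, series[1:], series[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


def window_high_low(series, size):
    """Summarize non-overlapping windows of `size` points as (high, low) pairs.

    A trailing partial window, if any, is dropped.
    """
    return [
        (max(series[i:i + size]), min(series[i:i + size]))
        for i in range(0, len(series) - size + 1, size)
    ]


# Hypothetical data: a steady rise plus an alternating +/-1 wobble as "noise".
noisy = [0.5 * t + (1 if t % 2 else -1) for t in range(40)]

summary = window_high_low(noisy, 4)
highs = [high for high, _ in summary]
lows = [low for _, low in summary]

print(count_turning_points(noisy))  # 38: every interior point is a local high or low
print(count_turning_points(highs), count_turning_points(lows))  # 0 0
```

Every interior point of the raw series is a turning point, yet the window highs and lows both rise steadily, so the underlying upward direction becomes visible. Real data is rarely this tidy, and the choice of window size is a trade-off between suppressing noise and hiding genuine reversals.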

Studying trends

The dictionary defines a trend as a general direction in which something is developing or changing, but other, more focused definitions may be more helpful for guiding data science. Two such definitions come from Eurostat, the official statistical agency of the European Union, and Salomé Areias, who studies social trends:

"A trend is the slow variation over a longer period of time, usually several years, generally associated with the structural causes affecting the phenomenon being measured." - EUROSTAT, official statistical agency in the European Union (http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Trend)

"A Trend is defined by a shift in behavior or mentality that influences a significant amount of people." - Salomé Areias, social trend commentator (https://salomeareias.wordpress.com/what-is-a-trend/)

We often think of trends as nothing more than a long rise or fall in stock market prices. However, trends can also refer to many other use cases related to economics, politics, popular culture, and society: for example, the study of the sentiment revealed by media outlets when they report on the news. In this chapter, we will use the price of oil as a simple demonstration; however, the technique could be applied to any data where trends occur in the following manner:

  • Rising trends: When successive peaks and troughs are higher (higher highs and higher lows), referred to as an upward or rising trend. For example, the first arrow in the following diagram is the result of a series of peaks and troughs where the overall effect is an increase.
  • Falling trends: When successive peaks and troughs are lower (lower highs and lower lows), referred to as a downward or falling trend. For example, the second arrow in the following diagram is the result of a series of peaks and troughs where the overall effect is a decrease.
  • Horizontal trends: This is not strictly a trend on its own, but a lack of a well-defined trend in either direction. We are not specifically concerned with this at this stage, but it is discussed later in the chapter.
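The higher-highs/higher-lows rule above can be sketched by comparing each window's high and low with those of the previous window, much like the lag comparisons mentioned earlier. This is a minimal Python sketch of the rule of thumb only, not TrendCalculus; the function names and sample values are hypothetical, and treating mixed cases (for example, a higher high together with a lower low) as "no clear trend" is our assumption.

```python
def classify_trend(prev, cur):
    """Compare two (high, low) window summaries using the higher-highs/higher-lows rule."""
    (prev_high, prev_low), (cur_high, cur_low) = prev, cur
    if cur_high > prev_high and cur_low > prev_low:
        return "rising"  # higher high and higher low
    if cur_high < prev_high and cur_low < prev_low:
        return "falling"  # lower high and lower low
    return "no clear trend"  # equal or mixed movement (assumption)


def classify_series(summaries):
    """Label each window against its predecessor, similar to a lag comparison."""
    return [classify_trend(p, c) for p, c in zip(summaries, summaries[1:])]


# Hypothetical (high, low) pairs for five consecutive windows.
windows = [(10, 5), (12, 7), (15, 9), (13, 6), (14, 4)]
print(classify_series(windows))
# ['rising', 'rising', 'falling', 'no clear trend']
```

The last comparison shows why this rule alone is not enough: a window that widens in both directions (a higher high with a lower low) fits neither definition.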


Note

If you search for "higher highs" "higher lows" "trend" "lower highs" "lower lows", you will see over 16,000 hits, including many high-profile financial sites. This is a standard, rule-of-thumb definition of a trend in the finance industry.
