Devising a persistence strategy

Considering the structure of our data and how it will be used later can inform how we write it now. We know that we have an infinite sequence of data to write. In the software profession, such sequences are known as data streams. They can take the form of video feeds from CCTV cameras, social media content, or, in our case, a continuously updating stock price. This means that we will never be able to collect all the data in memory and then write it out in one go. Knowing this, we need to devise a strategy to collect and save our data in a manageable way.

In fact, input and output streams are conceptual ways of handling an infinite stream of data. Clojure leverages Java's streaming API in the clojure.java.io namespace (you can read more at http://clojuredocs.org/clojure.java.io). We're going to itemize the operations we need to perform in order to decide whether to use an approach that directly spits out the contents of some data versus incrementally streaming out this data.
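As a small illustration of the streaming side of that choice, clojure.java.io/writer gives us a buffered java.io.Writer that we can hold open and append to incrementally, without re-reading the file each time (the file name and tick data here are illustrative only):

```clojure
(require '[clojure.java.io :as io])

;; Open a buffered writer in append mode and stream items out one at a time,
;; one EDN form per line.
(with-open [w (io/writer "stream-sample.edn" :append true)]
  (doseq [tick [{:price 100.0} {:price 100.5}]]
    (.write w (pr-str tick))
    (.write w "\n")))
```

Compare this with spit, which opens and closes the file on every call; with-open keeps one writer alive for the whole batch.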

To begin, we need to store each tick progression incrementally so that it can be looked up later on. These are the operations and data components that are relevant:

  • Storing a raw tick list
  • Storing a simple moving average
  • Storing an exponential moving average
  • Storing a Bollinger Band
  • Relating them by time

Streaming new data to an existing file nominally means appending to the end of that file. For example, if datafile.edn already contains (:foo :bar), evaluating (spit "datafile.edn" '(:more :stuff) :append true) will append to the end of the file, producing the content (:foo :bar)(:more :stuff). That format, as we've just seen, won't work for our purposes. We'd want to see something such as (:foo :bar :more :stuff), which would involve the following:

  • Opening up an existing file
  • Reading its content
  • Appending to the content
  • Writing out the content again

This is not an efficient use of our computer's resources, given that this update may take place every second or whenever a stock price's data gets updated.
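The read-modify-write cycle just described could be sketched as follows (the function and file names are illustrative, not from this chapter):

```clojure
(require '[clojure.java.io :as io])

;; Naive approach: re-read the whole file, append in memory, rewrite everything.
;; This full cost is paid on every single tick update.
(defn append-by-rewriting [path new-items]
  (let [existing (if (.exists (io/file path))
                   (read-string (slurp path))
                   '())]
    (spit path (pr-str (concat existing new-items)))))
```

Each call grows in cost with the size of the file, which is why we look for a cheaper strategy.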

Given the frequency of the data updates, it makes more sense to collect a set of data and write this batch out to a file. Deciding the threshold at which to write out a batch of data should entail its own analysis. This would involve knowing how often a data update takes place, the size of our program's memory, where this data will be used later, and so on. For the sake of simplicity, however, I'll arbitrarily pick a list size of 320 ticks at which we can flush out the content of our data. I'll also write out each data analytic (raw tick list, simple moving average, and so on) separately. This separation is possible because the analytics can later be related by time (via the EDN #inst tag). In this way, we can arrive at the following features:
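A sketch of this batching idea, assuming ticks accumulate in a vector that is flushed to its own timestamped EDN file once it reaches 320 entries (tick-window-size and write-batch! are names I've made up for illustration):

```clojure
;; Illustrative flush threshold from the discussion above.
(def tick-window-size 320)

;; Write one batch to its own file, named by the current time.
(defn write-batch! [batch]
  (let [fname (str "ticks-" (System/currentTimeMillis) ".edn")]
    (spit fname (pr-str batch))))

;; Accumulate a tick; once the window is full, flush it and start fresh.
(defn collect-and-flush [ticks tick]
  (let [ticks' (conj ticks tick)]
    (if (>= (count ticks') tick-window-size)
      (do (write-batch! ticks') [])
      ticks')))
```

The same accumulate-and-flush shape can be reused for each analytic stream, keeping the files separate as described.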

  • Save data as separate size-manageable files
  • Each analytic is easily distinguished without any extra labeling, which would be necessary if they were commingled in the same file

These features will also be useful when later looking up our data. We should be able to do the following:

  • Look up a price based on a specific period of time
  • Look up a price based on a time range
  • Look up a data point based on a specific price
  • Look up a data point based on a price range
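As a sketch of the first two lookups, assuming each stored tick is a map such as {:time #inst "..." :price ...} (a shape I'm assuming here for illustration, not one fixed by the chapter so far), a time-range query over an in-memory batch might look like this:

```clojure
;; Inclusive time-range filter over a sequence of tick maps.
;; Assumes :time holds a java.util.Date (what EDN's #inst reads as by default).
(defn ticks-in-range [ticks start end]
  (filter (fn [{:keys [time]}]
            (and (not (.before time start))
                 (not (.after time end))))
          ticks))

;; Lookup at a specific moment is then the degenerate range.
(defn tick-at [ticks t]
  (first (ticks-in-range ticks t t)))
```

The price-based lookups would follow the same filtering pattern, with the predicate applied to :price instead of :time.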

With our data neatly arranged, we should be able to layer additional analytics on top of it. For example, by looking at the composition of our source tick list, simple moving average, exponential moving average, and Bollinger Band, we can start to derive some buy and sell signals.
