Functions for querying a system

If you followed along in the last chapter, you will have written out four kinds of files for each chunk of data. Together, they represent all the data we've generated:

  <file-name>-tick.edn
  <file-name>-sma.edn
  <file-name>-ema.edn
  <file-name>-bol.edn

Since we've named the files consistently, we can use the filter function on our many-files list to isolate files by any condition. In this case, we'll isolate only one type of file.

An example of a regular expression

For our predicate function, we can use a regular expression to search for a string pattern in a filename. Clojure represents a regular expression as a string prefixed by a hash, such as #"". The re-matches function, for example, takes a regular expression and a string and returns the match only if the entire string matches the pattern; otherwise, it returns nil:

(re-matches #"x" "foobar")
;; nil

(re-matches #"foo.*" "foobar")
;; "foobar"

A returned match is considered truthy, and the absence of one (nil) is considered falsey, which just means that we can use the result in an if condition or as a predicate. So, our filter function will look something like this:

;; filter on the type of file
(filter #(re-matches #".*-ticks.edn" (.getName %))
        many-files)

Here, we filter over our many-files result list. The predicate is an anonymous function that calls re-matches on the name of each element in the list, using the #".*-ticks.edn" name pattern (adjust the suffix if your files were written with a different one, such as -tick.edn). Each element is a java.io.File object, which has the getName method. Therefore, any file whose name matches the pattern will pass the filter.
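The following is a minimal sketch of that filter, run against made-up file names rather than a real directory. The sample-files and tick-files names are assumptions for illustration. It also shows how re-matches, which needs the whole string to match, differs from re-find:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical file names, for illustration only. java.io.File objects
;; can be created for paths that don't exist on disk.
(def sample-files
  (map io/file ["data/a-ticks.edn" "data/a-sma.edn" "data/b-ticks.edn"]))

;; re-matches succeeds only when the WHOLE string matches the pattern
(re-matches #"ticks" "a-ticks.edn")          ;; nil
(re-matches #".*-ticks\.edn" "a-ticks.edn")  ;; "a-ticks.edn"

;; re-find returns the first match found anywhere in the string
(re-find #"ticks" "a-ticks.edn")             ;; "ticks"

;; keep only the files whose name ends in -ticks.edn
(def tick-files
  (filter #(re-matches #".*-ticks\.edn" (.getName %)) sample-files))

(map #(.getName %) tick-files)
;; ("a-ticks.edn" "b-ticks.edn")
```

Escaping the dot (\.) makes the pattern match a literal period; an unescaped dot matches any character.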

A basic lookup

Our re-matches expression is just one kind of predicate, one that filters on the type of file. We can also filter on the content of files. This time, let's pick a particular time instant that you've saved. My filesystem has #inst "2015-08-15T17:18:51.352-00:00".

To understand the following code, first look at the outer map function, which takes a mapping function and a list of files. The input list is first filtered with .isFile so that it contains only files and no directories. Next, look at the inner filter function. It takes a predicate function (pred-fn) and the EDN data read from a file (inp-edn). The predicate checks that the :last-trade-time value in each map is exactly the time instant #inst "2015-08-15T17:18:51.352-00:00".

However, inp-edn is a little more interesting. Each file that's passed to the mapping function is read with read-fn, which is another example of how the comp function is used. Recall that comp creates a function by composing other functions together. In this case, read-fn is a composition of edn/read-string and slurp. When it's called, its input first goes to slurp, and the result is then passed to edn/read-string:

(map (fn [each-file]
       (let [read-fn (comp edn/read-string slurp)   ;; -> use of comp function
             pred-fn #(= #inst "2015-08-15T17:18:51.352-00:00" (:last-trade-time %))
             inp-edn (read-fn each-file)]

         (filter pred-fn
                 inp-edn)))

     (filter #(.isFile %) many-files))


'(({:last-trade-time #inst "2015-08-15T17:18:51.352-00:00",
   :last-trade-price 101.90402553222992}
  {:last-trade-time #inst "2015-08-15T17:18:51.352-00:00",
   :last-trade-price 101.63143059175933})
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ())

Running this expression on all my files shows that only two maps have this instant. This is to be expected since the time constraint is so specific. However, map returns an empty list for every file in which (filter pred-fn inp-edn) doesn't match anything. So, the output format is not easy to look at.
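To see the comp step in isolation, here is a small sketch that writes a throwaway EDN file and reads it back. The tick values and the tmp-file name are invented for illustration; only edn/read-string, slurp, and spit come from the text and clojure.core:

```clojure
(require '[clojure.edn :as edn])

;; comp applies right to left: slurp runs first, then edn/read-string
(def read-fn (comp edn/read-string slurp))

;; Write a throwaway EDN file so the example runs anywhere.
;; The tick values here are invented for illustration.
(def tmp-file (java.io.File/createTempFile "example-ticks" ".edn"))
(.deleteOnExit tmp-file)

(spit tmp-file
      (pr-str [{:last-trade-time #inst "2015-08-15T17:18:51.352-00:00"
                :last-trade-price 101.9}
               {:last-trade-time #inst "2015-08-15T17:18:52.000-00:00"
                :last-trade-price 102.1}]))

(def ticks (read-fn tmp-file))

(count ticks)
;; 2

;; #inst literals are read as java.util.Date, so = compares them by value
(filter #(= #inst "2015-08-15T17:18:51.352-00:00" (:last-trade-time %))
        ticks)
;; a sequence holding the one matching map
```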

Flattening structures

We can clean up the preceding nested expression by simply flattening the result list. This can again easily be done with the flatten function. It takes a nested combination of sequential data structures (such as lists, vectors, and so on) and gives back content in a single flat sequence, removing empty sequences and maintaining the shape of associative structures:

(flatten '({:a {:b {:c 1}}}
           {:g :h}
          ({:e :R})))

;; ({:a {:b {:c 1}}} {:g :h} {:e :R})


(flatten '({:a {:b {:c 1}}}
          ({:g :h}
          ({:e :R}))))
;; ({:a {:b {:c 1}}} {:g :h} {:e :R})
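As a hedged aside, flatten removes every level of sequential nesting, which is fine for our lists of maps but worth knowing about when the elements are vectors. The sketch below uses invented data and compares flatten with (mapcat identity ...), which removes exactly one level:

```clojure
;; A result shaped like the earlier map/filter output: one list per file,
;; most of them empty. The maps are invented for illustration.
(def nested '(({:price 1} {:price 2}) () () ({:price 3})))

(flatten nested)
;; ({:price 1} {:price 2} {:price 3})

;; mapcat identity removes exactly one level of nesting
(mapcat identity nested)
;; ({:price 1} {:price 2} {:price 3})

;; The two differ when the elements are themselves sequential:
(flatten '(([1 2] [3 4]) ()))
;; (1 2 3 4)

(mapcat identity '(([1 2] [3 4]) ()))
;; ([1 2] [3 4])
```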

The following code is the same as the expression we encountered earlier, but with its results flattened and wrapped in a function definition. In your REPL, define this function, replacing the string to the right of #inst with a timestamp that was written to your filesystem. After creating the function, simply evaluate it by calling (lookupfn):

(defn lookupfn []

  (flatten
   (map (fn [x]
          (let [read-fn (comp edn/read-string slurp)
                pred-fn #(= #inst "2015-08-15T17:18:51.352-00:00" (:last-trade-time %))
                inp-edn (read-fn x)]

            (filter pred-fn
                    inp-edn)))

        (filter #(.isFile %) many-files))))

(lookupfn)

;; ({:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.90402553222992} {:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.63143059175933})

A more expressive lookup

Let's clean up our code by factoring out the filtered list and the predicate function. This has the additional benefit of turning the predicate and the list of files to search into parameters of the function. Create a file called src/edgar/readdata.clj and add this namespace with the require definition:

(ns edgar.readdata
  (:require [clojure.java.io :as io]
            [clojure.edn :as edn]))

Start to author the file by adding the following code:

(defn lookupfn [flist pred-fn]

    (flatten
     (map (fn [x]
            (let [read-fn (comp edn/read-string slurp)
                  inp-edn (read-fn x)]
              (filter pred-fn
                      inp-edn)))
          flist)))

Now we can search for data using a range of criteria. With regard to the data we've written, we can look up tick data based on the following criteria:

  • A specific time
  • After a specific time
  • Before a specific time
  • A time range
  • A specific price
  • Above a specific price
  • Below a specific price
  • Price range

These predicates encode the criteria we've just listed:

(defn specific-time-pred [inst]   ;; -> functions returning functions
  #(= inst (:last-trade-time %)))

(defn time-after-pred [time]
  #(.after (:last-trade-time %) time))

(defn time-before-pred [time]
  #(.before (:last-trade-time %) time))

(defn time-range-pred [lower upper]
  #(and (.after (:last-trade-time %) lower)
        (.before (:last-trade-time %) upper)))

(defn specific-price-pred [price]
  #(= price (:last-trade-price %)))

(defn price-abouve-pred [price]
  #(> (:last-trade-price %) price))

(defn price-below-pred [price]
  #(< (:last-trade-price %) price))

(defn price-range-pred [lower upper]
  #(and (> (:last-trade-price %) lower)
        (< (:last-trade-price %) upper)))

We can also try out a few of these predicates on the lookup function we just created. Try these out on your system using the time instants that were written to your disk:

(def files (filter #(.isFile %) many-files))


(lookupfn files (specific-time-pred #inst "2015-08-15T17:18:51.352-00:00"))

(lookupfn files (time-range-pred #inst "2015-08-15T17:18:00.000-00:00" #inst "2015-08-15T17:19:00.000-00:00"))

(lookupfn files (specific-price-pred 4.028309189176084))

(lookupfn files (price-range-pred 6 10))
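If you'd like to try lookupfn without relying on the files from the last chapter, the following sketch writes two small temporary files and searches them. The write-temp-ticks helper, the sample-files name, and the tick values are assumptions made for this example; lookupfn and the two predicates are repeated from above so that the block runs on its own:

```clojure
(require '[clojure.edn :as edn])

;; Copies of definitions from this section, repeated so the block runs alone
(defn lookupfn [flist pred-fn]
  (flatten
   (map (fn [x]
          (filter pred-fn ((comp edn/read-string slurp) x)))
        flist)))

(defn price-range-pred [lower upper]
  #(and (> (:last-trade-price %) lower)
        (< (:last-trade-price %) upper)))

(defn time-after-pred [time]
  #(.after (:last-trade-time %) time))

;; Hypothetical helper: writes a seq of ticks to a temporary EDN file
(defn write-temp-ticks [ticks]
  (let [f (java.io.File/createTempFile "ticks" ".edn")]
    (.deleteOnExit f)
    (spit f (pr-str ticks))
    f))

(def sample-files
  [(write-temp-ticks [{:last-trade-time #inst "2015-08-15T17:18:10.000-00:00"
                       :last-trade-price 5.0}
                      {:last-trade-time #inst "2015-08-15T17:18:40.000-00:00"
                       :last-trade-price 8.5}])
   (write-temp-ticks [{:last-trade-time #inst "2015-08-15T17:19:30.000-00:00"
                       :last-trade-price 12.0}])])

(map :last-trade-price (lookupfn sample-files (price-range-pred 6 10)))
;; (8.5)

(map :last-trade-price
     (lookupfn sample-files
               (time-after-pred #inst "2015-08-15T17:18:30.000-00:00")))
;; (8.5 12.0)
```

Note that the range predicates use strict comparisons, so the lower and upper bounds themselves are excluded.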

A simple query language

So far, we have a very rudimentary lookup function that takes one predicate at a time. It would be nice to pass several conditions to a lookup function at once, so that expressions such as the following are possible:

(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00")
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00")

(lookup :price-abouve 12 :price-below 20)

(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00"
        :price-abouve 12
        :price-below 20)

Variable argument functions

To implement that syntax, the first thing our function should do is accept a variable list of arguments. Clojure functions let us do this by inserting an ampersand before a symbol that will contain our arguments. Try copying the following simple function into your REPL:

(defn foobarfn [& manyargs-ofanyname]
  (println manyargs-ofanyname)
  (println (type manyargs-ofanyname)))

(foobarfn 1 2 3 4 5 6 7)
;; (1 2 3 4 5 6 7)
;; clojure.lang.ArraySeq

(foobarfn :one :two :three)
;; (:one :two :three)
;; clojure.lang.ArraySeq

(foobarfn '("a" "s" "d" "f"))
;; (("a" "s" "d" "f"))
;; clojure.lang.ArraySeq

We can now call this function with any number of arguments. Passing seven integers to the function is just as valid as passing three keywords or one list.
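Since our query syntax alternates keys and values, a natural next step is to group the variable arguments into pairs. The sketch below does this with partition; show-pairs is a hypothetical name used only for illustration:

```clojure
;; Hypothetical function name, for illustration only
(defn show-pairs [& constraints]
  ;; constraints is a sequence of all arguments, or nil when none are passed
  (partition 2 constraints))

(show-pairs :price-abouve 12 :price-below 20)
;; ((:price-abouve 12) (:price-below 20))

(show-pairs)
;; ()

;; partition silently drops an incomplete trailing group
(show-pairs :price-abouve 12 :price-below)
;; ((:price-abouve 12))
```

Because an unpaired trailing argument is dropped without any error, it's worth checking up front that the number of arguments is even.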

The :pre and :post function conditions

The other thing that our function will want to do up front is ensure that parameters are submitted in pairs. Again, Clojure provides a facility to assert conditions with the :pre and :post assertions. These let you provide conditions that must be satisfied when entering or exiting your function. Simply put a map with the :pre and :post keys at the very beginning of your function in order to do this. The values should be vectors of conditions that you want to be true in both cases:

(defn onefn [one]
  {:pre [(not (nil? one))]
   :post [(pos? %)]}

  one)

(onefn nil)
;; java.lang.AssertionError: Assert failed: (not (nil? one))

(onefn -1)
;; java.lang.AssertionError: Assert failed: (pos? %)

(onefn 5)
;; 5

Try entering the preceding function in your REPL. Calling it with parameters that fall outside the :pre and :post constraints will throw an AssertionError. Keep in mind that you can put as many assertions as you like in either vector of conditions.

We can now use these two features to begin our function. Let's use variable arguments and ensure that the constraints are in pairs:

(defn lookup [& constraints]
  {:pre [(even? (count constraints))]}

)

Take a look at the following, more complete function and try to glean what it's doing:

(defn load-directory [fname]
  (filter #(.isFile %) (file-seq (io/file fname))))

(defn lookup [& constraints]

  ;; ensure constraints are in pairs -> Preconditions
  {:pre [(even? (count constraints))]}

  (let [;; :source is either a directory name or a list of files;
        ;; without it, we search the "data/" directory
        files (if (some #{:source} constraints)

                (let [source-value (->> constraints
                                        (partition 2)
                                        (filter #(= :source (first %)))
                                        first
                                        second)]

                  (if (string? source-value)
                    (load-directory source-value)
                    (filter #(.isFile %) source-value)))

                (load-directory "data/"))

        ;; break the arguments into pairs, leaving out the :source pair
        constraint-pairs (->> constraints
                              (partition 2)
                              (remove #(= :source (first %))))

        ;; find predicate fn based on keyword
        find-lookup-fn (fn [inp]
                         (case inp
                           :time specific-time-pred
                           :time-after time-after-pred
                           :time-before time-before-pred
                           :price specific-price-pred
                           :price-abouve price-abouve-pred
                           :price-below price-below-pred))

        ;; map over pairs - replace each keyword with its predicate fn
        constraint-pairs-A (map (fn [x]
                                  [(find-lookup-fn (first x)) (second x)])
                                constraint-pairs)

        ;; build one search fn per pair - apply predicate fn to its arg
        lookupfn-fns-A (map (fn [x]
                              (fn [y]
                                (lookupfn y ((first x) (second x)))))
                            constraint-pairs-A)]

    ;; apply all fns with args
    (apply concat ((apply juxt lookupfn-fns-A)
                   files))))

After the argument list and the function precondition, the work happens in the bindings of the let expression:

  • files: This is the list of files to search. It comes from a :source constraint (a directory name or a list of files) when one is given; otherwise, the data/ directory is used
  • constraint-pairs: This breaks function parameters into pairs
  • find-lookup-fn: This creates a function to look up a corresponding predicate function for a given constraint key (for example, :time-after and :price-abouve, spelled as in the code)
  • constraint-pairs-A: This replaces a constraint key with the corresponding predicate function
  • lookupfn-fns-A: This returns another list of functions that call lookupfn with the file list, predicate, and submitted value that we've just looked up
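The pairing and predicate lookup steps listed above can be sketched without touching any files. In this simplified version, a map (lookup-fns) stands in for the case expression, and constraints->preds is a hypothetical helper name; both are assumptions of the sketch rather than part of the lookup function itself:

```clojure
;; Stand-in predicate constructors, mirroring the ones defined earlier
(defn price-abouve-pred [price] #(> (:last-trade-price %) price))
(defn price-below-pred [price] #(< (:last-trade-price %) price))

;; A map plays the role of the case expression (an assumption of this sketch)
(def lookup-fns {:price-abouve price-abouve-pred
                 :price-below  price-below-pred})

;; Hypothetical helper: turn key/value constraints into predicate functions
(defn constraints->preds [constraints]
  (->> constraints
       (partition 2)                          ;; break arguments into pairs
       (remove #(= :source (first %)))        ;; :source is not a predicate
       (map (fn [[k v]] ((lookup-fns k) v))))) ;; key -> predicate applied to value

(def preds
  (constraints->preds [:source "data/" :price-abouve 12 :price-below 20]))

(count preds)
;; 2

(map #(% {:last-trade-price 15}) preds)
;; (true true)

(map #(% {:last-trade-price 25}) preds)
;; (true false)
```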

The juxt higher order function

The juxt function is a Clojure core function that, like comp, returns a new function built from its input functions. However, the function that juxt generates is the juxtaposition of its input functions: it returns a vector containing the result of applying each function to the input arguments. You can supply as many input arguments as you like to the generated function. Just be sure that every function you supplied to juxt can accept those arguments:

(juxt inc dec)
;; #object[clojure.core$juxt$fn__4510 0x1d3be6d "clojure.core$juxt$fn__4510@1d3be6d"]

((juxt inc dec keyword) 10)
;; [11 9 nil]

((juxt inc dec (comp keyword name str)) 10)
;; [11 9 :10]

((juxt inc dec (comp keyword name str)) 10 20 30)
;; clojure.lang.ArityException: Wrong number of args (3) passed to: core/inc

((juxt + *) 10 20 30)
;; [60 6000]

(apply (juxt min max) '(10 20 30))
;; [10 30]

With that understanding, let's take a look at the double apply expression:

(apply concat ((apply juxt lookupfn-fns-A)
               files))

This expression can be explained as follows:

  • The (apply juxt lookupfn-fns-A) expression will take our list of functions and generate a juxtaposition function
  • The generated juxtaposition function is then called with the input files (files)
  • What's returned from this call is a vector that holds the result list of each function in lookupfn-fns-A (a vector of lists)
  • We can normally concat many lists together, for example, (concat [1 2 3] [4 5 6] [7 8 9]) ;; (1 2 3 4 5 6 7 8 9)
  • However, we reach for apply to use functions on arguments in lists, for example, (apply concat [[1 2] [3 4]]) ;; (1 2 3 4)
  • Thus, we (apply concat ...) to our vector of lists. Note that the results of the individual constraints are concatenated, so a map that satisfies several constraints appears once per constraint
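To make the shape of that result concrete, here is a small in-memory sketch. The ticks, above-12, below-20, and search-fns names and values are invented for illustration. It also shows every-pred, a clojure.core function, as one possible alternative if you want only the maps that satisfy all constraints at once; this is a different design from the lookup function above, not a description of it:

```clojure
;; Invented tick data, held in memory instead of files
(def ticks [{:last-trade-price 5}
            {:last-trade-price 15}
            {:last-trade-price 25}])

(def above-12 #(> (:last-trade-price %) 12))
(def below-20 #(< (:last-trade-price %) 20))

;; One search function per predicate, as in lookupfn-fns-A
(def search-fns [#(filter above-12 %) #(filter below-20 %)])

((apply juxt search-fns) ticks)
;; [({:last-trade-price 15} {:last-trade-price 25})
;;  ({:last-trade-price 5} {:last-trade-price 15})]

;; Concatenating keeps every match from every constraint, so a tick that
;; satisfies both constraints appears twice
(map :last-trade-price (apply concat ((apply juxt search-fns) ticks)))
;; (15 25 5 15)

;; every-pred builds a single predicate that requires all of them to hold
(map :last-trade-price (filter (every-pred above-12 below-20) ticks))
;; (15)
```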

These are examples of how we can use the lookup function that we've developed. Replace the time values (for example, "2015-08-15T17:18:51.352-00:00") with ones that reflect what's on your system:

  (def many-files (file-seq (io/file "data/")))

  (lookup :time #inst "2015-08-15T17:18:51.352-00:00")
  (lookup :time-after #inst "2015-08-15T17:18:00.000-00:00")
  (lookup :time-before #inst "2015-08-15T17:19:00.000-00:00")
  (lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
          :time-before #inst "2015-08-15T17:19:00.000-00:00")

  (lookup :price 4.028309189176084)
  (lookup :price-abouve 12)
  (lookup :price-below 12)
  (lookup :price-abouve 12 :price-below 20)

  (lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
          :time-before #inst "2015-08-15T17:19:00.000-00:00"
          :price-abouve 12
          :price-below 20)

  (lookup :source many-files
          :time #inst "2015-08-15T17:18:51.352-00:00")

  (lookup :source "data/"
          :time #inst "2015-08-15T17:18:51.352-00:00")