If you've followed along with the last chapter, you will have written out four kinds of files per data chunk, which together represent all the data we've generated:
<file-name>-tick.edn
<file-name>-sma.edn
<file-name>-ema.edn
<file-name>-bol.edn
Since we've named the files consistently, we can use the filter function on our many-files list to select the files that meet any condition. In this case, we'll select only one type of file.
For our predicate function, we can use a regular expression to search for a string pattern in a filename. Clojure writes a regular expression literal as a string prefixed by a hash, such as #"". The re-matches function, for example, takes a regular expression and a string, and returns the match only if the entire string matches the pattern; otherwise, it returns nil:
(re-matches #"x" "foobar")     ;; nil
(re-matches #"foo.*" "foobar") ;; "foobar"
A matching string is truthy and nil is falsey, which means that we can use the result directly in an if condition or as a predicate. So, our filter function will look something like this:
;; filter on the type of file
(filter #(re-matches #".*-ticks.edn" (.getName %)) many-files)
Here, we filter over our many-files result list. The predicate is an anonymous function that calls re-matches with the #".*-ticks.edn" name pattern on the name of each element in the list. Each element happens to be an instance of the Java class java.io.File, which has a getName method. Therefore, any file whose name matches our pattern satisfies the filter.
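The following sketch shows the file-type filter on its own, without touching the disk. The file names are hypothetical, and the pattern escapes the dot (\.) so that it matches only a literal period; the chapter's #".*-ticks.edn" pattern also works, because an unescaped dot matches any character, including a period. The sketch also contrasts re-matches with re-find, which accepts a match anywhere in the string:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical file names; io/file builds File objects without touching disk.
(def sample-files
  (map io/file ["data/chunk1-ticks.edn" "data/chunk1-sma.edn"
                "data/chunk1-ema.edn" "data/chunk1-bol.edn"]))

;; re-matches must match the whole name; \. matches a literal dot only.
(defn ticks-file? [^java.io.File f]
  (re-matches #".*-ticks\.edn" (.getName f)))

(map #(.getName %) (filter ticks-file? sample-files))
;; => ("chunk1-ticks.edn")

;; re-matches vs re-find on the same input
(re-matches #"ticks" "chunk1-ticks.edn") ;; => nil, the whole string must match
(re-find #"ticks" "chunk1-ticks.edn")    ;; => "ticks", a match anywhere is enough
```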
Our re-matches expression is just one kind of predicate, one that filters on the type of file. We can also filter on the content of files. This time, let's pick a particular time instant that you've saved. My filesystem has #inst "2015-08-15T17:18:51.352-00:00".
To understand the following code, start with the outer map function, which takes a mapping function and a list of files. That list is first filtered so that it contains only files; no directories are allowed. Next, look at the inner filter function. It takes a predicate function (pred-fn) and the EDN data read from a file (inp-edn). The predicate checks that the :last-trade-time value in each map is exactly the instant #inst "2015-08-15T17:18:51.352-00:00". The inp-edn binding is a little more interesting. Each file that's passed to the mapping function is handed to read-fn, which we create in the let. The read-fn function is another example of the comp function in use. Recall that comp creates a function by composing other functions together. In this case, read-fn is a composition of edn/read-string and slurp. Because comp applies its functions from right to left, the input to read-fn first goes to slurp, which reads the file's contents as a string, and that string is then passed to edn/read-string:
(require '[clojure.edn :as edn])

(map (fn [each-file]
       (let [read-fn (comp edn/read-string slurp) ;; -> use of comp function
             pred-fn #(= #inst "2015-08-15T17:18:51.352-00:00" (:last-trade-time %))
             inp-edn (read-fn each-file)]
         (filter pred-fn inp-edn)))
     (filter #(.isFile %) many-files))
;; => (({:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.90402553222992}
;;      {:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.63143059175933})
;;     () () () () () () () () () () () () () () () () () () ())
Running this expression on all my files shows that only two maps have this instant, which is to be expected since the time constraint is so specific. However, map returns an empty list for each file in which (filter pred-fn inp-edn) finds no match, so the output is not easy to read.
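To check the order in which comp applies its functions, here is a minimal sketch that writes a small, hypothetical EDN vector to a temporary file and reads it back both through read-fn and through the equivalent nested call:

```clojure
(require '[clojure.edn :as edn])

;; Write a small EDN vector to a temporary file (hypothetical sample data).
(def tmp (doto (java.io.File/createTempFile "sample" ".edn") (.deleteOnExit)))
(spit tmp (pr-str [{:last-trade-price 101.5}]))

;; comp applies its functions right to left: slurp first, then edn/read-string.
(def read-fn (comp edn/read-string slurp))

(= (read-fn tmp)
   (edn/read-string (slurp tmp)))
;; => true
```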
We can clean up this nested result by flattening it, which is easily done with the flatten function. It takes any nesting of sequential data structures (such as lists and vectors) and returns their contents as a single flat sequence. Empty sequences contribute nothing, and because maps aren't sequential, flatten leaves them intact:
(flatten '({:a {:b {:c 1}}} {:g :h} ({:e :R})))
;; ({:a {:b {:c 1}}} {:g :h} {:e :R})

(flatten '({:a {:b {:c 1}}} ({:g :h} ({:e :R}))))
;; ({:a {:b {:c 1}}} {:g :h} {:e :R})
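flatten works here because the tick maps aren't sequential. An alternative that isn't used in this chapter is to remove exactly one level of nesting with (apply concat ...), or equivalently mapcat, since (mapcat f xs) is the same as (apply concat (map f xs)). The sketch below, on hypothetical data, shows where the two approaches differ:

```clojure
;; Per-file results shaped like the output above: one seq of maps per file.
(def per-file-results
  '(({:last-trade-price 101.9} {:last-trade-price 101.6}) () ()))

(flatten per-file-results)
;; => ({:last-trade-price 101.9} {:last-trade-price 101.6})

;; apply concat and mapcat remove exactly one level of nesting.
(apply concat per-file-results)
;; => ({:last-trade-price 101.9} {:last-trade-price 101.6})
(mapcat identity per-file-results)
;; => ({:last-trade-price 101.9} {:last-trade-price 101.6})

;; The difference shows when the data itself holds sequential values:
(flatten '(([1 2]) ([3])))      ;; => (1 2 3)
(apply concat '(([1 2]) ([3]))) ;; => ([1 2] [3])
```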
The following code is the same as the expression we encountered earlier, but with its results flattened and wrapped in a function definition. In your REPL, define this function, replacing the timestamp string to the right of #inst with one that was written to your filesystem. After creating the function, evaluate it by calling (lookupfn):
(defn lookupfn []
  (flatten
   (map (fn [x]
          (let [read-fn (comp edn/read-string slurp)
                pred-fn #(= #inst "2015-08-15T17:18:51.352-00:00" (:last-trade-time %))
                inp-edn (read-fn x)]
            (filter pred-fn inp-edn)))
        (filter #(.isFile %) many-files))))

(lookupfn)
;; ({:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.90402553222992}
;;  {:last-trade-time #inst "2015-08-15T17:18:51.352-00:00", :last-trade-price 101.63143059175933})
Let's clean up our code by factoring out the filtered file list and the predicate function. This has the additional benefit of letting us pass in, as parameters, both the predicate and the list of files we want to search. Create a file called src/edgar/readdata.clj and add this namespace declaration with its :require clause:
(ns edgar.readdata
  (:require [clojure.java.io :as io]
            [clojure.edn :as edn]))
Next, add the following function to the file:
(defn lookupfn [flist pred-fn]
  (flatten
   (map (fn [x]
          (let [read-fn (comp edn/read-string slurp)
                inp-edn (read-fn x)]
            (filter pred-fn inp-edn)))
        flist)))
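Here is a minimal, self-contained way to try the parameterized lookupfn without the files from the previous chapter. The temp-edn-file helper and the sample prices are hypothetical; each temporary file holds one EDN vector of maps:

```clojure
(require '[clojure.edn :as edn])

;; Same shape as the lookupfn above.
(defn lookupfn [flist pred-fn]
  (flatten
   (map (fn [x]
          (let [read-fn (comp edn/read-string slurp)
                inp-edn (read-fn x)]
            (filter pred-fn inp-edn)))
        flist)))

;; Hypothetical helper: write data as EDN to a temp file and return the File.
(defn temp-edn-file [data]
  (let [f (java.io.File/createTempFile "ticks" ".edn")]
    (.deleteOnExit f)
    (spit f (pr-str data))
    f))

(def sample-files
  [(temp-edn-file [{:last-trade-price 5.0} {:last-trade-price 12.0}])
   (temp-edn-file [{:last-trade-price 15.0}])])

(lookupfn sample-files #(> (:last-trade-price %) 10))
;; => ({:last-trade-price 12.0} {:last-trade-price 15.0})
```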
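Now we can search for data using a range of criteria. For the tick data we've written, we can look up records based on the following criteria:
A specific trade time
Trade times after a given time
Trade times before a given time
Trade times within a time range
A specific trade price
Prices above a given price
Prices below a given price
Prices within a price range
These predicates encode the criteria we've just listed: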
(defn specific-time-pred [inst] ;; -> functions returning functions
  #(= inst (:last-trade-time %)))

(defn time-after-pred [time]
  #(.after (:last-trade-time %) time))

(defn time-before-pred [time]
  #(.before (:last-trade-time %) time))

(defn time-range-pred [lower upper]
  #(and (.after (:last-trade-time %) lower)
        (.before (:last-trade-time %) upper)))

(defn specific-price-pred [price]
  #(= price (:last-trade-price %)))

(defn price-abouve-pred [price]
  #(> (:last-trade-price %) price))

(defn price-below-pred [price]
  #(< (:last-trade-price %) price))

(defn price-range-pred [lower upper]
  #(and (> (:last-trade-price %) lower)
        (< (:last-trade-price %) upper)))
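Because each constructor returns a function, you can test the predicates against in-memory maps before pointing them at files. This sketch repeats two of the constructors so that it runs on its own, uses hypothetical sample ticks, and shows that both range predicates exclude their bounds, since .after, .before, > and < are strict comparisons:

```clojure
;; Two of the predicate constructors from above, repeated so this runs alone.
(defn time-range-pred [lower upper]
  #(and (.after (:last-trade-time %) lower)
        (.before (:last-trade-time %) upper)))

(defn price-range-pred [lower upper]
  #(and (> (:last-trade-price %) lower)
        (< (:last-trade-price %) upper)))

;; Hypothetical in-memory ticks; #inst literals read as java.util.Date.
(def ticks
  [{:last-trade-time #inst "2015-08-15T17:18:00.000-00:00" :last-trade-price 6.0}
   {:last-trade-time #inst "2015-08-15T17:18:30.000-00:00" :last-trade-price 8.0}
   {:last-trade-time #inst "2015-08-15T17:19:00.000-00:00" :last-trade-price 10.0}])

;; Both bounds are exclusive, so only the 17:18:30 tick qualifies.
(filter (time-range-pred #inst "2015-08-15T17:18:00.000-00:00"
                         #inst "2015-08-15T17:19:00.000-00:00")
        ticks)

;; 6.0 and 10.0 sit on the bounds, so only the tick priced 8.0 qualifies.
(filter (price-range-pred 6 10) ticks)
```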
We can also try out a few of these predicates on the lookup function we just created. Try these out on your system using the time instants that were written to your disk:
(def files (filter #(.isFile %) many-files))

(lookupfn files (specific-time-pred #inst "2015-08-15T17:18:51.352-00:00"))
(lookupfn files (time-range-pred #inst "2015-08-15T17:18:00.000-00:00"
                                 #inst "2015-08-15T17:19:00.000-00:00"))
(lookupfn files (specific-price-pred 4.028309189176084))
(lookupfn files (price-range-pred 6 10))
So far, we have a very rudimentary lookup function that takes one predicate at a time. It would be nice to pass several conditions to a single lookup call, as follows:
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00")
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00")
(lookup :price-abouve 12 :price-below 20)
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00"
        :price-abouve 12
        :price-below 20)
To implement that syntax, our function first needs to accept a variable number of arguments. In Clojure, we do this by placing an ampersand (&) before the last symbol in the parameter vector; that symbol is then bound to a sequence of any remaining arguments. Try copying the following simple function into your REPL:
(defn foobarfn [& manyargs-ofanyname]
  (println manyargs-ofanyname)
  (println (type manyargs-ofanyname)))

(foobarfn 1 2 3 4 5 6 7)
;; (1 2 3 4 5 6 7)
;; clojure.lang.ArraySeq

(foobarfn :one :two :three)
;; (:one :two :three)
;; clojure.lang.ArraySeq

(foobarfn '("a" "s" "d" "f"))
;; ((a s d f))   <- println prints strings without quotes
;; clojure.lang.ArraySeq
We can now call this function with any number of arguments. Passing seven integers to the function is just as valid as passing three keywords or one list.
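For the key/value syntax shown earlier, the rest arguments need to be grouped into pairs. A sketch using partition, with a hypothetical pairs function, shows one reason to validate the argument count: partition silently drops an incomplete trailing group. It also shows apply, which spreads a collection into separate arguments:

```clojure
;; Hypothetical helper: group variadic key/value arguments into pairs.
(defn pairs [& kvs]
  (partition 2 kvs))

(pairs :time-after 1 :time-before 2)
;; => ((:time-after 1) (:time-before 2))

;; partition drops the incomplete trailing group without complaint,
;; which is why checking for an even argument count up front is useful.
(pairs :time-after 1 :time-before)
;; => ((:time-after 1))

;; apply spreads a collection into individual arguments.
(apply pairs [:price 4.0])
;; => ((:price 4.0))
```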
The other thing that our function should do up front is ensure that its arguments are submitted in pairs. Again, Clojure provides a facility for this: the :pre and :post assertions. These let you state conditions that must be satisfied when entering or exiting your function. To use them, put a map with :pre and/or :post keys immediately after the parameter vector, followed by the function body. Each value is a vector of conditions that must be true on entry (:pre) or on exit (:post); inside a :post condition, % refers to the return value:
(defn onefn [one]
  {:pre  [(not (nil? one))]
   :post [(pos? %)]}
  one)

(onefn nil) ;; java.lang.AssertionError: Assert failed: (not (nil? one))
(onefn -1)  ;; java.lang.AssertionError: Assert failed: (pos? %)
(onefn 5)   ;; 5
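Try entering the preceding function in your REPL. Calling it with arguments that violate the :pre or :post constraints throws an AssertionError. Keep in mind that you can put as many assertions as you like in either vector. Combining variadic arguments with a precondition gives us the skeleton of our lookup function, which checks that its arguments come in pairs:
(defn lookup [& constraints]
  {:pre [(even? (count constraints))]}
  constraints) ;; placeholder body; without a form after the map, the map is returned instead of checked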
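One detail of the condition map is easy to miss: Clojure only treats the map as :pre/:post conditions when at least one body form follows it. On its own, the map is simply the function's return value. The function names below are hypothetical:

```clojure
;; With no forms after it, the map is the return value, not a condition map.
(defn checks-nothing [x]
  {:pre [(pos? x)]})

(checks-nothing -1)
;; => {:pre [false]}

;; With a body form after the map, the precondition is enforced.
(defn checks-input [x]
  {:pre [(pos? x)]}
  x)

(try (checks-input -1)
     (catch AssertionError e (.getMessage e)))
;; => "Assert failed: (pos? x)"
```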
Now let's use these two features to write the full function: it takes variable arguments and ensures that they come in pairs. Take a look at the following function and try to work out what it's doing:
(defn load-directory [fname]
  (filter #(.isFile %) (file-seq (io/file fname))))

(defn lookup [& constraints]
  ;; ensure constraints are in pairs -> Preconditions
  {:pre [(even? (count constraints))]}

  ;; map over pairs - find predicate fn based on keyword - apply fn to arg
  (let [source-value (->> constraints
                          (partition 2)
                          (filter #(= :source (first %)))
                          first
                          second)
        files (cond
                (nil? source-value) (load-directory "data/")            ;; default directory
                (string? source-value) (load-directory source-value)    ;; a directory path
                :else (filter #(.isFile %) source-value))               ;; a list of files
        constraint-pairs (->> constraints
                              (partition 2)
                              (remove #(= :source (first %))))
        find-lookup-fn (fn [inp]
                         (case inp
                           :time specific-time-pred
                           :time-after time-after-pred
                           :time-before time-before-pred
                           :price specific-price-pred
                           :price-abouve price-abouve-pred
                           :price-below price-below-pred))
        constraint-pairs-A (map (fn [x] [(find-lookup-fn (first x)) (second x)])
                                constraint-pairs)
        lookupfn-fns-A (map (fn [x] (fn [y] (lookupfn y ((first x) (second x)))))
                            constraint-pairs-A)]
    ;; apply all fns with args
    (apply concat ((apply juxt lookupfn-fns-A) files))))
After the argument list and precondition, the let expression binds the following:
files: The list of files to search, taken from an optional :source constraint (either a directory path or a list of files) and otherwise defaulting to the data/ directory
constraint-pairs: This breaks the function's arguments into pairs, dropping the :source pair
find-lookup-fn: This looks up the predicate function that corresponds to a given constraint key (for example, :time-after and :price-abouve)
constraint-pairs-A: This replaces each constraint key with its corresponding predicate function
lookupfn-fns-A: This returns a list of functions, each of which calls lookupfn with the file list and the predicate built from the looked-up predicate function and its submitted value
The juxt function is a Clojure core function that, like comp, returns a new function built from its input functions. Unlike comp, however, juxt generates the juxtaposition of its input functions: a function that applies each input function to the same arguments and returns a vector of the results. You can call the generated function with as many arguments as you like; just be sure that each of the source functions accepts the arguments you provide:
(juxt inc dec)
;; #object[clojure.core$juxt$fn__4510 0x1d3be6d "clojure.core$juxt$fn__4510@1d3be6d"]

((juxt inc dec keyword) 10)
;; [11 9 nil]

((juxt inc dec (comp keyword name str)) 10)
;; [11 9 :10]

((juxt inc dec (comp keyword name str)) 10 20 30)
;; clojure.lang.ArityException: Wrong number of args (3) passed to: core/inc

((juxt + *) 10 20 30)
;; [60 6000]

(apply (juxt min max) '(10 20 30))
;; [10 30]
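In lookup, the list of functions passed to juxt is built at runtime, so juxt is called through apply. The following sketch, with a hypothetical fns list, shows that the resulting juxtaposition returns the same values as calling each function yourself:

```clojure
;; A list of functions built at runtime, as lookup does with lookupfn-fns-A.
(def fns [inc dec #(* % 10)])

;; apply passes the list's elements to juxt as separate arguments.
((apply juxt fns) 5)
;; => [6 4 50]

;; The same results, computed by calling each function directly.
(mapv #(% 5) fns)
;; => [6 4 50]
```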
With that understanding, let's take a look at the double apply
expression.
(apply concat ((apply juxt lookupfn-fns-A) files))
This expression can be explained as follows:
1. The (apply juxt lookupfn-fns-A) expression takes our list of functions and generates a juxtaposition function.
2. The juxtaposition function is then called with the input files (files).
3. This gives back the result of each function in lookupfn-fns-A inside a vector (a vector of lists).
4. We can concat many lists together; for example, (concat [1 2 3] [4 5 6] [7 8 9]) returns (1 2 3 4 5 6 7 8 9).
5. We can use apply to call a function with the elements of a list as its arguments; for example, (apply concat [[1 2] [3 4]]) returns (1 2 3 4).
6. So, we use (apply concat ...) to join our vector of lists into a single sequence.
These are examples of how we can use the lookup function that we're developing. Replace the time values (for example, 2015-08-15T17:18:51.352-00:00) with ones that reflect what's on your system:
(def many-files (file-seq (io/file "data/")))

(lookup :time #inst "2015-08-15T17:18:51.352-00:00")
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00")
(lookup :time-before #inst "2015-08-15T17:19:00.000-00:00")
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00")
(lookup :price 4.028309189176084)
(lookup :price-abouve 12)
(lookup :price-below 12)
(lookup :price-abouve 12 :price-below 20)
(lookup :time-after #inst "2015-08-15T17:18:00.000-00:00"
        :time-before #inst "2015-08-15T17:19:00.000-00:00"
        :price-abouve 12
        :price-below 20)
(lookup :source many-files :time #inst "2015-08-15T17:18:51.352-00:00")
(lookup :source "data/" :time #inst "2015-08-15T17:18:51.352-00:00")
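Note that lookup joins the results of each constraint with concat, so a record appears if it satisfies any one of the constraints, and a record that satisfies several appears once per constraint. For example, :time-after X :time-before Y returns every record after X followed by every record before Y, so records inside the window appear twice rather than being the only ones returned. If you want every constraint to hold at once, one possible approach (not the chapter's) is to combine the predicates with every-pred. This in-memory sketch uses a hypothetical filter-all helper and sample ticks to compare the two behaviors:

```clojure
;; Two predicate constructors from earlier in the section.
(defn time-after-pred [time] #(.after (:last-trade-time %) time))
(defn time-before-pred [time] #(.before (:last-trade-time %) time))

;; Hypothetical helper: keep only records that satisfy every predicate.
(defn filter-all [records preds]
  (filter (apply every-pred preds) records))

;; Hypothetical sample ticks.
(def ticks
  [{:last-trade-time #inst "2015-08-15T17:17:00.000-00:00" :last-trade-price 9.0}
   {:last-trade-time #inst "2015-08-15T17:18:30.000-00:00" :last-trade-price 14.0}
   {:last-trade-time #inst "2015-08-15T17:20:00.000-00:00" :last-trade-price 18.0}])

(def preds [(time-after-pred #inst "2015-08-15T17:18:00.000-00:00")
            (time-before-pred #inst "2015-08-15T17:19:00.000-00:00")])

;; Union, in the style of lookup's (apply concat ...): prices 14.0 18.0 9.0 14.0
(apply concat (map #(filter % ticks) preds))

;; Intersection with every-pred: only the 17:18:30 tick (price 14.0)
(filter-all ticks preds)
```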