Passing vectors into R

To do any meaningful analysis, we'll need to pass vector or matrix data into R so that it can operate on and analyze that data.

Let's see how to do this.

Getting ready

We must first complete the recipe, Setting up R to talk to Clojure, and have Rserve running. We must also have the Clojure-specific parts of that recipe done and the connection to Rserve made.

We'll also need access to the clojure.string namespace:

(require '[clojure.string :as str])

How to do it…

To make passing values into R easier, we'll first define a protocol and then use it to pass a vector to R:

  1. In order to handle the conversion of all the data types into a string that R can read, we'll define a protocol, ToR. Any data types that we want to marshal into R must implement this, as follows:
    (defprotocol ToR
      (->r [x] "Convert an item to R."))
  2. Now we'll implement this protocol for sequences, vectors, and numeric types:
    (extend-protocol ToR
      clojure.lang.ISeq
      (->r [coll] (str "c(" (str/join "," (map ->r coll)) ")"))
      clojure.lang.PersistentVector
      (->r [coll] (->r (seq coll)))
      java.lang.Integer
      (->r [i] (str i))
      java.lang.Long
      (->r [l] (str l))
      java.lang.Float
      (->r [f] (str f))
      java.lang.Double
      (->r [d] (str d)))
  3. We create a wrapper function to call R's mean function:
    (defn r-mean
      ([coll] (r-mean coll *r-cxn*))
      ([coll r-cxn]
       (.. r-cxn
         (eval (str "mean(" (->r coll) ")"))
         asDouble)))
  4. With these in place, we can call r-mean just as we would call any other function:
    user=> (r-mean [1.0 2.0 3.0])
    2.0
    user=> (r-mean (map (fn [_] (rand)) (range 5)))
    0.3966653617356786
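
The r-mean wrapper in step 3 follows a pattern that applies to any single-argument R function: marshal the argument with ->r and wrap the result in a call. The following is a minimal sketch of just that string-building step, so it runs without an Rserve connection. The helper name r-call-str is hypothetical and not part of the recipe, and the single java.lang.Number implementation is an assumption made for brevity in place of the four numeric types listed in step 2:

```clojure
(require '[clojure.string :as str])

;; Same protocol as in the recipe, repeated so this block runs on its own.
(defprotocol ToR
  (->r [x] "Convert an item to R."))

(extend-protocol ToR
  clojure.lang.ISeq
  (->r [coll] (str "c(" (str/join "," (map ->r coll)) ")"))
  clojure.lang.PersistentVector
  (->r [coll] (->r (seq coll)))
  ;; Assumption: one implementation for all numeric types, for brevity.
  java.lang.Number
  (->r [n] (str n)))

;; Hypothetical helper: builds the R source that calls function `f` on `coll`.
(defn r-call-str
  [f coll]
  (str f "(" (->r coll) ")"))

(r-call-str "mean" [1.0 2.0 3.0])
;; => "mean(c(1.0,2.0,3.0))"
```

The resulting string is what r-mean hands to the connection's eval method, so a similar wrapper for another R function would differ only in the function name and in how the result is read back.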

How it works…

For most data types, marshaling to R simply means converting it to a string. However, for sequences and vectors, it's a little more complicated. Clojure has to convert all the sequence's items to R strings, join the items with a comma, and wrap it in a call to R's c constructor.
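
To make the conversion concrete, the following sketch repeats the protocol for two of the numeric types and shows the strings that ->r produces. It only builds strings and doesn't contact R:

```clojure
(require '[clojure.string :as str])

;; The protocol from the recipe, repeated so this block runs on its own.
(defprotocol ToR
  (->r [x] "Convert an item to R."))

(extend-protocol ToR
  clojure.lang.ISeq
  (->r [coll] (str "c(" (str/join "," (map ->r coll)) ")"))
  clojure.lang.PersistentVector
  (->r [coll] (->r (seq coll)))
  java.lang.Long
  (->r [l] (str l))
  java.lang.Double
  (->r [d] (str d)))

;; A scalar is just its string representation.
(->r 42)
;; => "42"

;; A vector is converted to a seq, and each item is marshaled in turn.
(->r [1.5 2.5])
;; => "c(1.5,2.5)"

;; Lazy sequences implement ISeq, so they work as well.
(->r (map inc [1 2 3]))
;; => "c(2,3,4)"

;; Nested collections produce nested calls to c.
(->r [[1 2] [3 4]])
;; => "c(c(1,2),c(3,4))"
```

Note that R's c function flattens nested vectors, so the last expression evaluates in R to a single vector of four elements rather than to a matrix.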

This is a perfect place to use protocols. Defining methods in order to marshal more data types to R is simple. For example, we can define a naïve method to work with strings as shown here:

(extend-protocol ToR
  java.lang.String
  (->r [s] (str "'" s "'")))

Of course, this method isn't without its problems. If a string has a quote within it, for instance, it must be escaped. Also, having to marshal data types back and forth in this manner can be computationally expensive, especially for large or complex data types.
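
One way to address the quoting problem is to escape the string before wrapping it. The following is a sketch, not a complete solution: it assumes that escaping backslashes and single quotes is enough, and it leaves control characters such as newlines untouched. It uses clojure.string/escape, which replaces each character that has an entry in the given map:

```clojure
(require '[clojure.string :as str])

;; The protocol from the recipe, repeated so this block runs on its own.
(defprotocol ToR
  (->r [x] "Convert an item to R."))

(extend-protocol ToR
  java.lang.String
  (->r [s]
    ;; Escape backslashes and single quotes, then wrap in single quotes.
    ;; Assumption: no other characters need escaping for our inputs.
    (str "'" (str/escape s {\\ "\\\\", \' "\\'"}) "'")))

(->r "it's")
;; => "'it\\'s'"   ; that is, the R source 'it\'s'
```

The backslash is doubled in the printed result only because the REPL shows the value as a Clojure string literal; the text sent to R contains a single backslash before the quote.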
