Evaluating R files from Clojure

We won't always want to feed R code from Clojure directly into R. Often, we'll have files of R expressions that we want to evaluate as a whole.

We can do this quite easily too. Let's see how.

Getting ready

We must first complete the recipe, Setting up R to talk to Clojure, and have Rserve running. We must also have the Clojure-specific parts of that recipe done and the connection to Rserve made.

Moreover, we'll need access to the java.io.File class:

(import '[java.io File])

How to do it…

We'll first define a function to make evaluating a file in R easier, and then we'll find a file and execute it:

  1. The function to evaluate a file of R code takes a filename and (optionally) a connection to the R server. It feeds the file to R using R's source function, and it returns whatever R returns:
    (defn r-source
      ;; With no connection given, fall back on *r-cxn* from the setup recipe.
      ([filename] (r-source filename *r-cxn*))
      ([filename r-cxn]
       ;; Build and evaluate an R call like source("/abs/path/to/file.R").
       (.eval r-cxn (str "source(\""
                         (.getAbsolutePath (File. filename))
                         "\")"))))
  2. For example, suppose we have a file named chisqr-example.R that creates a table of random data and performs a Χ² test on it:
    dat <- data.frame(q1=sample(c("A","B","C"),
        size=1000,replace=TRUE),
      sex=sample(c("M","F"),
         size=1000,replace=TRUE))
    dtab <- with(dat,table(q1,sex))
    
    (Xsq <- chisq.test(dtab))
  3. The results that come back from R are a little complicated, but with some trial and error, we can tease the answers out, as follows:
    user=> (def x-sqr (.asList (r-source "chisqr-example.R")))
    #'user/x-sqr
    ;; X-square
    user=> (.. x-sqr (at 0) asList (at "statistic") asDouble)
    0.2166086470268894
    ;; degrees of freedom
    user=> (.. x-sqr (at 0) asList (at "parameter") asInteger)
    2
    ;; p-value
    user=> (.. x-sqr (at 0) asList (at "p.value") asDouble)
    0.897354468808211
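Building the command string is the one fiddly part of r-source: the path is spliced into an R string literal, so a Windows path full of backslashes, or a filename containing a double quote, would produce broken R code. Here is a minimal sketch of a pure helper that escapes those characters. The names r-string-literal and r-source-command are hypothetical; they are not part of Rserve or the setup recipe. Because they don't touch Rserve, they can be checked at the REPL without a server.

```clojure
(require '[clojure.string :as str])
(import '[java.io File])

;; Hypothetical helper: turn a Java string into an R string literal.
;; Backslashes are escaped first, so the backslashes added in front of
;; quotes aren't doubled a second time.
(defn r-string-literal [s]
  (str "\""
       (-> s
           (str/replace "\\" "\\\\")
           (str/replace "\"" "\\\""))
       "\""))

;; Hypothetical helper: the R expression that sources filename,
;; using its absolute path as resolved on the Clojure side.
(defn r-source-command [filename]
  (str "source("
       (r-string-literal (.getAbsolutePath (File. filename)))
       ")"))

;; Example (the printed path depends on the current directory):
(println (r-source-command "chisqr-example.R"))
```

With this in place, r-source could evaluate (.eval r-cxn (r-source-command filename)). Keep in mind that source runs inside Rserve, so the path must also be valid on the machine where Rserve is running; that's only guaranteed when Rserve is local.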

How it works…

The most difficult part of this is dealing with the return value. After calling r-source, we convert the output to a list with asList, take its first item (the test result), and convert that to a list as well. We pull the statistic item from it and convert it to a double; that's the Χ² value. The parameter item is the degrees of freedom, and the p.value item is the p-value for the test.
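To see where statistic, parameter, and p.value come from, here is a small pure-Clojure sketch of Pearson's Χ² test of independence. It is an illustration, not how R's chisq.test is implemented. The statistic is the sum over all cells of (O − E)²/E, where E is row total × column total / grand total, and the degrees of freedom are (rows − 1)(columns − 1). The sketch assumes the table is a vector of equal-length rows of counts and applies no continuity correction (R applies Yates' correction only to 2×2 tables, so none applies to the 3×2 table here). It computes the p-value only for 2 degrees of freedom, where the upper tail of the Χ² distribution is exactly e^(−x/2). The function name chi-square-test is hypothetical.

```clojure
;; Sketch: Pearson's chi-squared test of independence on a table of
;; observed counts given as a vector of equal-length rows.
(defn chi-square-test [table]
  (let [row-sums (mapv #(reduce + %) table)
        col-sums (apply mapv + table)
        total    (double (reduce + row-sums))
        ;; Sum of (observed - expected)^2 / expected over every cell,
        ;; where expected = row total * column total / grand total.
        statistic (reduce +
                          (for [[i row] (map-indexed vector table)
                                [j obs] (map-indexed vector row)]
                            (let [e (/ (* (row-sums i) (col-sums j)) total)]
                              (/ (Math/pow (- obs e) 2) e))))
        df (* (dec (count row-sums)) (dec (count col-sums)))]
    {:statistic statistic
     :parameter df
     ;; Closed form for the upper tail, valid only when df = 2;
     ;; returns nil otherwise rather than a wrong answer.
     :p-value (when (= df 2) (Math/exp (- (/ statistic 2))))}))

;; Example with made-up counts for a 3x2 table (q1 by sex):
(chi-square-test [[10 20] [20 10] [15 15]])
```

For the 3×2 table in chisqr-example.R, the degrees of freedom are (3 − 1)(2 − 1) = 2, matching the parameter item, and (Math/exp (- (/ 0.2166086470268894 2))) gives about 0.89735, matching the p.value that R returned above.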

Generally, when I'm picking out the results from their Java data structures, the REPL and documentation are the biggest help. For example, the value x-sqr, when printed on the REPL, displays this:

user=> x-sqr
[#<REXPGenericVector org.rosuda.REngine.REXPGenericVector@4e2f1185+[9]named> #<REXPLogical org.rosuda.REngine.REXPLogical@43be5d17[1]>]

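This tells me that the list's first item is a generic R vector and the second item is an R logical structure. That matches what R's source function returns: a list whose value element holds the value of the last expression evaluated in the file (here, the test result) and whose visible element is a logical flag. Diving further into the first item shows the names of the members it contains: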

user=> (.. x-sqr (at 0) asList names)
["statistic" "parameter" "p.value" "method" "data.name" "observed" "expected" "residuals" "stdres"]

This helps me pick out the values I'm looking for. Then, by using some test data and referring to the documentation for the data types, I can write the code required to dig down to the results.

There's more…

The documentation for R's Java data types is available at http://rforge.net/org/docs/index.html?org/rosuda/REngine/package-tree.html.
