Although datasets are often convenient, we'll frequently want to treat our data as a matrix in the linear algebra sense. In Incanter, a matrix stores a table of doubles, which provides good performance in a compact data structure. We'll also need matrices because some of Incanter's functions, such as trans, only operate on a matrix. Matrices also implement Clojure's ISeq interface, so interacting with them is convenient.
For this recipe, we'll need the Incanter libraries, so we'll use this project.clj file:
(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We'll use the core and io namespaces, so we'll load these into our script or REPL:
(use '(incanter core io))
We'll use the Virginia census data that we've used periodically throughout the book. See the Managing program complexity with STM recipe from Chapter 3, Managing Complexity with Concurrent Programming, for information on how to get this dataset. You can also download it from http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P35.csv.
This line binds the file name to the identifier data-file:
(def data-file "data/all_160_in_51.P35.csv")
For this recipe, we'll create a dataset, convert it to a matrix, and then perform some operations on it:
(def va-data (read-dataset data-file :header true))
Next, we convert the dataset to a matrix with the to-matrix function. Before we do this, we'll pull out a few of the columns, since matrices can only contain floating-point numbers:
(def va-matrix (to-matrix ($ [:POP100 :HU100 :P035001] va-data)))
Because a matrix behaves as a sequence of rows, we can use first in order to get the first row, take in order to get a subset of the matrix, and count in order to get the number of rows in the matrix:
user=> (first va-matrix)
 A 1x3 matrix
 -------------
 8.19e+03  4.27e+03  2.06e+03
user=> (count va-matrix)
591
Passing Incanter's plus function to reduce takes each row and sums each column separately:
user=> (reduce plus va-matrix)
 A 1x3 matrix
 -------------
 5.43e+06  2.26e+06  1.33e+06
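The sequence operations above can be tried on a small stand-in matrix without downloading the census file. This is a minimal sketch: the name m and its values are made up for illustration and are not part of the recipe's data.

```clojure
(use '(incanter core))

;; A small stand-in matrix (hypothetical values, not the census data).
(def m (matrix [[1 2 3]
                [4 5 6]
                [7 8 9]]))

;; The number of rows in the matrix.
(count m)

;; A sequence holding only the first two rows.
(take 2 m)

;; Fold the rows together with plus. Each step adds two rows
;; element by element, so the result holds the column sums
;; (12, 15, and 18 for these values).
(def column-sums (reduce plus m))
```

Since take returns an ordinary Clojure sequence of rows, the result can be passed back to matrix if a matrix is needed again.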
The to-matrix function takes a dataset of floating-point values and returns a compact matrix. Many of Incanter's more sophisticated analysis functions take matrices, which are also easy to work with.
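To see the conversion in isolation, here is a small sketch that builds a toy dataset by hand, selects its numeric columns, and converts them. The dataset toy and its column names :name, :pop, and :housing are hypothetical and only mirror the shape of the census example.

```clojure
(use '(incanter core))

;; Hypothetical toy dataset with one text column and two numeric columns.
(def toy (dataset [:name :pop :housing]
                  [["A" 100 40]
                   ["B" 250 90]]))

;; Select only the numeric columns, then convert them to a matrix.
(def toy-matrix (to-matrix ($ [:pop :housing] toy)))

;; dim returns the shape as [rows columns].
(dim toy-matrix)
```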
In this recipe, we saw the plus matrix operator. Incanter defines a full suite of these. You can learn more about matrices and see what operators are available at https://github.com/liebke/incanter/wiki/matrices.
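As a brief illustration of a few of these operators, the following sketch applies them to two small matrices. The names a and b and their values are made up for the example. The functions shown come from the incanter.core namespace.

```clojure
(use '(incanter core))

;; Two small example matrices (hypothetical values).
(def a (matrix [[1 2]
                [3 4]]))
(def b (matrix [[10 20]
                [30 40]]))

(def summed     (plus a b))   ; element-wise sum
(def difference (minus b a))  ; element-wise difference
(def scaled     (mult a 2))   ; every element multiplied by 2
(def product    (mmult a b))  ; matrix product
(def transposed (trans a))    ; rows and columns swapped
```

Note that mult is element-wise, while mmult performs true matrix multiplication, so the two give different results for the same pair of matrices.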