Converting datasets to matrices

Although datasets are convenient, we'll often want to treat our data as a matrix in the linear algebra sense. In Incanter, a matrix stores a table of doubles, which provides good performance in a compact data structure. We'll also need matrices because some of Incanter's functions, such as trans, only operate on a matrix. Finally, matrices implement Clojure's ISeq interface, so we can work with them using the standard sequence functions.

Getting ready

For this recipe, we'll need the Incanter libraries, so we'll use this project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll use the core and io namespaces, so we'll load these into our script or REPL:

(use '(incanter core io))

We'll use the Virginia census data that we've used periodically throughout the book. See the Managing program complexity with STM recipe from Chapter 3, Managing Complexity with Concurrent Programming, for information on how to get this dataset. You can also download it from http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P35.csv.

This line binds the file name to the identifier data-file:

(def data-file "data/all_160_in_51.P35.csv")

How to do it…

For this recipe, we'll create a dataset, convert it to a matrix, and then perform some operations on it:

  1. First, we need to read the data into a dataset, as follows:
    (def va-data (read-dataset data-file :header true))
  2. Then, in order to convert it to a matrix, we just pass it to the to-matrix function. Before we do this, we'll pull out a few of the columns, since matrices can only contain floating-point numbers:
    (def va-matrix
        (to-matrix ($ [:POP100 :HU100 :P035001] va-data)))
  3. Now that it's a matrix, we can treat it like a sequence of rows. Here, we pass it to first in order to get the first row, take in order to get a subset of the matrix, and count in order to get the number of rows in the matrix:
    user=> (first va-matrix)
     A 1x3 matrix
     -------------
     8.19e+03  4.27e+03  2.06e+03
    
    user=> (count va-matrix)
    591
  4. We can also use Incanter's matrix operators to get the sum of each column, for instance. Reducing the rows with the plus function adds them element by element, so each column is summed separately:
    user=> (reduce plus va-matrix)
     A 1x3 matrix
     -------------
     5.43e+06  2.26e+06  1.33e+06
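
The sequence operations from steps 3 and 4, including take, can also be tried without the census file. The following is only a minimal sketch: the matrix m and its values are made up for illustration and stand in for va-matrix, and it assumes the Incanter version from the project.clj file above.

```clojure
(use '(incanter core))

;; Hypothetical 4x3 matrix standing in for va-matrix; the values are made up.
(def m (matrix [[1 2 3]
                [4 5 6]
                [7 8 9]
                [10 11 12]]))

;; first returns the first row of the matrix.
(def first-row (first m))

;; take returns a lazy sequence of the first n rows, not a new matrix.
(def first-two (take 2 m))

;; count on the matrix gives the number of rows;
;; nrow and ncol report the dimensions directly.
(println (count m) (nrow m) (ncol m))

;; Reducing the rows with plus adds them element by element,
;; which yields one total per column.
(def col-sums (reduce plus m))
```

Because take returns a plain sequence of rows, the result would need to be passed back through matrix before it is given to functions that require a matrix.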

How it works…

The to-matrix function takes a dataset of numeric values and returns a compact matrix of doubles. Many of Incanter's more sophisticated analysis functions operate on matrices rather than on datasets, so this conversion is often the first step before using them.
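
To see the conversion in isolation, here is a small sketch that builds a dataset in memory instead of reading the census file. The name toy-data, the column names, and the values are all hypothetical.

```clojure
(use '(incanter core))

;; Hypothetical toy dataset; the column names and values are made up.
(def toy-data
  (dataset [:pop :housing]
           [[100 40]
            [250 90]
            [75 30]]))

;; Convert the dataset to a matrix. The column names are dropped,
;; and the values are stored as doubles.
(def toy-matrix (to-matrix toy-data))

;; The matrix keeps the shape of the dataset: 3 rows and 2 columns.
(println (matrix? toy-matrix) (nrow toy-matrix) (ncol toy-matrix))
```

Since the matrix no longer carries column names, it helps to keep track of the column order used when selecting columns, as in step 2.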

There's more…

In this recipe, we saw the plus matrix operator. Incanter defines a full suite of these. You can learn more about matrices and see what operators are available at https://github.com/liebke/incanter/wiki/matrices.
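
As a brief illustration of a few of the other operators in incanter.core, here is a sketch using two small matrices. The matrices a and b and their values are made up for this example.

```clojure
(use '(incanter core))

;; Two hypothetical 2x2 matrices; the values are made up.
(def a (matrix [[1 2]
                [3 4]]))
(def b (matrix [[10 20]
                [30 40]]))

(def sum-ab  (plus a b))   ; element-wise addition
(def diff-ba (minus b a))  ; element-wise subtraction
(def prod-ab (mult a b))   ; element-wise multiplication
(def quot-ba (div b a))    ; element-wise division
(def mprod   (mmult a b))  ; matrix multiplication
(def a-t     (trans a))    ; transpose

;; sel with a row index and a column index picks out a single element.
(println (sel sum-ab 0 0) (sel mprod 0 0) (sel a-t 0 1))
```

Note the difference between mult, which multiplies matching elements, and mmult, which performs the linear algebra matrix product.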

See also…

  • The Selecting columns with $ recipe in this chapter has more information on how to select specific columns from a dataset