Loading Clojure data structures into datasets

While they are good for learning, Incanter's built-in datasets probably won't be that useful for your work (unless you work with irises). Other recipes cover ways to get data from CSV files and other sources into Incanter (see Chapter 1, Importing Data for Analysis). Incanter also accepts native Clojure data structures in a number of formats. We'll take a look at a couple of these in this recipe.

Getting ready

We'll just need Incanter listed in our project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll also need to include this in our script or REPL:

(use 'incanter.core)

How to do it…

The primary function used to convert data into a dataset is to-dataset. While it can convert single, scalar values into a dataset, we'll start with slightly more complicated inputs.

  1. Generally, you'll be working with at least a matrix. If you pass this to to-dataset, what do you get?
    user=> (def matrix-set (to-dataset [[1 2 3] [4 5 6]]))
    #'user/matrix-set
    user=> (nrow matrix-set)
    2
    user=> (col-names matrix-set)
    [:col-0 :col-1 :col-2]
  2. All the data is here, but the generated column names (:col-0, :col-1, and so on) aren't very descriptive. Does to-dataset handle maps?
    user=> (def map-set (to-dataset {:a 1, :b 2, :c 3}))
    #'user/map-set
    user=> (nrow map-set)
    1
    user=> (col-names map-set)
    [:a :c :b]
  3. So, map keys become the column labels. That's much more intuitive. Let's throw a sequence of maps at it:
    user=> (def maps-set (to-dataset [{:a 1, :b 2, :c 3},
                                      {:a 4, :b 5, :c 6}]))
    #'user/maps-set
    user=> (nrow maps-set)
    2
    user=> (col-names maps-set)
    [:a :c :b]
  4. This is much more useful. We can also create a dataset by passing the column vector and the row matrix separately to dataset:
    user=> (def matrix-set-2
             (dataset [:a :b :c]
                      [[1 2 3] [4 5 6]]))
    #'user/matrix-set-2
    user=> (nrow matrix-set-2)
    2
    user=> (col-names matrix-set-2)
    [:a :b :c]

How it works…

The to-dataset function looks at the input and tries to process it intelligently. If given a sequence of maps, the column names are taken from the keys of the first map in the sequence.
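To see the effect of taking column names from the first map, here is a minimal sketch that passes maps with different keys. The name ragged and the sample values are made up for illustration. Because the order of map keys isn't guaranteed, the column names are compared as a set.

```clojure
(use 'incanter.core)

;; Hypothetical input: the second map has an extra key, :c.
(def ragged (to-dataset [{:a 1, :b 2}
                         {:a 3, :b 4, :c 5}]))

;; Both maps become rows.
(println (nrow ragged))

;; Only the keys of the first map should show up as column names.
(println (set (col-names ragged)))
```

If every key needs to become a column, it is probably safest to make sure that the first map in the sequence contains all of them.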

Ultimately, to-dataset uses the dataset constructor to create the dataset. The dataset function requires the data to be passed in as a vector of column names and a matrix of rows. When the data is already in this format, or when we need the most control (to rename the columns, for instance), we can call dataset directly.
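As a rough comparison of the two functions, the following sketch builds the same rows both ways. The names rows, auto-named, and hand-named, and the column names :x, :y, and :z, are hypothetical and chosen only for this example.

```clojure
(use 'incanter.core)

;; Hypothetical sample data: each inner vector is one row.
(def rows [[1 2 3] [4 5 6]])

;; to-dataset generates the column names for us.
(def auto-named (to-dataset rows))

;; dataset lets us supply the column names ourselves.
(def hand-named (dataset [:x :y :z] rows))

(println (col-names auto-named))
(println (col-names hand-named))
(println (nrow hand-named) (ncol hand-named))
```

Both datasets hold the same values; only the column names differ.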

See also…

Several recipes in Chapter 1, Importing Data for Analysis, look at how to load data from different external sources into Incanter datasets.
