While they are good for learning, Incanter's built-in datasets probably won't be that useful for your work (unless you work with irises). Other recipes cover ways to get data from CSV files and other sources into Incanter (see Chapter 1, Importing Data for Analysis). Incanter also accepts native Clojure data structures in a number of formats. We'll take a look at a couple of these in this recipe.
We'll just need Incanter listed in our project.clj file:
(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We'll also need to include this in our script or REPL:
(use 'incanter.core)
The primary function used to convert data into a dataset is to-dataset. While it can convert single, scalar values into a dataset, we'll start with slightly more complicated inputs.
If we pass a sequence of vectors to to-dataset, what do we get?
user=> (def matrix-set (to-dataset [[1 2 3] [4 5 6]]))
#'user/matrix-set
user=> (nrow matrix-set)
2
user=> (col-names matrix-set)
[:col-0 :col-1 :col-2]
How does to-dataset handle maps?
user=> (def map-set (to-dataset {:a 1, :b 2, :c 3}))
#'user/map-set
user=> (nrow map-set)
1
user=> (col-names map-set)
[:a :c :b]
A sequence of maps produces one row per map:
user=> (def maps-set (to-dataset [{:a 1, :b 2, :c 3}, {:a 4, :b 5, :c 6}]))
#'user/maps-set
user=> (nrow maps-set)
2
user=> (col-names maps-set)
[:a :c :b]
We can also pass the column names and the rows separately to dataset:
user=> (def matrix-set-2 (dataset [:a :b :c] [[1 2 3] [4 5 6]]))
#'user/matrix-set-2
user=> (nrow matrix-set-2)
2
user=> (col-names matrix-set-2)
[:c :b :a]
The to-dataset function looks at the input and tries to process it intelligently. If given a sequence of maps, the column names are taken from the keys of the first map in the sequence.
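As a rough sketch of this behavior, the following passes the three input shapes from this recipe to to-dataset and checks what comes back. It assumes Incanter 1.5.5, as listed in the project.clj for this recipe; the names from-vectors, from-map, and from-maps are hypothetical and exist only for this illustration.

```clojure
(use 'incanter.core)

;; Hypothetical names, chosen only for this illustration.
(def from-vectors (to-dataset [[1 2 3] [4 5 6]]))
(def from-map     (to-dataset {:a 1, :b 2, :c 3}))
(def from-maps    (to-dataset [{:a 1, :b 2, :c 3} {:a 4, :b 5, :c 6}]))

;; Whatever the input shape, the result should be an Incanter dataset.
(every? dataset? [from-vectors from-map from-maps])

;; Row counts follow the input: two vectors, a single map, two maps.
(map nrow [from-vectors from-map from-maps])

;; Compare column names as a set, because the order of keys taken
;; from a map should not be relied upon.
(set (col-names from-maps))
```

Comparing the column names as a set sidesteps the question of key order, which is why the recipe's own output lists the keys in an order different from the one in which they were written.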
Ultimately, it uses the dataset constructor to create the dataset. The dataset function takes the data as a vector of column names and a matrix of rows. When the data is already in this format, or when we need the most control (to rename the columns, for instance), we can call dataset directly.
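To illustrate choosing column names explicitly, here is a minimal sketch that builds a dataset with dataset and then reads one column back by the name we supplied. The column names :x, :y, and :z and the var renamed are assumptions made up for this example; $ is Incanter's column-selection function from incanter.core.

```clojure
(use 'incanter.core)

;; Hypothetical column names and data, for illustration only.
(def renamed (dataset [:x :y :z] [[1 2 3] [4 5 6]]))

;; One row per inner vector.
(nrow renamed)

;; The columns carry the names we supplied instead of :col-0, :col-1, ...
(set (col-names renamed))

;; Select the values of a single column by the name we chose.
($ :x renamed)
```

Naming the columns up front keeps later selections readable, since the generated :col-N names say nothing about what each column holds.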
Several recipes in Chapter 1, Importing Data for Analysis, look at how to load data from different external sources into Incanter datasets.