Grouping data with $group-by

Datasets often come with an inherent structure. Two or more rows might have the same value in one column, and we might want to leverage that by grouping those rows together in our analysis.

Getting ready

First, we'll need to declare a dependency on Incanter in the project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])

Next, we'll require the incanter.core and incanter.io namespaces in our script or REPL:

(require '[incanter.core :as i]
         '[incanter.io :as i-io])

For data, we'll use the census race data for all the states. You can download it from http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv.

The following lines load the data from data/all_160.P3.csv and bind the resulting dataset to the name race-data:

(def data-file "data/all_160.P3.csv")
(def race-data (i-io/read-dataset data-file :header true))

How to do it…

Incanter's $group-by function groups rows, either for further analysis or so that each group can be summarized. Pass $group-by the column or function to group on, followed by the dataset:

(def by-state (i/$group-by :STATE race-data))
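To make the shape of the result concrete, here is a rough plain-Clojure analogue of that call. It is a sketch under stated assumptions, not how Incanter implements $group-by: the rows are ordinary maps rather than an Incanter dataset, each group is a vector of row maps rather than a dataset, and group-rows-by, sample-rows and by-state-sketch are hypothetical names. The first three rows are copied from the Virginia table later in this recipe; the fourth row is made up.

```clojure
;; Hypothetical sample data: three rows from the Virginia table in this
;; recipe plus one made-up row for another state.
(def sample-rows
  [{:GEOID 5100148 :STATE 51 :NAME "Abingdon town" :POP100 8191}
   {:GEOID 5100180 :STATE 51 :NAME "Accomac town"  :POP100 519}
   {:GEOID 5100724 :STATE 51 :NAME "Alberta town"  :POP100 298}
   {:GEOID 2900001 :STATE 29 :NAME "Example city"  :POP100 1000}]) ; made up

(defn group-rows-by
  "Groups row maps by one column, keying each group by a map such as
  {:STATE 51}. Each group is a plain vector of rows, not a dataset."
  [col rows]
  (group-by #(select-keys % [col]) rows))

(def by-state-sketch (group-rows-by :STATE sample-rows))

;; Look up a group with the same kind of map key $group-by uses.
(count (by-state-sketch {:STATE 51}))
;; => 3
```

Keying each group by a map rather than a bare value is what lets the same lookup style work when grouping on more than one column.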

How it works…

$group-by returns a map. Each key is itself a map from the grouping column (or columns) to the value shared by every row in that group, and each value is a dataset holding that group's rows. For example, these are the first five keys:

user=> (take 5 (keys by-state))
({:STATE 29} {:STATE 28} {:STATE 31} {:STATE 30} {:STATE 25})
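Because the keys are maps, grouping on several columns at once produces keys with one entry per column. The sketch below illustrates that idea in plain Clojure. It assumes rows are maps, and the :COUNTY column, the values, and the names group-rows-by-cols and by-state-county are all hypothetical; they are not taken from the census file. With Incanter, the corresponding call would pass a vector of column names to $group-by.

```clojure
;; Hypothetical rows; :COUNTY and all values are made up for illustration.
(def rows
  [{:STATE 51 :COUNTY 1 :POP100 100}
   {:STATE 51 :COUNTY 1 :POP100 200}
   {:STATE 51 :COUNTY 3 :POP100 50}
   {:STATE 29 :COUNTY 1 :POP100 75}])

(defn group-rows-by-cols
  "Groups row maps on several columns, keying each group by a map that
  holds every grouping column, e.g. {:STATE 51 :COUNTY 1}."
  [cols rows]
  (group-by #(select-keys % cols) rows))

(def by-state-county (group-rows-by-cols [:STATE :COUNTY] rows))

;; Two rows share STATE 51 and COUNTY 1, so they land in the same group.
(count (by-state-county {:STATE 51 :COUNTY 1}))
;; => 2
```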

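Virginia's state FIPS code is 51, so we can get its rows back out by looking up the key {:STATE 51} in the group map. Here, $ selects the first three rows and four of the columns from that group's dataset: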

user=> (i/$ (range 3) [:GEOID :STATE :NAME :POP100]
            (by-state {:STATE 51}))

|  :GEOID | :STATE |         :NAME | :POP100 |
|---------+--------+---------------+---------|
| 5100148 |     51 | Abingdon town |    8191 |
| 5100180 |     51 |  Accomac town |     519 |
| 5100724 |     51 |  Alberta town |     298 |
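Once the rows are grouped, a common next step is to reduce each group to a single summary value. The following plain-Clojure sketch assumes each group is a vector of row maps, as in clojure.core/group-by, rather than an Incanter dataset; summarize-groups, groups and total-pop are hypothetical names. The sample group holds only the three Virginia rows shown above, so the total is for those three places, not for the whole state. Incanter's own $rollup function is aimed at this kind of per-group summary; check its documentation for the exact arguments.

```clojure
;; One group keyed like a $group-by key, holding the three rows shown
;; in the table above (a subset of Virginia's places).
(def groups
  {{:STATE 51} [{:NAME "Abingdon town" :POP100 8191}
                {:NAME "Accomac town"  :POP100 519}
                {:NAME "Alberta town"  :POP100 298}]})

(defn summarize-groups
  "Applies f to the values of column col in each group, keeping the
  group keys, so the result maps each group key to one summary value."
  [f col groups]
  (into {} (for [[k rows] groups]
             [k (f (map col rows))])))

;; Total population of the rows in each group.
(def total-pop (summarize-groups #(reduce + %) :POP100 groups))
;; total-pop => {{:STATE 51} 9008}
```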
