Working with changes in values

Sometimes, we are more interested in how values change over time, or across some other progression, than we are in the values themselves. This information is latent in the data, but making it explicit makes it easier to work with and visualize.

Getting ready

First, we'll use these dependencies in our project.clj:

(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We also need to require Incanter in our script or REPL:

(require '[incanter.core :as i]
         'incanter.io)

Finally, we'll use the Virginia census data. You can download the file from http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P3.csv:

(def data-file "data/all_160_in_51.P3.csv")

How to do it…

For this recipe, we'll take some census data and add a column to show the change in population between the 2000 and 2010 censuses:

  1. To begin, we'll need to read in the data:
    (def data
      (incanter.io/read-dataset data-file
                                :header true))
  2. If we look at the values in the field for the 2000 census population, some of them are empty. This will cause errors, so we'll replace those with zeros. Here's the function to do that:
    (defn check-int [x] (if (integer? x) x 0))
  3. Now we can get the difference in population between the two censuses:
    (def growth-rates
      (->> data
        (i/$map check-int :POP100.2000)
        (i/minus (i/sel data :cols :POP100))
        (i/dataset [:POP.DELTA])
        (i/conj-cols data)))
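  4. As we might expect, some places have grown and some have shrunk. Note that a place with no 2000 figure, such as Allisonia CDP, shows its entire 2010 population as growth, because the missing value was replaced with zero: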
    user=> (i/sel growth-rates
           :cols [:NAME :POP100 :POP100.2000 :POP.DELTA]
           :rows (range 5))
    |           :NAME | :POP100 | :POP100.2000 | :POP.DELTA |
    |-----------------+---------+--------------+------------|
    |   Abingdon town |    8191 |         7780 |      411.0 |
    |    Accomac town |     519 |          547 |      -28.0 |
    |    Alberta town |     298 |          306 |       -8.0 |
    | Alexandria city |  139966 |       128283 |    11683.0 |
    |   Allisonia CDP |     117 |              |      117.0 |
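
To make the effect of step 2 concrete, here is a minimal sketch of how check-int treats a few inputs. The sample values are illustrative, and using an empty string or nil to stand in for a blank CSV cell is an assumption, not something taken from the census file:

```clojure
;; check-int as defined in step 2: integers pass through,
;; anything else is replaced with 0.
(defn check-int [x] (if (integer? x) x 0))

(check-int 7780)  ; => 7780
(check-int "")    ; => 0 (assumed stand-in for an empty cell)
(check-int nil)   ; => 0
(check-int 12.5)  ; => 0, since non-integer numbers are zeroed as well
```

Because every non-integer becomes zero, this function suits a column that should contain only whole-number counts, as the population column does here.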

How it works…

This was a fairly straightforward process, but let's walk through it line by line to make sure everything is clear. We'll follow the steps of the ->> macro, which passes the result of each form as the last argument of the next.
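
As a small illustration of that threading order, the sketch below uses plain numbers rather than Incanter columns. The names threaded, nested, and difference are hypothetical and exist only for this example:

```clojure
;; Threaded form: each result is inserted as the LAST argument
;; of the following form.
(def threaded
  (->> [1 2 3]
       (map inc)
       (reduce +)))

;; The same computation written as nested calls, read inside-out.
(def nested
  (reduce + (map inc [1 2 3])))

;; Argument order matters for subtraction: this is (- 10 3), not (- 3 10).
(def difference (->> 3 (- 10)))
```

This is why the call to i/minus in the recipe yields the 2010 value minus the 2000 value: the threaded 2000 column arrives as the last argument.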

  1. We'll map the check-int function we defined earlier over the values in the 2000 census population column to replace empty values with zeros:
      (->> data
        (i/$map check-int :POP100.2000)
  2. We'll select the 2010 census population and subtract the 2000 values from it:
        (i/minus (i/sel data :cols :POP100))
  3. We'll take the differences and create a new dataset with one column named :POP.DELTA:
        (i/dataset [:POP.DELTA])
  4. Lastly, we'll merge it back into the original dataset with incanter.core/conj-cols. This function takes two datasets with the same number of rows, and it returns a new dataset with the columns from both of the input datasets:
        (i/conj-cols data))
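
The four steps above can also be sketched without Incanter, using a vector of maps in place of the dataset. This is only an analogy under stated assumptions: the two rows are a hand-copied sample, nil stands in for the missing 2000 value, and the names rows and with-delta are hypothetical. It does not reproduce Incanter's own data structures:

```clojure
(defn check-int [x] (if (integer? x) x 0))

;; Hypothetical sample rows; nil marks the missing 2000 population.
(def rows
  [{:NAME "Abingdon town" :POP100 8191 :POP100.2000 7780}
   {:NAME "Allisonia CDP" :POP100 117  :POP100.2000 nil}])

(def with-delta
  (->> rows
       (map (comp check-int :POP100.2000)) ; step 1: clean the 2000 values
       (map - (map :POP100 rows))          ; step 2: 2010 minus 2000
       (map (fn [d] {:POP.DELTA d}))       ; step 3: a one-column "dataset"
       (map merge rows)))                  ; step 4: join the columns row by row

(map :POP.DELTA with-delta) ; => (411 117)
```

As in the recipe, the final step relies on both collections having the same number of rows in the same order.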