Reading CSV data into Incanter datasets

One of the simplest data formats is comma-separated values (CSV), and you'll find that it's everywhere. Excel reads and writes CSV directly, as do most databases. Also, because it's really just plain text, it's easy to generate CSV files or to access them from any programming language.

Getting ready

First, let's make sure that we have the correct libraries loaded. Here's how the Leiningen (https://github.com/technomancy/leiningen) project.clj file should look (although you might be able to use more up-to-date versions of the dependencies):

(defproject getting-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

Also, in your REPL or your file, include these lines:

(use 'incanter.core
     'incanter.io)

Finally, I downloaded a list of rest area locations from POI Factory at http://www.poi-factory.com/node/6643. The data is in a file named data/RestAreasCombined(Ver.BN).csv. The version designation in your copy might be different, though, because the file is updated. You'll also need to register on the site in order to download the data. The file contains the location and description of the rest stops along the highway:

-67.834062,46.141129,"REST AREA-FOLLOW SIGNS SB I-95 MM305","RR, PT, Pets, HF"
-67.845906,46.138084,"REST AREA-FOLLOW SIGNS NB I-95 MM305","RR, PT, Pets, HF"
-68.498471,45.659781,"TURNOUT NB I-95 MM249","Scenic Vista-NO FACILITIES"
-68.534061,45.598464,"REST AREA SB I-95 MM240","RR, PT, Pets, HF"

In the project directory, we have to create a subdirectory named data and place the file in this subdirectory.

I also created a copy of this file with a row listing the names of the columns and named it RestAreasCombined(Ver.BN)-headers.csv.

How to do it…

  1. Now, use the incanter.io/read-dataset function in your REPL:
    user=> (read-dataset "data/RestAreasCombined(Ver.BN).csv")
    
    |      :col0 |     :col1 |                                :col2 |                      :col3 |
    |------------+-----------+--------------------------------------+----------------------------|
    | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
    | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
    | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
    | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
    | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
    …
  2. If we have a header row in the CSV file, then we include :header true in the call to read-dataset:
    user=> (read-dataset "data/RestAreasCombined(Ver.BN)-headers.csv" :header true)
    
    | :longitude | :latitude |                                :name |                     :codes |
    |------------+-----------+--------------------------------------+----------------------------|
    | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
    | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
    | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
    | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
    | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
    …
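
If adding a header row to the file isn't practical, one alternative is to name the columns after loading. The following is a minimal sketch, assuming the two-argument form of incanter.core/col-names behaves as it does in Incanter 1.5.5. The temporary file and the names sample-file, raw, and named are made up for illustration, and the two rows are copied from the sample data shown earlier:

```clojure
(use 'incanter.core
     'incanter.io)

;; Hypothetical stand-in for the downloaded file: two rows from the
;; sample data, written to a temporary file so the snippet runs on its own.
(def sample-file (java.io.File/createTempFile "rest-areas" ".csv"))
(spit sample-file
      (str "-67.834062,46.141129,"
           "\"REST AREA-FOLLOW SIGNS SB I-95 MM305\",\"RR, PT, Pets, HF\"\n"
           "-68.498471,45.659781,"
           "\"TURNOUT NB I-95 MM249\",\"Scenic Vista-NO FACILITIES\"\n"))

;; Without :header true, the columns get the default names :col0 to :col3.
(def raw (read-dataset (.getPath sample-file)))

;; With a second argument, col-names returns a new dataset that uses
;; the supplied names; raw itself is left unchanged.
(def named (col-names raw [:longitude :latitude :name :codes]))

;; Inspect the column names of both datasets.
(col-names raw)
(col-names named)
```

This keeps the downloaded file untouched, at the cost of repeating the column names in code.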

How it works…

Together, Clojure and Incanter make a lot of common tasks easy, as the How to do it… section of this recipe shows.

We've taken some external data, in this case from a CSV file, and loaded it into an Incanter dataset. In Incanter, a dataset is a table, similar to a sheet in a spreadsheet or a database table. Each column holds one field of data, and each row holds one observation. Some columns will contain string data (the last two columns in this example do), some will contain dates, and some will contain numeric data (the longitude and latitude here). Incanter tries to automatically detect when a column contains numeric data and converts it to a Java int or double. Incanter takes away a lot of the effort involved with importing data.
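
To see the shape of a loaded dataset and the automatic type detection for yourself, you can inspect it at the REPL. This is a rough sketch rather than part of the recipe: the temporary file and the names sample-file and rest-areas are hypothetical, and the exact numeric classes you see may depend on the Incanter version:

```clojure
(use 'incanter.core
     'incanter.io)

;; Hypothetical sample: a header row plus two rows from the data above,
;; written to a temporary file so the snippet is self-contained.
(def sample-file (java.io.File/createTempFile "rest-areas-headers" ".csv"))
(spit sample-file
      (str "longitude,latitude,name,codes\n"
           "-67.834062,46.141129,"
           "\"REST AREA-FOLLOW SIGNS SB I-95 MM305\",\"RR, PT, Pets, HF\"\n"
           "-68.498471,45.659781,"
           "\"TURNOUT NB I-95 MM249\",\"Scenic Vista-NO FACILITIES\"\n"))

(def rest-areas (read-dataset (.getPath sample-file) :header true))

;; Column names and [rows columns] of the table.
(col-names rest-areas)
(dim rest-areas)

;; $ pulls out one column; class shows what the values were parsed into.
(class (first ($ :longitude rest-areas)))  ; a numeric class
(class (first ($ :name rest-areas)))       ; a string
```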

There's more…

For more information about Incanter datasets, see Chapter 6, Working with Incanter Datasets.
