Chapter 1. Importing Data for Analysis

In this chapter, we will cover the following recipes:

  • Creating a new project
  • Reading CSV data into Incanter datasets
  • Reading JSON data into Incanter datasets
  • Reading data from Excel with Incanter
  • Reading data from JDBC databases
  • Reading XML data into Incanter datasets
  • Scraping data from tables in web pages
  • Scraping textual data from web pages
  • Reading RDF data
  • Querying RDF data with SPARQL
  • Aggregating data from different formats

Introduction

There's not much data analysis that can be done without data, so the first step in any project is to evaluate the data we have and the data that we need. Once we have some idea of what we'll need, we have to figure out how to get it.

Many of the recipes in this chapter and in this book use Incanter (http://incanter.org/) to import the data and target Incanter datasets. Incanter is a Clojure library for statistical analysis and graphics, similar to R (http://www.r-project.org/), an open source language for statistical computing. Incanter might not be suitable for every task (for example, we'll use the Weka library for machine learning later), but it is still an important part of our toolkit for doing data analysis in Clojure. This chapter has a collection of recipes that can be used to gather data and make it accessible to Clojure.

In the very first recipe, we'll take a look at how to start a new project. From there, we'll begin with very simple formats such as comma-separated values (CSV) and move on to reading data from relational databases using JDBC. Finally, we'll examine more complicated data sources, such as scraped web pages and linked data (RDF).
