Preface

Welcome to the second edition of Clojure Data Analysis Cookbook! It seems that books become obsolete almost as quickly as software does, so here we have the opportunity to keep things up-to-date and useful.

Moreover, the state of the art of data analysis is still evolving. Techniques and technologies are being refined and improved. Hopefully, this book will capture some of that. I've also added a new chapter on how to work with unstructured textual data.

In spite of these changes, some things have stayed the same. Clojure has further proven itself to be an excellent environment to work with data. As a member of the Lisp family of languages, it inherits a flexibility and power that is hard to match. Its concurrency and parallelization features have likewise proven to be great tools for developing software and analyzing data.

Clojure's usefulness for data analysis is further improved by a number of strong libraries. Incanter provides a practical environment to work with data and perform statistical analysis. Cascalog is an easy-to-use wrapper over Hadoop and Cascading. Finally, when you're ready to publish your results, ClojureScript, an implementation of Clojure that generates JavaScript, can help you to visualize your data in an effective and persuasive way.

Moreover, Clojure runs on the Java Virtual Machine (JVM), so any libraries written for Java are available too. This gives Clojure an incredible amount of breadth and power.

I hope that this book will give you the tools and techniques you need to get answers from your data.

What this book covers

Chapter 1, Importing Data for Analysis, covers how to read data from a variety of sources, including CSV files, web pages, and linked semantic web data.

Chapter 2, Cleaning and Validating Data, presents strategies and implementations to normalize dates, fix spelling, and work with large datasets. Getting data into a usable shape is an important, but often overlooked, stage of data analysis.

Chapter 3, Managing Complexity with Concurrent Programming, covers Clojure's concurrency features and how you can use them to simplify your programs.

Chapter 4, Improving Performance with Parallel Programming, covers how to use Clojure's parallel processing capabilities to speed up the processing of data.

Chapter 5, Distributed Data Processing with Cascalog, covers how to use Cascalog as a wrapper over Hadoop and the Cascading library to process large amounts of data distributed over multiple computers.

Chapter 6, Working with Incanter Datasets, covers the basics of working with Incanter datasets. Datasets are the core data structures used by Incanter, and understanding them is necessary in order to use Incanter effectively.

Chapter 7, Statistical Data Analysis with Incanter, covers a variety of statistical processes and tests used in data analysis. Some of these are quite simple, such as generating summary statistics. Others are more complex, such as performing linear regressions and auditing data with Benford's Law.

Chapter 8, Working with Mathematica and R, explains how to set up Clojure to communicate with Mathematica or R. These are powerful data analysis systems, and we might want to use them sometimes. This chapter will show you how to get these systems to work together, as well as some tasks that you can perform once they are communicating.

Chapter 9, Clustering, Classifying, and Working with Weka, covers more advanced machine learning techniques. In this chapter, we'll primarily use the Weka machine learning library. Some recipes will discuss how to use it and the data structures it's built on, while other recipes will demonstrate machine learning algorithms.

Chapter 10, Working with Unstructured and Textual Data, looks at tools and techniques used to extract information from the reams of unstructured, textual data.

Chapter 11, Graphing in Incanter, shows you how to generate graphs and other visualizations in Incanter. These can be important for exploring and learning about your data and also for publishing and presenting your results.

Chapter 12, Creating Charts for the Web, shows you how to set up a simple web application in order to present findings from data analysis. It will include a number of recipes that leverage the powerful D3 visualization library.
