Chapter 9. Clustering, Classifying, and Working with Weka

In this chapter, we will cover the following recipes:

  • Loading CSV and ARFF files into Weka
  • Filtering, renaming, and deleting columns in Weka datasets
  • Discovering groups of data using K-Means clustering
  • Finding hierarchical clusters in Weka
  • Clustering with SOMs in Incanter
  • Classifying data with decision trees
  • Classifying data with the Naive Bayesian classifier
  • Classifying data with support vector machines
  • Finding associations in data with the Apriori algorithm

Introduction

Looking for patterns in our dataset is a large part of data analysis. Of course, a dataset of any complexity is too much for the human mind to see patterns in, so we rely on computers, statistics, and machine learning to augment our insights.

In this chapter, we'll take a look at a number of methods used to cluster and classify data. Depending on the nature of the data and the question(s) we're trying to answer, different algorithms will be more or less useful. For instance, while K-Means clustering is great for clustering numeric datasets, it's poorly suited to nominal data, because it relies on distances and means, which aren't meaningful for unordered categories.

Most of the recipes in this chapter will use the Weka machine learning and data mining library (http://www.cs.waikato.ac.nz/ml/weka/). This is a full-featured library for analyzing data using many different procedures and algorithms. It includes a more complete set of these algorithms than Incanter, which we've been using a lot so far. We'll start by seeing how to load CSV files into Weka and work with Weka datasets. For most of the chapter, however, we'll examine how to use this powerful library to perform different analyses. Weka's interface to the classes implementing these algorithms is very consistent. In Discovering groups of data using K-Means clustering, the first recipe in which we use one of these algorithms, we'll define a macro that will facilitate creating wrapper functions for Weka algorithms. This is a great example of using macros, and of how easy it is to create a wrapper over an external Java library to make it more natural to use from Clojure.
