Preface

Data mining and machine learning are topics in artificial intelligence that focus on pattern discovery, prediction, and forecasting based on properties of collected data, while Weka is a toolbox implementing a variety of methods dedicated to those tasks. This book is about programming Weka in Java through practical examples.

Instant Weka How-To shows you exactly how to include Weka's machinery in your Java application. This book starts by importing and preparing the data, and then moves to more serious topics, such as classification, regression, clustering, and evaluation. For those of you who are eager to dive deeper, this book shows you how to implement online learning or to create your own classifier. This book includes several application examples, such as house price prediction, stock value forecasting, decision making for direct marketing, and a movie recommendation system.

Data mining is a hot topic in the industry, and Weka is an essential toolbox for Java. This book shows you how to stay ahead of the pack by implementing core data mining techniques, such as regression and classification, and then moving on to more advanced applications of forecasting, decision making, and recommendation.

What this book covers

Starting with Java and Weka (Simple) guides you through the process of preparing the environment to start with Java and Weka coding. It explains how to test your Weka installation and shows you how to prepare the Eclipse environment.

Loading the data (Simple) explains how to load a dataset in Weka's Attribute-Relation File Format (ARFF), which is typically used to store training and testing data. In addition, it demonstrates how to create a dataset on the fly and save the data to a file.

Filtering attributes (Simple) demonstrates how to remove attributes after the dataset is loaded into Weka, using supervised filters, which can take the class attribute into account, and unsupervised filters, which disregard it.

Selecting attributes (Intermediate) explains how to find attributes that are relevant for the classification task, and how to select an evaluator as well as a search method that applies the selected evaluator to the attributes.

Training a classifier (Simple) addresses the most exciting task in data mining. It demonstrates how to train various classifiers, as well as how to build an incremental classifier, which does not need to be retrained from scratch after a new instance is added.

Building your own classifier (Advanced) covers the most essential steps required to design a functional classifier.

Tree visualization (Intermediate) demonstrates how to visualize a J48 decision tree, which can be very helpful for understanding the underlying patterns in the tree.

Testing and evaluating your models (Simple) explains how to estimate classifier performance, that is, how accurate the model is when making its classifications. This recipe shows you how to assess the performance using a variety of measures and different evaluation techniques, such as separate training and test datasets and k-fold cross-validation.
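To make the idea of k-fold cross-validation concrete, here is a minimal plain-Java sketch of how instance indices can be dealt into k folds, with each fold held out once for testing. The class KFoldSketch and its split method are hypothetical names used for illustration only; this is not Weka's API or its internal implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Weka): deals instance indices into k folds. */
final class KFoldSketch {

    /** Shuffles the indices 0..n-1 with a fixed seed and deals them round-robin into k folds. */
    static List<List<Integer>> split(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = split(10, 5, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is held out for testing; the remaining folds form the training set.
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < folds.size(); g++) {
                if (g != f) {
                    train.addAll(folds.get(g));
                }
            }
            System.out.println("fold " + f + ": test=" + folds.get(f)
                    + ", training size=" + train.size());
        }
    }
}
```

Every instance is used for testing exactly once and for training k-1 times, which is why the averaged result is usually a steadier estimate than a single train/test split.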

Regression models (Simple) explains how to use models that predict the value of a numeric class, in contrast to classification, which predicts the value of a nominal class. Given a set of attributes, regression builds a model, usually an equation, that is used to compute the predicted class value.
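As a rough illustration of a model that is "an equation", the following plain-Java sketch fits a least-squares line y = intercept + slope * x to a handful of points and uses it for prediction. The class name LinearRegressionSketch, its methods, and the sample values are all assumptions made for this example; it does not use Weka and covers only a single attribute.

```java
/** Hypothetical one-attribute least-squares fit (not Weka's LinearRegression). */
final class LinearRegressionSketch {

    /** Returns {intercept, slope} of the least-squares line through the points (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Applies the fitted equation to a new attribute value. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // Made-up sample data, chosen only to illustrate the fit.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] model = fit(x, y);
        System.out.println("intercept=" + model[0] + ", slope=" + model[1]);
        System.out.println("prediction for x=5: " + predict(model, 5));
    }
}
```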

Association rules (Intermediate) explains how to find frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and so on. Association rules are often used for market basket analysis, in which big supermarket chains determine which products are often bought together; these products are then placed close to each other in the store to increase the chance of people picking them up on impulse.
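Two measures commonly used to judge a rule such as "bread implies butter" are support and confidence. The plain-Java sketch below computes both over a tiny, made-up set of shopping baskets; the class AssociationRuleSketch, its methods, and the sample transactions are hypothetical and are not taken from Weka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical support/confidence calculator (not Weka's association rule learners). */
final class AssociationRuleSketch {

    /** Fraction of transactions that contain every item of the given itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        if (transactions.isEmpty()) {
            throw new IllegalArgumentException("no transactions");
        }
        int count = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        return (double) count / transactions.size();
    }

    /** Confidence of the rule antecedent -> consequent: support(both) / support(antecedent). */
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        double antecedentSupport = support(transactions, antecedent);
        if (antecedentSupport == 0.0) {
            throw new IllegalArgumentException("antecedent never occurs");
        }
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(transactions, both) / antecedentSupport;
    }

    /** Builds an item set from the given item names. */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Made-up baskets used only for illustration. */
    static List<Set<String>> sampleTransactions() {
        List<Set<String>> transactions = new ArrayList<>();
        transactions.add(items("bread", "butter"));
        transactions.add(items("bread", "butter", "milk"));
        transactions.add(items("bread"));
        transactions.add(items("milk"));
        return transactions;
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = sampleTransactions();
        System.out.println("support(bread, butter) = "
                + support(transactions, items("bread", "butter")));
        System.out.println("confidence(bread -> butter) = "
                + confidence(transactions, items("bread"), items("butter")));
    }
}
```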

Clustering (Simple) demonstrates how to use clustering, that is, how to automatically group examples in the dataset by a similarity measure.

Reusing models (Intermediate) shows how to build a model, serialize it into a byte stream and save it as a file for later use, and restore it into the original object.
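The save-and-restore round trip can be sketched with standard Java serialization alone. In the example below, ThresholdModel is a hypothetical stand-in for a trained model, and ModelPersistenceSketch with its toBytes and fromBytes helpers are names invented for this illustration; the recipe itself works with real Weka models.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical round trip: object -> byte stream -> file -> object. */
final class ModelPersistenceSketch {

    /** Stand-in for a trained model; any Serializable object is handled the same way. */
    static final class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;

        ThresholdModel(double threshold) {
            this.threshold = threshold;
        }

        boolean classify(double value) {
            return value >= threshold;
        }
    }

    /** Serializes the model into a byte array. */
    static byte[] toBytes(Serializable model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class not on the classpath", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Save the serialized model to a temporary file, then read it back.
        Path file = Files.createTempFile("model", ".bin");
        try {
            Files.write(file, toBytes(new ThresholdModel(0.5)));
            ThresholdModel restored = (ThresholdModel) fromBytes(Files.readAllBytes(file));
            System.out.println("restored threshold: " + restored.threshold);
            System.out.println("classify(0.7): " + restored.classify(0.7));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The restored object behaves like the original only if the same class definition is available when it is read back, which is worth keeping in mind when a saved model is moved between applications.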

Data mining in direct marketing (Simple) explains how to implement decision making to guide direct marketing in a company. It uses data from a real-world business problem that contains information on customers of an insurance company. The goal is to predict which of the customers in the train set will buy a caravan insurance policy.

Using Weka for stock value forecasting (Advanced) demonstrates how to forecast the next day's closing price using daily high, low, opening, and closing data for Apple Computer stocks.

Recommendation system (Advanced) demonstrates how to implement a recommendation system to suggest "customers who bought this item also bought…" It shows an approach based on collaborative filtering to recommend movies to a user.
