What this book covers

Chapter 1, Setting GNU R for Predictive Analytics, covers setting up R, installing and loading packages, and other basic operations. Only beginners should read this. If you are not a beginner, you will be bored! (Beginners should find the chapter entertaining.)

Chapter 2, Visualizing and Manipulating Data Using R, deals with basic visualization functions in R and data manipulation. This chapter also aims to bring beginners up to speed for the rest of the book.

Chapter 3, Data Visualization with Lattice, deals with more advanced visualization functions. The concept of multipanel conditioning plots is presented. These allow you to examine the relationship between attributes as a function of group membership (for example, women versus men). A good working knowledge of R programming is necessary from this point on.

Chapter 4, Cluster Analysis, presents the concept of clustering and the different types of clustering algorithms. It shows how to program and use a basic clustering algorithm (k-means) in R. Special attention is given to the description of distance measures and how to select the number of clusters for the analyses.

Chapter 5, Agglomerative Clustering Using hclust(), deals with hierarchical clustering. It shows how to use agglomerative clustering in R and the options to configure the analysis.

Chapter 6, Dimensionality Reduction with Principal Component Analysis, discusses the uses of PCA, notably dimension reduction. The chapter explores how to build a simple PCA algorithm, how to use PCA, and example applications.

Chapter 7, Exploring Association Rules with Apriori, focuses on the functioning of the apriori algorithm, how to perform the analyses, and how to interpret the outputs. Among other applications, association rules can be used to discover which products are frequently bought together (market basket analysis).

Chapter 8, Probability Distributions, Covariance, and Correlation, discusses basic statistics and how they can be useful for prediction. The concepts given in the title are discussed without too much technicality, but formulas are provided for the mathematically inclined.

Chapter 9, Linear Regression, builds upon the knowledge acquired in the previous chapter to show how to build a regression algorithm, including how to compute the coefficients and p-values. The assumptions of linear regression (ordinary least squares) are briefly discussed. The chapter then focuses on the use (and misuse) of regression.

Chapter 10, Classification with k-Nearest Neighbors and Naïve Bayes, deals with classification problems using two of the most popular algorithms. We build our own k-NN algorithm, with which we analyze the famous iris dataset. We also demonstrate how Naïve Bayes works, and we show how to use both algorithms in practice.

Chapter 11, Classification Trees, explores classification using no fewer than five classification tree algorithms: C4.5, C5.0, CART (classification part), random forests, and conditional inference trees. Entropy, information gain, pruning, bagging, and other important concepts are discussed.

Chapter 12, Multilevel Analyses, deals with the use of nested data. We will briefly discuss the functioning of multilevel regression (with mixed models), and will then focus on the important aspects in the analysis, notably, how to create and compare the models, and how to understand the outputs.

Chapter 13, Text Analytics with R, focuses on the use of some algorithms that we discussed in other chapters, as well as new ones, with the aim of analyzing text. We will start by showing you how to perform text preprocessing, then explain important concepts, and then jump right into the analysis. We will highlight the importance of testing different algorithms on the same corpus.

Chapter 14, Cross-validation and Bootstrapping Using Caret and Exporting Predictive Models Using PMML, deals with two important aspects: the first is ascertaining the validity of the models, and the second is exporting the models for production. Training and testing datasets are used in most chapters. These are minimal requirements; cross-validation and bootstrapping are significant improvements.

Appendix A, Exercises and Solutions, provides the exercises and the solutions for the chapters in the book.

Appendix B, Further Reading and References, provides the references for the chapters in the book.
