About This Book
Purpose

As we are completing this book, it is early 2013, the United States presidential election is over, and everyone now knows that Barack Obama won. We have all heard the buzz around topics like business intelligence, business analytics, predictive analytics/data mining/data science, and big data. We have heard reports like the McKinsey Global Institute’s 2011 Big Data report¹, which stated that by 2018 the United States alone will face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as a shortage of 1.5 million managers and analysts able to analyze big data. We are now learning that a major reason President Obama won was that his staff analyzed data on their customers, the electorate, which gave him a competitive advantage. So now you know: using and leveraging your data can not only provide you with a competitive advantage in your domain, it can also help you win an election.

The aim of this book is to expose you, an educated consumer of statistics, to a set of multivariate and other modeling techniques, so that you not only become skilled at applying these techniques but also develop the crucial skill of telling the statistical story behind the data. (The techniques include multiple regression, ANOVA, logistic regression, principal component analysis, cluster analysis, decision trees, and neural networks.) Telling that story is what the 1.5 million managers and analysts mentioned in the McKinsey Global Institute’s Big Data report will need to be able to do.

Many academic programs in business analytics, statistics, and data mining (whether in Arts and Sciences or in Business Schools) begin their curriculum with a basic introduction to statistics. That course is usually followed by a data mining/predictive analytics² course. The basic statistics course teaches students how to summarize data and perform basic statistical tests using data sets with perhaps one or two variables. In that course, a statistical study is viewed as linear: the student applies a technique in response to a problem statement, gives the answer, and is done. The subsequent data mining/predictive analytics course, by contrast, requires students to analyze data sets with several hundred variables.

Multivariate, real-world statistical studies are not linear; they are iterative. You try a technique, review and reflect on the results, determine the next step, and in the process develop the statistical story behind the data. Faced with such a large conceptual jump, most students are lost because they lack a fundamental understanding of statistical analysis. For example, most introductory statistics books do not address logistic regression, yet a predictive analytics or data mining course usually assumes prior knowledge of it. In this textbook, we provide the bridge that takes the reader from univariate/bivariate statistics to real-world multivariate statistical analysis.

Further, nearly all statistics books fail to address the critical skill of developing the statistical story. The problems at the end of each chapter exercise the technique just developed, so the student already knows what the problems are about and which technique to use: the one covered in that chapter. In this way, we, the authors and instructors, neglect the most critical skill of all, knowing when, and when not, to use a technique. Such technique problems are still essential to learning the mechanics of a given statistical technique, and we provide them at the end of each chapter. We also provide several small and large data sets at the end of the book, and we strongly suggest that instructors use these data sets with one or more chapters. And, yes, instructors, please also assign a data set after a chapter even when that chapter’s technique is not appropriate for it. As you repeatedly assign the data sets, a statistical story of the data develops. Additionally, we provide several large data sets that are appropriate for semester-long projects that use one or more of the book’s techniques.

Is This Book for You?

This text is written for students at the undergraduate or graduate level. Most textbooks on multivariate statistics and data mining/predictive analytics are written at a level that is either so mathematical or so non-technical that the reader remains unable to apply the techniques. We believe that our book sits right in between: it has enough mathematics for you to understand what is going on, how and when to apply each technique, and how to interpret the output. This book is not a software manual; we do not cover every option for every method. Rather, we introduce the reader to the basic concepts necessary to understand each method.

Our goal is to make the reader an educated consumer who can develop the statistical story of a multivariate data set, both by learning the techniques of multivariate analysis and data mining/predictive analytics and by practicing the skill of telling that story.

Prerequisites

The text assumes that the reader has taken a basic introductory statistics course. One chapter reviews the fundamental concepts that you should understand from an introductory statistics course.

Software Used to Develop the Book’s Content

The primary software application used in this book is JMP statistical software, in particular JMP 10 and JMP 10 Pro. The book covers new and enhanced features in JMP 10, including an add-in for Microsoft Excel, Graph Builder, and data mining/predictive analytics modeling capabilities. To provide a good foundation, some of the early examples use Microsoft Excel.

Scope of This Book

The book starts with a review of basic statistics and extends some of these concepts to multivariate techniques. Several multivariate techniques are discussed (principal components, cluster analysis, ANOVA, multiple regression, and logistic regression). In introducing each technique, we provide a basic statistical foundation so that the reader understands when to use the technique and how to evaluate and interpret the results. Step-by-step directions are also provided to guide you through an analysis using the technique. Finally, the last few chapters of the book introduce a few more-automated predictive modeling/data mining techniques (decision trees and neural networks) and related concepts.

Example Code and Data

You can access the example code and data for this book through its author pages at http://support.sas.com/publishing/authors. Select the name of the author, look for the cover thumbnail of this book, and select Example Code and Data to display the data included with this book.

For an alphabetical list of all books for which example code and data are available, see http://support.sas.com/bookcode. Select a title to display the book’s example data.

If you are unable to access the code through the website, send e-mail to [email protected].

Additional Resources

SAS offers you a rich variety of resources to help build your SAS skills and explore and apply the full power of SAS software. Whether you are in a professional or academic setting, we have learning products that can help you maximize your investment in SAS.

Bookstore

http://support.sas.com/bookstore/

Training

http://support.sas.com/training/

Certification

http://support.sas.com/certify/

SAS Global Academic Program

http://support.sas.com/learn/ap/

SAS OnDemand

http://support.sas.com/learn/ondemand/

Or

Knowledge Base

http://support.sas.com/resources/

Support

http://support.sas.com/techsup/

Training and Bookstore

http://support.sas.com/learn/

Community

http://support.sas.com/community/

Keep in Touch

We look forward to hearing from you. We invite questions, comments, and concerns. If you want to contact us about a specific book, please include the book title in your correspondence.

To Contact the Author through SAS Press

By e-mail: [email protected]

Via the Web: http://support.sas.com/author_feedback

SAS Books

For a complete list of books available through SAS, visit http://support.sas.com/bookstore.

Phone: 1-800-727-3228

Fax: 1-919-677-8166

E-mail: [email protected]

SAS Book Report

Receive up-to-date information about all new SAS publications via e-mail by subscribing to the SAS Book Report monthly eNewsletter. Visit http://support.sas.com/sbr.

(Endnotes)

1  Manyika, J., M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. 2011. Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute.

2  Throughout the text, we will use the terms data mining, predictive analytics, and predictive modeling interchangeably. In Chapters 1 and 11, we briefly discuss their differences.
