Book Description

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python presents an applied approach to data mining concepts and methods, using Python software for illustration.

Readers will learn how to implement a variety of popular data mining algorithms in Python (free and open-source software) to tackle business problems and opportunities.

This is the sixth version of this successful text, and the first using Python. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining and network analysis. It also includes:

  • A new co-author, Peter Gedeck, who brings both experience teaching business analytics courses using Python, and expertise in the application of machine learning methods to the drug-discovery process
  • A new section on ethical issues in data mining
  • Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma and executive courses, and from their students
  • More than a dozen case studies demonstrating applications for the data mining techniques described
  • End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented
  • A companion website with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics. This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.

“This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject.”

—Gareth M. James, University of Southern California and co-author (with Witten, Hastie and Tibshirani) of the best-selling book An Introduction to Statistical Learning, with Applications in R 

Table of Contents

  1. Cover
  2. Foreword by Gareth James
  3. Foreword by Ravi Bapna
  4. Preface to the Python Edition
  5. Acknowledgments
  6. Part I Preliminaries
    1. Chapter 1 Introduction
      1. 1.1 What Is Business Analytics?
      2. 1.2 What Is Data Mining?
      3. 1.3 Data Mining and Related Terms
      4. 1.4 Big Data
      5. 1.5 Data Science
      6. 1.6 Why Are There So Many Different Methods?
      7. 1.7 Terminology and Notation
      8. 1.8 Road Maps to This Book
    2. Chapter 2 Overview of the Data Mining Process
      1. 2.1 Introduction
      2. 2.2 Core Ideas in Data Mining
      3. 2.3 The Steps in Data Mining
      4. 2.4 Preliminary Steps
      5. 2.5 Predictive Power and Overfitting
      6. 2.6 Building a Predictive Model
      7. 2.7 Using Python for Data Mining on a Local Machine
      8. 2.8 Automating Data Mining Solutions
      9. 2.9 Ethical Practice in Data Mining
      10. Problems
      11. Notes
  7. Part II Data Exploration and Dimension Reduction
    1. Chapter 3 Data Visualization
      1. 3.1 Introduction
      2. 3.2 Data Examples
      3. 3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
      4. 3.4 Multidimensional Visualization
      5. 3.5 Specialized Visualizations
      6. 3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
      7. Problems
      8. Notes
    2. Chapter 4 Dimension Reduction
      1. 4.1 Introduction
      2. 4.2 Curse of Dimensionality
      3. 4.3 Practical Considerations
      4. 4.4 Data Summaries
      5. 4.5 Correlation Analysis
      6. 4.6 Reducing the Number of Categories in Categorical Variables
      7. 4.7 Converting a Categorical Variable to a Numerical Variable
      8. 4.8 Principal Components Analysis
      9. 4.9 Dimension Reduction Using Regression Models
      10. 4.10 Dimension Reduction Using Classification and Regression Trees
      11. Problems
      12. Notes
  8. Part III Performance Evaluation
    1. Chapter 5 Evaluating Predictive Performance
      1. 5.1 Introduction
      2. 5.2 Evaluating Predictive Performance
      3. 5.3 Judging Classifier Performance
      4. 5.4 Judging Ranking Performance
      5. 5.5 Oversampling
      6. Problems
      7. Notes
  9. Part IV Prediction and Classification Methods
    1. Chapter 6 Multiple Linear Regression
      1. 6.1 Introduction
      2. 6.2 Explanatory vs. Predictive Modeling
      3. 6.3 Estimating the Regression Equation and Prediction
      4. 6.4 Variable Selection in Linear Regression
      5. Appendix: Using Statsmodels
      6. Problems
    2. Chapter 7 k-Nearest Neighbors (k-NN)
      1. 7.1 The k-NN Classifier (Categorical Outcome)
      2. 7.2 k-NN for a Numerical Outcome
      3. 7.3 Advantages and Shortcomings of k-NN Algorithms
      4. Problems
      5. Notes
    3. Chapter 8 The Naive Bayes Classifier
      1. 8.1 Introduction
      2. 8.2 Applying the Full (Exact) Bayesian Classifier
      3. 8.3 Advantages and Shortcomings of the Naive Bayes Classifier
      4. Problems
    4. Chapter 9 Classification and Regression Trees
      1. 9.1 Introduction
      2. 9.2 Classification Trees
      3. 9.3 Evaluating the Performance of a Classification Tree
      4. 9.4 Avoiding Overfitting
      5. 9.5 Classification Rules from Trees
      6. 9.6 Classification Trees for More Than Two Classes
      7. 9.7 Regression Trees
      8. 9.8 Improving Prediction: Random Forests and Boosted Trees
      9. 9.9 Advantages and Weaknesses of a Tree
      10. Problems
      11. Notes
    5. Chapter 10 Logistic Regression
      1. 10.1 Introduction
      2. 10.2 The Logistic Regression Model
      3. 10.3 Example: Acceptance of Personal Loan
      4. 10.4 Evaluating Classification Performance
      5. 10.5 Logistic Regression for Multi-class Classification
      6. 10.6 Example of Complete Analysis: Predicting Delayed Flights
      7. Appendix: Using Statsmodels
      8. Problems
      9. Notes
    6. Chapter 11 Neural Nets
      1. 11.1 Introduction
      2. 11.2 Concept and Structure of a Neural Network
      3. 11.3 Fitting a Network to Data
      4. 11.4 Required User Input
      5. 11.5 Exploring the Relationship Between Predictors and Outcome
      6. 11.6 Deep Learning
      7. 11.7 Advantages and Weaknesses of Neural Networks
      8. Problems
      9. Notes
    7. Chapter 12 Discriminant Analysis
      1. 12.1 Introduction
      2. 12.2 Distance of a Record from a Class
      3. 12.3 Fisher’s Linear Classification Functions
      4. 12.4 Classification Performance of Discriminant Analysis
      5. 12.5 Prior Probabilities
      6. 12.6 Unequal Misclassification Costs
      7. 12.7 Classifying More Than Two Classes
      8. 12.8 Advantages and Weaknesses
      9. Problems
      10. Notes
    8. Chapter 13 Combining Methods: Ensembles and Uplift Modeling
      1. 13.1 Ensembles
      2. 13.2 Uplift (Persuasion) Modeling
      3. 13.3 Summary
      4. Problems
      5. Notes
  10. Part V Mining Relationships Among Records
    1. Chapter 14 Association Rules and Collaborative Filtering
      1. 14.1 Association Rules
      2. 14.2 Collaborative Filtering
      3. 14.3 Summary
      4. Problems
      5. Notes
    2. Chapter 15 Cluster Analysis
      1. 15.1 Introduction
      2. 15.2 Measuring Distance Between Two Records
      3. 15.3 Measuring Distance Between Two Clusters
      4. 15.4 Hierarchical (Agglomerative) Clustering
      5. 15.5 Non-Hierarchical Clustering: The k-Means Algorithm
      6. Problems
  11. Part VI Forecasting Time Series
    1. Chapter 16 Handling Time Series
      1. 16.1 Introduction
      2. 16.2 Descriptive vs. Predictive Modeling
      3. 16.3 Popular Forecasting Methods in Business
      4. 16.4 Time Series Components
      5. 16.5 Data-Partitioning and Performance Evaluation
      6. Problems
      7. Notes
    2. Chapter 17 Regression-Based Forecasting
      1. 17.1 A Model with Trend
      2. 17.2 A Model with Seasonality
      3. 17.3 A Model with Trend and Seasonality
      4. 17.4 Autocorrelation and ARIMA Models
      5. Problems
      6. Notes
    3. Chapter 18 Smoothing Methods
      1. 18.1 Introduction
      2. 18.2 Moving Average
      3. 18.3 Simple Exponential Smoothing
      4. 18.4 Advanced Exponential Smoothing
      5. Problems
      6. Notes
  12. Part VII Data Analytics
    1. Chapter 19 Social Network Analytics
      1. 19.1 Introduction
      2. 19.2 Directed vs. Undirected Networks
      3. 19.3 Visualizing and Analyzing Networks
      4. 19.4 Social Data Metrics and Taxonomy
      5. 19.5 Using Network Metrics in Prediction and Classification
      6. 19.6 Collecting Social Network Data with Python
      7. 19.7 Advantages and Disadvantages
      8. Problems
      9. Notes
    2. Chapter 20 Text Mining
      1. 20.1 Introduction
      2. 20.2 The Tabular Representation of Text: Term-Document Matrix and “Bag-of-Words”
      3. 20.3 Bag-of-Words vs. Meaning Extraction at Document Level
      4. 20.4 Preprocessing the Text
      5. 20.5 Implementing Data Mining Methods
      6. 20.6 Example: Online Discussions on Autos and Electronics
      7. 20.7 Summary
      8. Problems
      9. Notes
  13. Part VIII Cases
    1. Chapter 21 Cases
      1. 21.1 Charles Book Club
      2. 21.2 German Credit
      3. 21.3 Tayko Software Cataloger
      4. 21.4 Political Persuasion
      5. 21.5 Taxi Cancellations
      6. 21.6 Segmenting Consumers of Bath Soap
      7. 21.7 Direct-Mail Fundraising
      8. 21.8 Catalog Cross-Selling
      9. 21.9 Time Series Case: Forecasting Public Transportation Demand
      10. Notes
  14. References
  15. Data Files Used in the Book
  16. Python Utilities Functions
  17. Index
  18. End User License Agreement