
This book is dedicated to data preparation. It explains how to apply different data preparation techniques to various datasets using libraries written in the Python programming language.

Key Features

  • A crash course in Python to fill any gaps in prerequisite knowledge and a solid foundation on which to build your new skills
  • A complete data preparation pipeline for your guided practice
  • Three real-world projects covering each major task to cement your learned skills in data preparation, classification, and regression

Book Description

The book follows a straightforward approach. It is divided into nine chapters. Chapter 1 introduces the basic concept of data preparation and the installation steps for the software used in this book. Chapter 1 also contains a crash course on Python, followed by a brief overview of different data types in Chapter 2. Chapter 3 shows you how to handle missing values in the data, and Chapter 4 explains the encoding of categorical data.

The second half of the book presents data discretization and describes the process of handling outliers. Chapter 7 demonstrates how to scale features in the dataset. Subsequent chapters teach you to handle mixed and DateTime data types and to balance imbalanced datasets through resampling. A full data preparation final project is also available at the end of the book.

Each chapter explains data preprocessing techniques in theory and then follows with practical examples. Each chapter also contains at least one exercise that students can use to evaluate their understanding of the chapter's concepts. By the end of this book, you will have built a solid working knowledge of data preparation, the first step in any data science or machine learning career and an essential skill set for any aspiring developer.

The code bundle for this book is available at https://www.aispublishing.net/book-data-preprocessing

What you will learn

  • Explore different libraries for data preparation
  • Understand data types
  • Handle missing data
  • Encode categorical data
  • Discretize data
  • Learn to handle outliers
  • Practice feature scaling
  • Handle mixed and DateTime variables and imbalanced datasets
  • Employ your new skills to complete projects in data preparation, classification, and regression

Who this book is for

This book is aimed at beginners in data preparation with Python, and it can also serve as a reference manual for intermediate and experienced programmers. It contains data preprocessing code samples that use multiple data preparation and visualization libraries.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright
  4. How to contact us
  5. About the Publisher
  6. AI Publishing Is Searching for Authors Like You
  7. Download the Color Images
  8. Get in Touch with Us
  9. Table of Contents
  10. Preface
  11. About the Author
  12. Chapter 1: Introduction
    1. 1.1. What is Data Preparation?
    2. 1.2. Environment Setup
    3. 1.2.1. Windows Setup
    4. 1.2.2. Mac Setup
    5. 1.2.3. Linux Setup
    6. 1.3. Python Crash Course
    7. 1.3.1. Writing Your First Program
    8. 1.3.2. Python Variables and Data Types
    9. 1.3.3. Python Operators
    10. 1.3.4. Conditional Statements
    11. 1.3.5. Iteration Statements
    12. 1.3.6. Functions
    13. 1.3.7. Objects and Classes
    14. 1.4. Different Libraries for Data Preparation
    15. 1.4.1. NumPy
    16. 1.4.2. Scikit Learn
    17. 1.4.3. Matplotlib
    18. 1.4.4. Seaborn
    19. 1.4.5. Pandas
    20. Exercise 1.1
    21. Exercise 1.2
  13. Chapter 2: Understanding Data Types
    1. 2.1. Introduction
    2. 2.1.1. What Is a Variable?
    3. 2.1.2. Data Types
    4. 2.2. Numerical Data
    5. 2.2.1. Discrete Data
    6. 2.2.2. Continuous Data
    7. 2.2.3. Binary Data
    8. 2.3. Categorical Data
    9. 2.3.1. Ordinal Data
    10. 2.3.2. Nominal Data
    11. 2.4. Date and Time Data
    12. 2.5. Mixed Data Type
    13. 2.6. Missing Values
    14. 2.6.1. Causes of Missing Data
    15. 2.6.2. Disadvantages of Missing Data
    16. 2.6.3. Mechanism Behind Missing Values
    17. 2.7. Cardinality in Categorical Data
    18. 2.8. Probability Distribution
    19. 2.9. Outliers
    20. Exercise 2.1
  14. Chapter 3: Handling Missing Data
    1. 3.1. Introduction
    2. 3.2. Complete Case Analysis
    3. 3.3. Handling Missing Numerical Data
    4. 3.3.1. Mean or Median Imputation
    5. 3.3.2. End of Distribution Imputation
    6. 3.3.3. Arbitrary Value Imputation
    7. 3.4. Handling Missing Categorical Data
    8. 3.4.1. Frequent Category Imputation
    9. 3.4.2. Missing Category Imputation
    10. Exercise 3.1
    11. Exercise 3.2
  15. Chapter 4: Encoding Categorical Data
    1. 4.1. Introduction
    2. 4.2. One Hot Encoding
    3. 4.3. Label Encoding
    4. 4.4. Frequency Encoding
    5. 4.5. Ordinal Encoding
    6. 4.6. Mean Encoding
    7. Exercise 4.1
    8. Exercise 4.2
  16. Chapter 5: Data Discretization
    1. 5.1. Introduction
    2. 5.2. Equal Width Discretization
    3. 5.3. Equal Frequency Discretization
    4. 5.4. K-Means Discretization
    5. 5.5. Decision Tree Discretization
    6. 5.6. Custom Discretization
    7. Exercise 5.1
    8. Exercise 5.2
  17. Chapter 6: Outlier Handling
    1. 6.1. Introduction
    2. 6.2. Outlier Trimming
    3. 6.3. Outlier Capping Using IQR
    4. 6.4. Outlier Capping Using Mean and Std
    5. 6.5. Outlier Capping Using Quantiles
    6. 6.6. Outlier Capping using Custom Values
    7. Exercise 6.1
    8. Exercise 6.2
  18. Chapter 7: Feature Scaling
    1. 7.1. Introduction
    2. 7.2. Standardization
    3. 7.3. Min/Max Scaling
    4. 7.4. Mean Normalization
    5. 7.5. Maximum Absolute Scaling
    6. 7.6. Median and Quantile Scaling
    7. 7.7. Vector Unit Length Scaling
    8. Exercise 7.1
    9. Exercise 7.2
  19. Chapter 8: Handling Mixed and DateTime Variables
    1. 8.1. Introduction
    2. 8.2. Handling Mixed Values
    3. 8.3. Handling Date Data Type
    4. 8.4. Handling Time Data Type
    5. Exercise 8.1
    6. Exercise 8.2
  20. Chapter 9: Handling Imbalanced Datasets
    1. 9.1. Introduction
    2. 9.2. Imbalanced Dataset
    3. 9.3. Down Sampling
    4. 9.4. Up Sampling
    5. 9.5. SMOTE Up Sampling
    6. Exercise 9.1
  21. Final Project – A Complete Data Preparation Pipeline
    1. 1.1. Introduction
    2. 1.2. Data Preparation
    3. 1.3. Classification Project
    4. 1.4. Regression Project
  22. Exercise Solutions
    1. Exercise 2.1
    2. Exercise 3.1
    3. Exercise 3.2
    4. Exercise 4.1
    5. Exercise 4.2
    6. Exercise 5.1
    7. Exercise 5.2
    8. Exercise 6.1
    9. Exercise 6.2
    10. Exercise 7.1
    11. Exercise 7.2
    12. Exercise 8.1
    13. Exercise 8.2
    14. Exercise 9.1
  23. Back Cover