Contents

  1. Foreword to Second Edition

  2. Foreword to First Edition

  3. Preface

    1. Breakdown of the Book

      1. Part I

      2. Part II

      3. Part III

      4. Part IV

      5. Part V

      6. Appendices

    2. How to Read This Book

      1. Newcomers

      2. Fluent Python Programmers

      3. Instructors

    3. Setup

      1. Get the Data

      2. Set Up Python

    4. Feedback, Please!

  4. Acknowledgments

  5. About the Author

  6. Changes in the Second Edition

  7. I Introduction

    1. 1 Pandas DataFrame Basics

      1. 1.1 Introduction

      2. Learning Objectives

      3. 1.2 Load Your First Data Set

      4. 1.3 Look at Columns, Rows, and Cells

        1. 1.3.1 Select and Subset Columns by Name

        2. 1.3.2 Subset Rows

        3. 1.3.3 Subset Rows by Row Number: .iloc[]

        4. 1.3.4 Mix It Up

        5. 1.3.5 Subsetting Rows and Columns

      5. 1.4 Grouped and Aggregated Calculations

        1. 1.4.1 Grouped Means

        2. 1.4.2 Grouped Frequency Counts

      6. 1.5 Basic Plot

      7. Conclusion

    2. 2 Pandas Data Structures Basics

      1. Learning Objectives

      2. 2.1 Create Your Own Data

        1. 2.1.1 Create a Series

        2. 2.1.2 Create a DataFrame

      3. 2.2 The Series

        1. 2.2.1 The Series Is ndarray-like

        2. 2.2.2 Boolean Subsetting: Series

        3. 2.2.3 Operations Are Automatically Aligned and Vectorized (Broadcasting)

      4. 2.3 The DataFrame

        1. 2.3.1 Parts of a DataFrame

        2. 2.3.2 Boolean Subsetting: DataFrames

        3. 2.3.3 Operations Are Automatically Aligned and Vectorized (Broadcasting)

      5. 2.4 Making Changes to Series and DataFrames

        1. 2.4.1 Add Additional Columns

        2. 2.4.2 Directly Change a Column

        3. 2.4.3 Modifying Columns with .assign()

        4. 2.4.4 Dropping Values

      6. 2.5 Exporting and Importing Data

        1. 2.5.1 Pickle

        2. 2.5.2 Comma-Separated Values (CSV)

        3. 2.5.3 Excel

        4. 2.5.4 Feather

        5. 2.5.5 Arrow

        6. 2.5.6 Dictionary

        7. 2.5.7 JSON (JavaScript Object Notation)

        8. 2.5.8 Other Data Output Types

      7. Conclusion

    3. 3 Plotting Basics

      1. Learning Objectives

      2. 3.1 Why Visualize Data?

      3. 3.2 Matplotlib Basics

        1. 3.2.1 Figure Objects and Axes Subplots

        2. 3.2.2 Anatomy of a Figure

      4. 3.3 Statistical Graphics Using matplotlib

        1. 3.3.1 Univariate (Single Variable)

        2. 3.3.2 Bivariate (Two Variables)

        3. 3.3.3 Multivariate Data

      5. 3.4 Seaborn

        1. 3.4.1 Univariate

        2. 3.4.2 Bivariate Data

        3. 3.4.3 Multivariate Data

        4. 3.4.4 Facets

        5. 3.4.5 Seaborn Styles and Themes

        6. 3.4.6 How to Go Through Seaborn Documentation

        7. 3.4.7 Next-Generation Seaborn Interface

      6. 3.5 Pandas Plotting Method

        1. 3.5.1 Histogram

        2. 3.5.2 Density Plot

        3. 3.5.3 Scatter Plot

        4. 3.5.4 Hexbin Plot

        5. 3.5.5 Box Plot

      7. Conclusion

    4. 4 Tidy Data

      1. Learning Objectives

        1. Note About This Chapter

      2. 4.1 Columns Contain Values, Not Variables

        1. 4.1.1 Keep One Column Fixed

        2. 4.1.2 Keep Multiple Columns Fixed

      3. 4.2 Columns Contain Multiple Variables

        1. 4.2.1 Split and Add Columns Individually

        2. 4.2.2 Split and Combine in a Single Step

      4. 4.3 Variables in Both Rows and Columns

      5. Conclusion

    5. 5 Apply Functions

      1. Learning Objectives

        1. Note About This Chapter

      2. 5.1 Primer on Functions

      3. 5.2 Apply (Basics)

        1. 5.2.1 Apply Over a Series

        2. 5.2.2 Apply Over a DataFrame

      4. 5.3 Vectorized Functions

        1. 5.3.1 Vectorize with NumPy

        2. 5.3.2 Vectorize with Numba

      5. 5.4 Lambda Functions (Anonymous Functions)

      6. Conclusion

  8. II Data Processing

    1. 6 Data Assembly

      1. Learning Objectives

      2. 6.1 Combine Data Sets

      3. 6.2 Concatenation

        1. 6.2.1 Review Parts of a DataFrame

        2. 6.2.2 Add Rows

        3. 6.2.3 Add Columns

        4. 6.2.4 Concatenate with Different Indices

      4. 6.3 Observational Units Across Multiple Tables

        1. 6.3.1 Load Multiple Files Using a Loop

        2. 6.3.2 Load Multiple Files Using a List Comprehension

      5. 6.4 Merge Multiple Data Sets

        1. 6.4.1 One-to-One Merge

        2. 6.4.2 Many-to-One Merge

        3. 6.4.3 Many-to-Many Merge

        4. 6.4.4 Check Your Work with Assert

      6. Conclusion

    2. 7 Data Normalization

      1. Learning Objectives

      2. 7.1 Multiple Observational Units in a Table (Normalization)

      3. Conclusion

    3. 8 Groupby Operations: Split-Apply-Combine

      1. Learning Objectives

      2. 8.1 Aggregate

        1. 8.1.1 Basic One-Variable Grouped Aggregation

        2. 8.1.2 Built-In Aggregation Methods

        3. 8.1.3 Aggregation Functions

        4. 8.1.4 Multiple Functions Simultaneously

        5. 8.1.5 Use a dict in .agg() / .aggregate()

      3. 8.2 Transform

        1. 8.2.1 Z-Score Example

        2. 8.2.2 Missing Value Example

      4. 8.3 Filter

      5. 8.4 The pandas.core.groupby.DataFrameGroupBy object

        1. 8.4.1 Groups

        2. 8.4.2 Group Calculations Involving Multiple Variables

        3. 8.4.3 Selecting a Group

        4. 8.4.4 Iterating Through Groups

        5. 8.4.5 Multiple Groups

        6. 8.4.6 Flattening the Results (.reset_index())

      6. 8.5 Working With a MultiIndex

      7. Conclusion

  9. III Data Types

    1. 9 Missing Data

      1. Learning Objectives

      2. 9.1 What Is a NaN Value?

      3. 9.2 Where Do Missing Values Come From?

        1. 9.2.1 Load Data

        2. 9.2.2 Merged Data

        3. 9.2.3 User Input Values

        4. 9.2.4 Reindexing

      4. 9.3 Working With Missing Data

        1. 9.3.1 Find and Count Missing Data

        2. 9.3.2 Clean Missing Data

        3. 9.3.3 Calculations With Missing Data

      5. 9.4 Pandas Built-In NA Missing

      6. Conclusion

    2. 10 Data Types

      1. Learning Objectives

      2. 10.1 Data Types

      3. 10.2 Converting Types

        1. 10.2.1 Converting to String Objects

        2. 10.2.2 Converting to Numeric Values

      4. 10.3 Categorical Data

        1. 10.3.1 Convert to Category

        2. 10.3.2 Manipulating Categorical Data

      5. Conclusion

    3. 11 Strings and Text Data

      1. Introduction

      2. Learning Objectives

      3. 11.1 Strings

        1. 11.1.1 Subset and Slice Strings

        2. 11.1.2 Get the Last Character in a String

      4. 11.2 String Methods

      5. 11.3 More String Methods

        1. 11.3.1 Join

        2. 11.3.2 Splitlines

      6. 11.4 String Formatting (F-Strings)

        1. 11.4.1 Formatting Numbers

      7. 11.5 Regular Expressions (RegEx)

        1. 11.5.1 Match a Pattern

        2. 11.5.2 Remember What Your RegEx Patterns Are

        3. 11.5.3 Find a Pattern

        4. 11.5.4 Substitute a Pattern

        5. 11.5.5 Compile a Pattern

      8. 11.6 The regex Library

      9. Conclusion

    4. 12 Dates and Times

      1. Learning Objectives

      2. 12.1 Python’s datetime Object

      3. 12.2 Converting to datetime

      4. 12.3 Loading Data That Include Dates

      5. 12.4 Extracting Date Components

      6. 12.5 Date Calculations and Timedeltas

      7. 12.6 Datetime Methods

      8. 12.7 Getting Stock Data

      9. 12.8 Subsetting Data Based on Dates

        1. 12.8.1 The DatetimeIndex Object

        2. 12.8.2 The TimedeltaIndex Object

      10. 12.9 Date Ranges

        1. 12.9.1 Frequencies

        2. 12.9.2 Offsets

      11. 12.10 Shifting Values

      12. 12.11 Resampling

      13. 12.12 Time Zones

      14. 12.13 Arrow for Better Dates and Times

      15. Conclusion

  10. IV Data Modeling

    1. 13 Linear Regression (Continuous Outcome Variable)

      1. 13.1 Simple Linear Regression

        1. 13.1.1 With statsmodels

        2. 13.1.2 With scikit-learn

      2. 13.2 Multiple Regression

        1. 13.2.1 With statsmodels

        2. 13.2.2 With scikit-learn

      3. 13.3 Models with Categorical Variables

        1. 13.3.1 Categorical Variables in statsmodels

        2. 13.3.2 Categorical Variables in scikit-learn

      4. 13.4 One-Hot Encoding in scikit-learn with Transformer Pipelines

      5. Conclusion

    2. 14 Generalized Linear Models

      1. About This Chapter

      2. 14.1 Logistic Regression (Binary Outcome Variable)

        1. 14.1.1 With statsmodels

        2. 14.1.2 With sklearn

        3. 14.1.3 Be Careful of scikit-learn Defaults

      3. 14.2 Poisson Regression (Count Outcome Variable)

        1. 14.2.1 With statsmodels

        2. 14.2.2 Negative Binomial Regression for Overdispersion

      4. 14.3 More Generalized Linear Models

      5. Conclusion

    3. 15 Survival Analysis

      1. 15.1 Survival Data

      2. 15.2 Kaplan-Meier Curves

      3. 15.3 Cox Proportional Hazards Model

        1. 15.3.1 Testing the Cox Model Assumptions

      4. Conclusion

    4. 16 Model Diagnostics

      1. 16.1 Residuals

        1. 16.1.1 Q-Q Plots

      2. 16.2 Comparing Multiple Models

        1. 16.2.1 Working with Linear Models

        2. 16.2.2 Working with GLM Models

      3. 16.3 k-Fold Cross-Validation

      4. Conclusion

    5. 17 Regularization

      1. 17.1 Why Regularize?

      2. 17.2 LASSO Regression

      3. 17.3 Ridge Regression

      4. 17.4 Elastic Net

      5. 17.5 Cross-Validation

      6. Conclusion

    6. 18 Clustering

      1. 18.1 k-Means

        1. 18.1.1 Dimension Reduction with PCA

      2. 18.2 Hierarchical Clustering

        1. 18.2.1 Complete Clustering

        2. 18.2.2 Single Clustering

        3. 18.2.3 Average Clustering

        4. 18.2.4 Centroid Clustering

        5. 18.2.5 Ward Clustering

        6. 18.2.6 Manually Setting the Threshold

      3. Conclusion

  11. V Conclusion

    1. 19 Life Outside of Pandas

      1. 19.1 The (Scientific) Computing Stack

      2. 19.2 Performance

        1. 19.2.1 Timing Your Code

        2. 19.2.2 Profiling Your Code

        3. 19.2.3 Concurrent Futures

      3. 19.3 Dask

      4. 19.4 Siuba

      5. 19.5 Ibis

      6. 19.6 Polars

      7. 19.7 PyJanitor

      8. 19.8 Pandera

      9. 19.9 Machine Learning

      10. 19.10 Publishing

      11. 19.11 Dashboards

      12. Conclusion

    2. 20 It’s Dangerous To Go Alone!

      1. 20.1 Local Meetups

      2. 20.2 Conferences

      3. 20.3 The Carpentries

      4. 20.4 Podcasts

      5. 20.5 Other Resources

      6. Conclusion

  12. VI Appendices

    1. A Concept Maps

    2. B Installation and Setup

      1. B.1 Install Python

        1. B.1.1 Anaconda

        2. B.1.2 Miniconda

        3. B.1.3 Uninstall Anaconda or Miniconda

        4. B.1.4 Pyenv

      2. B.2 Install Python Packages

      3. B.3 Download Book Data

    3. C Command Line

      1. C.1 Installation

        1. C.1.1 Windows

        2. C.1.2 Mac

        3. C.1.3 Linux

      2. C.2 Basics

    4. D Project Templates

    5. E Using Python

      1. E.1 Command Line and Text Editor

      2. E.2 Python and IPython

      3. E.3 Jupyter

      4. E.4 Integrated Development Environments (IDEs)

    6. F Working Directories

    7. G Environments

      1. G.1 Conda Environments

      2. G.2 Pyenv + Pipenv

    8. H Install Packages

      1. H.1 Updating Packages

    9. I Importing Libraries

    10. J Code Style

      1. J.1 Line Breaks in Code

    11. K Containers: Lists, Tuples, and Dictionaries

      1. K.1 Lists

      2. K.2 Tuples

      3. K.3 Dictionaries

    12. L Slice Values

    13. M Loops

    14. N Comprehensions

    15. O Functions

      1. O.1 Default Parameters

      2. O.2 Arbitrary Parameters

        1. O.2.1 *args

        2. O.2.2 **kwargs

    16. P Ranges and Generators

    17. Q Multiple Assignment

    18. R Numpy ndarray

    19. S Classes

    20. T SettingWithCopyWarning

      1. T.1 Modifying a Subset of Data

      2. T.2 Replacing a Value

      3. T.3 More Resources

    21. U Method Chaining

    22. V Timing Code

    23. W String Formatting

      1. W.1 C-Style

      2. W.2 String Formatting: .format() Method

      3. W.3 Formatting Numbers

    24. X Conditionals (if-elif-else)

    25. Y New York ACS Logistic Regression Example

      1. Y.0.1 With sklearn

    26. Z Replicating Results in R

      1. Z.1 Linear Regression

      2. Z.2 Logistic Regression

      3. Z.3 Poisson Regression

        1. Z.3.1 Negative Binomial Regression for Overdispersion

  13. Index