Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world.

Table of Contents

  1. Pandas in Action
  2. Dedication
  3. Copyright
  4. contents
  5. front matter
    1. preface
    2. acknowledgments
    3. about this book
    4. Who should read this book
    5. How this book is organized: A road map
    6. About the code
    7. liveBook discussion forum
    8. Other online resources
    9. about the author
    10. about the cover illustration
  6. Part 1. Core pandas
  7. 1 Introducing pandas
    1. 1.1 Data in the 21st century
    2. 1.2 Introducing pandas
    3. 1.2.1 Pandas vs. graphical spreadsheet applications
    4. 1.2.2 Pandas vs. its competitors
    5. 1.3 A tour of pandas
    6. 1.3.1 Importing a data set
    7. 1.3.2 Manipulating a DataFrame
    8. 1.3.3 Counting values in a Series
    9. 1.3.4 Filtering a column by one or more criteria
    10. 1.3.5 Grouping data
    11. Summary
  8. 2 The Series object
    1. 2.1 Overview of a Series
    2. 2.1.1 Classes and instances
    3. 2.1.2 Populating the Series with values
    4. 2.1.3 Customizing the Series index
    5. 2.1.4 Creating a Series with missing values
    6. 2.2 Creating a Series from Python objects
    7. 2.3 Series attributes
    8. 2.4 Retrieving the first and last rows
    9. 2.5 Mathematical operations
    10. 2.5.1 Statistical operations
    11. 2.5.2 Arithmetic operations
    12. 2.5.3 Broadcasting
    13. 2.6 Passing the Series to Python’s built-in functions
    14. 2.7 Coding challenge
    15. 2.7.1 Problems
    16. 2.7.2 Solutions
    17. Summary
  9. 3 Series methods
    1. 3.1 Importing a data set with the read_csv function
    2. 3.2 Sorting a Series
    3. 3.2.1 Sorting by values with the sort_values method
    4. 3.2.2 Sorting by index with the sort_index method
    5. 3.2.3 Retrieving the smallest and largest values with the nsmallest and nlargest methods
    6. 3.3 Overwriting a Series with the inplace parameter
    7. 3.4 Counting values with the value_counts method
    8. 3.5 Invoking a function on every Series value with the apply method
    9. 3.6 Coding challenge
    10. 3.6.1 Problems
    11. 3.6.2 Solutions
    12. Summary
  10. 4 The DataFrame object
    1. 4.1 Overview of a DataFrame
    2. 4.1.1 Creating a DataFrame from a dictionary
    3. 4.1.2 Creating a DataFrame from a NumPy ndarray
    4. 4.2 Similarities between Series and DataFrames
    5. 4.2.1 Importing a DataFrame with the read_csv function
    6. 4.2.2 Shared and exclusive attributes of Series and DataFrames
    7. 4.2.3 Shared methods of Series and DataFrames
    8. 4.3 Sorting a DataFrame
    9. 4.3.1 Sorting by a single column
    10. 4.3.2 Sorting by multiple columns
    11. 4.4 Sorting by index
    12. 4.4.1 Sorting by row index
    13. 4.4.2 Sorting by column index
    14. 4.5 Setting a new index
    15. 4.6 Selecting columns and rows from a DataFrame
    16. 4.6.1 Selecting a single column from a DataFrame
    17. 4.6.2 Selecting multiple columns from a DataFrame
    18. 4.7 Selecting rows from a DataFrame
    19. 4.7.1 Extracting rows by index label
    20. 4.7.2 Extracting rows by index position
    21. 4.7.3 Extracting values from specific columns
    22. 4.8 Extracting values from Series
    23. 4.9 Renaming columns or rows
    24. 4.10 Resetting an index
    25. 4.11 Coding challenge
    26. 4.11.1 Problems
    27. 4.11.2 Solutions
    28. Summary
  11. 5 Filtering a DataFrame
    1. 5.1 Optimizing a data set for memory use
    2. 5.1.1 Converting data types with the astype method
    3. 5.2 Filtering by a single condition
    4. 5.3 Filtering by multiple conditions
    5. 5.3.1 The AND condition
    6. 5.3.2 The OR condition
    7. 5.3.3 Inversion with ~
    8. 5.3.4 Methods for Booleans
    9. 5.4 Filtering by condition
    10. 5.4.1 The isin method
    11. 5.4.2 The between method
    12. 5.4.3 The isnull and notnull methods
    13. 5.4.4 Dealing with null values
    14. 5.5 Dealing with duplicates
    15. 5.5.1 The duplicated method
    16. 5.5.2 The drop_duplicates method
    17. 5.6 Coding challenge
    18. 5.6.1 Problems
    19. 5.6.2 Solutions
    20. Summary
  12. Part 2. Applied pandas
  13. 6 Working with text data
    1. 6.1 Letter casing and whitespace
    2. 6.2 String slicing
    3. 6.3 String slicing and character replacement
    4. 6.4 Boolean methods
    5. 6.5 Splitting strings
    6. 6.6 Coding challenge
    7. 6.6.1 Problems
    8. 6.6.2 Solutions
    9. 6.7 A note on regular expressions
    10. Summary
  14. 7 MultiIndex DataFrames
    1. 7.1 The MultiIndex object
    2. 7.2 MultiIndex DataFrames
    3. 7.3 Sorting a MultiIndex
    4. 7.4 Selecting with a MultiIndex
    5. 7.4.1 Extracting one or more columns
    6. 7.4.2 Extracting one or more rows with loc
    7. 7.4.3 Extracting one or more rows with iloc
    8. 7.5 Cross-sections
    9. 7.6 Manipulating the Index
    10. 7.6.1 Resetting the index
    11. 7.6.2 Setting the index
    12. 7.7 Coding challenge
    13. 7.7.1 Problems
    14. 7.7.2 Solutions
    15. Summary
  15. 8 Reshaping and pivoting
    1. 8.1 Wide vs. narrow data
    2. 8.2 Creating a pivot table from a DataFrame
    3. 8.2.1 The pivot_table method
    4. 8.2.2 Additional options for pivot tables
    5. 8.3 Stacking and unstacking index levels
    6. 8.4 Melting a data set
    7. 8.5 Exploding a list of values
    8. 8.6 Coding challenge
    9. 8.6.1 Problems
    10. 8.6.2 Solutions
    11. Summary
  16. 9 The GroupBy object
    1. 9.1 Creating a GroupBy object from scratch
    2. 9.2 Creating a GroupBy object from a data set
    3. 9.3 Attributes and methods of a GroupBy object
    4. 9.4 Aggregate operations
    5. 9.5 Applying a custom operation to all groups
    6. 9.6 Grouping by multiple columns
    7. 9.7 Coding challenge
    8. 9.7.1 Problems
    9. 9.7.2 Solutions
    10. Summary
  17. 10 Merging, joining, and concatenating
    1. 10.1 Introducing the data sets
    2. 10.2 Concatenating data sets
    3. 10.3 Missing values in concatenated DataFrames
    4. 10.4 Left joins
    5. 10.5 Inner joins
    6. 10.6 Outer joins
    7. 10.7 Merging on index labels
    8. 10.8 Coding challenge
    9. 10.8.1 Problems
    10. 10.8.2 Solutions
    11. Summary
  18. 11 Working with dates and times
    1. 11.1 Introducing the Timestamp object
    2. 11.1.1 How Python works with datetimes
    3. 11.1.2 How pandas works with datetimes
    4. 11.2 Storing multiple timestamps in a DatetimeIndex
    5. 11.3 Converting column or index values to datetimes
    6. 11.4 Using the DatetimeProperties object
    7. 11.5 Adding and subtracting durations of time
    8. 11.6 Date offsets
    9. 11.7 The Timedelta object
    10. 11.8 Coding challenge
    11. 11.8.1 Problems
    12. 11.8.2 Solutions
    13. Summary
  19. 12 Imports and exports
    1. 12.1 Reading from and writing to JSON files
    2. 12.1.1 Loading a JSON file into a DataFrame
    3. 12.1.2 Exporting a DataFrame to a JSON file
    4. 12.2 Reading from and writing to CSV files
    5. 12.3 Reading from and writing to Excel workbooks
    6. 12.3.1 Installing the xlrd and openpyxl libraries in an Anaconda environment
    7. 12.3.2 Importing Excel workbooks
    8. 12.3.3 Exporting Excel workbooks
    9. 12.4 Coding challenge
    10. 12.4.1 Problems
    11. 12.4.2 Solutions
    12. Summary
  20. 13 Configuring pandas
    1. 13.1 Getting and setting pandas options
    2. 13.2 Precision
    3. 13.3 Maximum column width
    4. 13.4 Chop threshold
    5. 13.5 Option context
    6. Summary
  21. 14 Visualization
    1. 14.1 Installing matplotlib
    2. 14.2 Line charts
    3. 14.3 Bar graphs
    4. 14.4 Pie charts
    5. Summary
  22. Appendix A. Installation and setup
    1. A.1 The Anaconda distribution
    2. A.2 The macOS setup process
    3. A.2.1 Installing Anaconda in macOS
    4. A.2.2 Launching Terminal
    5. A.2.3 Common Terminal commands
    6. A.3 The Windows setup process
    7. A.3.1 Installing Anaconda in Windows
    8. A.3.2 Launching Anaconda Prompt
    9. A.3.3 Common Anaconda Prompt commands
    10. A.4 Creating a new Anaconda environment
    11. A.5 Anaconda Navigator
    12. A.6 The basics of Jupyter Notebook
  23. Appendix B. Python crash course
    1. B.1 Simple data types
    2. B.1.1 Numbers
    3. B.1.2 Strings
    4. B.1.3 Booleans
    5. B.1.4 The None object
    6. B.2 Operators
    7. B.2.1 Mathematical operators
    8. B.2.2 Equality and inequality operators
    9. B.3 Variables
    10. B.4 Functions
    11. B.4.1 Arguments and return values
    12. B.4.2 Custom functions
    13. B.5 Modules
    14. B.6 Classes and objects
    15. B.7 Attributes and methods
    16. B.8 String methods
    17. B.9 Lists
    18. B.9.1 List iteration
    19. B.9.2 List comprehension
    20. B.9.3 Converting a string to a list and vice versa
    21. B.10 Tuples
    22. B.11 Dictionaries
    23. B.11.1 Dictionary iteration
    24. B.12 Sets
  24. Appendix C. NumPy crash course
    1. C.1 Dimensions
    2. C.2 The ndarray object
    3. C.2.1 Generating a numeric range with the arange method
    4. C.2.2 Attributes on an ndarray object
    5. C.2.3 The reshape method
    6. C.2.4 The randint function
    7. C.2.5 The randn function
    8. C.3 The nan object
  25. Appendix D. Generating fake data with Faker
    1. D.1 Installing Faker
    2. D.2 Getting started with Faker
    3. D.3 Populating a DataFrame with fake values
  26. Appendix E. Regular expressions
    1. E.1 Introduction to Python’s re module
    2. E.2 Metacharacters
    3. E.3 Advanced search patterns
    4. E.4 Regular expressions and pandas
  27. index