Get to grips with pandas: a fast, versatile, high-performance Python library for data discovery, manipulation, and preparation, and for handling data in analytical tasks

Key Features

  • Perform efficient data analysis and manipulation tasks using pandas 1.x
  • Apply pandas to different real-world domains with the help of step-by-step examples
  • Become well-versed in using pandas as an effective data exploration tool

Book Description

Data analysis has become an essential skill in a variety of domains where knowing how to work with data and extract insights can generate significant value. Hands-On Data Analysis with Pandas will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.

Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data.

This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision-making. This knowledge can be applied across multiple domains.

What you will learn

  • Understand how data analysts and scientists gather and analyze data
  • Perform data analysis and data wrangling using Python
  • Combine, group, and aggregate data from multiple sources
  • Create data visualizations with pandas, matplotlib, and seaborn
  • Apply machine learning algorithms to identify patterns and make predictions
  • Use Python data science libraries to analyze real-world datasets
  • Solve common data representation and analysis problems using pandas
  • Build Python scripts, modules, and packages for reusable analysis code

Who this book is for

This book is for data science beginners, data analysts, and Python developers who want to explore each stage of data analysis and scientific computing using a wide range of datasets. You'll also find this book useful if you are a data scientist looking to incorporate pandas into your machine learning workflow. A working knowledge of the Python programming language will help you understand the key concepts covered in this book; however, a Python crash-course tutorial is provided in the code bundle for anyone who needs a refresher.

Table of Contents

  1. Hands-On Data Analysis with Pandas, Second Edition
  2. Foreword to the Second Edition
  3. Foreword to the First Edition
  4. Contributors
  5. About the author
  6. About the reviewer
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the color images
    5. Conventions used
    6. Get in touch
    7. Reviews
  8. Section 1: Getting Started with Pandas
  9. Chapter 1: Introduction to Data Analysis
    1. Chapter materials
    2. The fundamentals of data analysis
    3. Data collection
    4. Data wrangling
    5. Exploratory data analysis
    6. Drawing conclusions
    7. Statistical foundations
    8. Sampling
    9. Descriptive statistics
    10. Prediction and forecasting
    11. Inferential statistics
    12. Setting up a virtual environment
    13. Virtual environments
    14. Installing the required Python packages
    15. Why pandas?
    16. Jupyter Notebooks
    17. Summary
    18. Exercises
    19. Further reading
  10. Chapter 2: Working with Pandas DataFrames
    1. Chapter materials
    2. Pandas data structures
    3. Series
    4. Index
    5. DataFrame
    6. Creating a pandas DataFrame
    7. From a Python object
    8. From a file
    9. From a database
    10. From an API
    11. Inspecting a DataFrame object
    12. Examining the data
    13. Describing and summarizing the data
    14. Grabbing subsets of the data
    15. Selecting columns
    16. Slicing
    17. Indexing
    18. Filtering
    19. Adding and removing data
    20. Creating new data
    21. Deleting unwanted data
    22. Summary
    23. Exercises
    24. Further reading
  11. Section 2: Using Pandas for Data Analysis
  12. Chapter 3: Data Wrangling with Pandas
    1. Chapter materials
    2. Understanding data wrangling
    3. Data cleaning
    4. Data transformation
    5. Data enrichment
    6. Exploring an API to find and collect temperature data
    7. Cleaning data
    8. Renaming columns
    9. Type conversion
    10. Reordering, reindexing, and sorting data
    11. Reshaping data
    12. Transposing DataFrames
    13. Pivoting DataFrames
    14. Melting DataFrames
    15. Handling duplicate, missing, or invalid data
    16. Finding the problematic data
    17. Mitigating the issues
    18. Summary
    19. Exercises
    20. Further reading
  13. Chapter 4: Aggregating Pandas DataFrames
    1. Chapter materials
    2. Performing database-style operations on DataFrames
    3. Querying DataFrames
    4. Merging DataFrames
    5. Using DataFrame operations to enrich data
    6. Arithmetic and statistics
    7. Binning
    8. Applying functions
    9. Window calculations
    10. Pipes
    11. Aggregating data
    12. Summarizing DataFrames
    13. Aggregating by group
    14. Pivot tables and crosstabs
    15. Working with time series data
    16. Time-based selection and filtering
    17. Shifting for lagged data
    18. Differenced data
    19. Resampling
    20. Merging time series
    21. Summary
    22. Exercises
    23. Further reading
  14. Chapter 5: Visualizing Data with Pandas and Matplotlib
    1. Chapter materials
    2. An introduction to matplotlib
    3. The basics
    4. Plot components
    5. Additional options
    6. Plotting with pandas
    7. Evolution over time
    8. Relationships between variables
    9. Distributions
    10. Counts and frequencies
    11. The pandas.plotting module
    12. Scatter matrices
    13. Lag plots
    14. Autocorrelation plots
    15. Bootstrap plots
    16. Summary
    17. Exercises
    18. Further reading
  15. Chapter 6: Plotting with Seaborn and Customization Techniques
    1. Chapter materials
    2. Utilizing seaborn for advanced plotting
    3. Categorical data
    4. Correlations and heatmaps
    5. Regression plots
    6. Faceting
    7. Formatting plots with matplotlib
    8. Titles and labels
    9. Legends
    10. Formatting axes
    11. Customizing visualizations
    12. Adding reference lines
    13. Shading regions
    14. Annotations
    15. Colors
    16. Textures
    17. Summary
    18. Exercises
    19. Further reading
  16. Section 3: Applications – Real-World Analyses Using Pandas
  17. Chapter 7: Financial Analysis – Bitcoin and the Stock Market
    1. Chapter materials
    2. Building a Python package
    3. Package structure
    4. Overview of the stock_analysis package
    5. UML diagrams
    6. Collecting financial data
    7. The StockReader class
    8. Collecting historical data from Yahoo! Finance
    9. Exploratory data analysis
    10. The Visualizer class family
    11. Visualizing a stock
    12. Visualizing multiple assets
    13. Technical analysis of financial instruments
    14. The StockAnalyzer class
    15. The AssetGroupAnalyzer class
    16. Comparing assets
    17. Modeling performance using historical data
    18. The StockModeler class
    19. Time series decomposition
    20. ARIMA
    21. Linear regression with statsmodels
    22. Comparing models
    23. Summary
    24. Exercises
    25. Further reading
  18. Chapter 8: Rule-Based Anomaly Detection
    1. Chapter materials
    2. Simulating login attempts
    3. Assumptions
    4. The login_attempt_simulator package
    5. Simulating from the command line
    6. Exploratory data analysis
    7. Implementing rule-based anomaly detection
    8. Percent difference
    9. Tukey fence
    10. Z-score
    11. Evaluating performance
    12. Summary
    13. Exercises
    14. Further reading
  19. Section 4: Introduction to Machine Learning with Scikit-Learn
  20. Chapter 9: Getting Started with Machine Learning in Python
    1. Chapter materials
    2. Overview of the machine learning landscape
    3. Types of machine learning
    4. Common tasks
    5. Machine learning in Python
    6. Exploratory data analysis
    7. Red wine quality data
    8. White and red wine chemical properties data
    9. Planets and exoplanets data
    10. Preprocessing data
    11. Training and testing sets
    12. Scaling and centering data
    13. Encoding data
    14. Imputing
    15. Additional transformers
    16. Building data pipelines
    17. Clustering
    18. k-means
    19. Evaluating clustering results
    20. Regression
    21. Linear regression
    22. Evaluating regression results
    23. Classification
    24. Logistic regression
    25. Evaluating classification results
    26. Summary
    27. Exercises
    28. Further reading
  21. Chapter 10: Making Better Predictions – Optimizing Models
    1. Chapter materials
    2. Hyperparameter tuning with grid search
    3. Feature engineering
    4. Interaction terms and polynomial features
    5. Dimensionality reduction
    6. Feature unions
    7. Feature importances
    8. Ensemble methods
    9. Random forest
    10. Gradient boosting
    11. Voting
    12. Inspecting classification prediction confidence
    13. Addressing class imbalance
    14. Under-sampling
    15. Over-sampling
    16. Regularization
    17. Summary
    18. Exercises
    19. Further reading
  22. Chapter 11: Machine Learning Anomaly Detection
    1. Chapter materials
    2. Exploring the simulated login attempts data
    3. Utilizing unsupervised methods of anomaly detection
    4. Isolation forest
    5. Local outlier factor
    6. Comparing models
    7. Implementing supervised anomaly detection
    8. Baselining
    9. Logistic regression
    10. Incorporating a feedback loop with online learning
    11. Creating the PartialFitPipeline subclass
    12. Stochastic gradient descent classifier
    13. Summary
    14. Exercises
    15. Further reading
  23. Section 5: Additional Resources
  24. Chapter 12: The Road Ahead
    1. Data resources
    2. Python packages
    3. Searching for data
    4. APIs
    5. Websites
    6. Practicing working with data
    7. Python practice
    8. Summary
    9. Exercises
    10. Further reading
  25. Solutions
  26. Appendix
    1. Data analysis workflow
    2. Choosing the appropriate visualization
    3. Machine learning workflow
    4. Why subscribe?
  27. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Leave a review - let other readers know what you think