Chapter 9. Grouping and Aggregating Data

The pandas library provides a flexible and high-performance "groupby" facility that enables you to slice, dice, and summarize data sets. This process follows a pattern known as split-apply-combine. In this pattern, data is first categorized into groups based on a criterion such as the index labels or the values within the columns. Each group is then processed with an aggregation or transformation function, returning a set of data with transformed values or a single aggregate summary for each group. pandas then combines all of these results and presents them in a single data structure.
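As a minimal sketch of the pattern, the three steps can be written out by hand and compared with the single groupby expression. The sample data and the column names `sensor` and `reading` are hypothetical and are not taken from the chapter's examples.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the pattern
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b", "a"],
    "reading": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Split: one sub-DataFrame per distinct value in the "sensor" column
pieces = {key: group for key, group in df.groupby("sensor")}

# Apply: reduce each group to a single aggregate value
means = {key: group["reading"].mean() for key, group in pieces.items()}

# Combine: collect the per-group results into one pandas object
manual = pd.Series(means, name="reading")

# groupby performs all three steps in a single expression
combined = df.groupby("sensor")["reading"].mean()
```

Both `manual` and `combined` hold one mean per sensor; the groupby form simply hides the intermediate dictionary of pieces.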

We will start by seeing how pandas is used to split data. This will begin with a demonstration of how to group data, either by using categorical values in the columns of a DataFrame object or by using the levels in the index of a pandas object. Using the result of a grouping operation, we will examine how to access the data in each group, as well as how to retrieve various basic statistical values for the groups.
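The following sketch previews both ways of splitting, along with a few ways of inspecting the result. The data and the column names (`sensor`, `axis`, `reading`) are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b"],
    "axis": ["x", "x", "y", "y"],
    "reading": [1.0, 2.0, 3.0, 4.0],
})

# Group using the categorical values in a column
by_column = df.groupby("sensor")
n_groups = by_column.ngroups            # number of distinct groups
sizes = by_column.size()                # row count per group
group_a = by_column.get_group("a")      # the rows belonging to one group

# Basic statistics for each group
stats = by_column["reading"].describe()

# Group using a level of a hierarchical index instead of a column
indexed = df.set_index(["sensor", "axis"])
by_level = indexed.groupby(level="sensor")
level_means = by_level["reading"].mean()
```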

The next section will focus on the apply portion of the pattern. This involves providing summaries of the groups via aggregation functions, transforming each row in a group into a new series of data, and removing entire groups based upon various criteria so that they do not appear in the results.
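The three kinds of apply step can be sketched as follows. The data is hypothetical, and the choices of centering each group on its mean and of a minimum group size of two are arbitrary, made only to show the shape of each result.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "c"],
    "reading": [1.0, 3.0, 10.0, 20.0, 5.0],
})
grouped = df.groupby("sensor")["reading"]

# Aggregation: one row of summary values per group
summary = grouped.agg(["mean", "max"])

# Transformation: one output value per input row, computed within its group
centered = grouped.transform(lambda s: s - s.mean())

# Filtering: drop every row of any group that fails the predicate
kept = df.groupby("sensor").filter(lambda g: len(g) >= 2)
```

Note the difference in shape: `summary` has one row per group, `centered` has one value per original row, and `kept` retains only the rows of groups `a` and `b`.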

The chapter will close with a look at performing discretization of data in pandas. Although not properly a grouping function of pandas, discretization allows data to be grouped into buckets, either based upon ranges of values or so that the data is evenly distributed across a number of buckets.
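A brief sketch of the two styles of bucketing, assuming `pd.cut` for range-based bins and `pd.qcut` for evenly populated bins. The values, bin edges, and labels are hypothetical.

```python
import pandas as pd

# Hypothetical continuous values
values = pd.Series([1, 5, 12, 18, 25, 40, 60, 99])

# Buckets defined by ranges of values: (0, 10], (10, 50], (50, 100]
by_range = pd.cut(values, bins=[0, 10, 50, 100],
                  labels=["low", "mid", "high"])

# Buckets chosen so that each holds roughly the same number of items
by_quantile = pd.qcut(values, q=4, labels=["q1", "q2", "q3", "q4"])

# Count how many values landed in each bucket
range_counts = by_range.value_counts()
quantile_counts = by_quantile.value_counts()
```

With range-based bins the bucket sizes depend on where the data falls, whereas the quantile-based bins here each receive two of the eight values.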

Specifically, in this chapter, we will cover:

An overview of the split, apply, and combine pattern for data analysis
Grouping by column values
Accessing the results of grouping
Grouping using index levels
Applying functions to groups to create aggregate results
Transforming groups of data, and using filtering to selectively remove groups of data
The discretization of continuous data into bins

Setting up the IPython notebook

To utilize the examples in this chapter, we will need to include the following imports and settings:

In [1]:
   # import pandas and numpy
   import numpy as np
   import pandas as pd

   # Set some pandas options for controlling output
   pd.set_option('display.notebook_repr_html', False)
   pd.set_option('display.max_columns', 10)
   pd.set_option('display.max_rows', 10)

   # inline graphics
   %matplotlib inline
