How to do it...

Read in the flights dataset, and define the grouping columns (AIRLINE), aggregating columns (ARR_DELAY), and aggregating functions (mean):

>>> import numpy as np
>>> import pandas as pd
>>> flights = pd.read_csv('data/flights.csv')
>>> flights.head()

Place the grouping column in the groupby method and then call the agg method with a dictionary pairing the aggregating column with its aggregating function:

>>> flights.groupby('AIRLINE').agg({'ARR_DELAY':'mean'}).head()
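The dictionary form can be tried without the flights file. The following is a minimal sketch on a small made-up frame (the name toy and its values are invented for illustration and are not taken from the dataset); it suggests that the dictionary form returns a DataFrame indexed by the grouping column:

```python
import pandas as pd

# Hypothetical stand-in for the flights data: invented values, same column names.
toy = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'AS'],
    'ARR_DELAY': [10.0, 20.0, -5.0, 5.0, 3.0],
})

# Keys are the aggregating columns, values are the aggregating functions.
by_dict = toy.groupby('AIRLINE').agg({'ARR_DELAY': 'mean'})

print(type(by_dict).__name__)  # the dictionary form yields a DataFrame
print(by_dict)
```

The group labels become the index, and each key of the dictionary becomes a column of the result.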

Alternatively, you may place the aggregating column in the indexing operator and then pass the aggregating function as a string to agg:

>>> flights.groupby('AIRLINE')['ARR_DELAY'].agg('mean').head()
AIRLINE
AA     5.542661
AS    -0.833333
B6     8.692593
DL     0.339691
EV     7.034580
Name: ARR_DELAY, dtype: float64
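As the output above shows, selecting the column with the indexing operator produces a Series rather than a DataFrame. The sketch below, again on an invented frame named toy, contrasts a single column label with a one-item list of labels, which should keep the result as a DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the flights data: invented values, same column names.
toy = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'AS'],
    'ARR_DELAY': [10.0, 20.0, -5.0, 5.0, 3.0],
})

# A single label selects one column, so the aggregation returns a Series.
as_series = toy.groupby('AIRLINE')['ARR_DELAY'].agg('mean')

# A list of labels keeps the selection two-dimensional, so a DataFrame comes back.
as_frame = toy.groupby('AIRLINE')[['ARR_DELAY']].agg('mean')

print(type(as_series).__name__, as_series.name)
print(type(as_frame).__name__, list(as_frame.columns))
```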

The string names used in the previous step are a convenience that pandas offers for referring to a particular aggregation function. You can also pass any aggregating function, such as the NumPy mean function, directly to the agg method. The output is the same as in the previous step:

>>> flights.groupby('AIRLINE')['ARR_DELAY'].agg(np.mean).head()
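Because agg accepts any function that reduces a group to a single value, a user-defined function works as well. The sketch below uses a hypothetical helper, delay_range, on an invented frame; neither appears in the recipe. Note also that newer pandas releases may emit a FutureWarning when handed NumPy reductions such as np.mean and suggest the string name instead, so treat the np.mean form as version dependent.

```python
import pandas as pd

# Hypothetical stand-in for the flights data: invented values, same column names.
toy = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'AS'],
    'ARR_DELAY': [10.0, 20.0, -5.0, 5.0, 3.0],
})

def delay_range(delays):
    """Hypothetical aggregator: spread between the largest and smallest delay."""
    return delays.max() - delays.min()

# agg calls delay_range once per airline, passing that airline's ARR_DELAY values.
spread = toy.groupby('AIRLINE')['ARR_DELAY'].agg(delay_range)
print(spread)
```

The function receives each group's values as a Series and must return a scalar for the result to have one row per group.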

In this case, you can skip the agg method altogether and call the mean method directly. The output is again the same as that of the agg('mean') call shown earlier:

>>> flights.groupby('AIRLINE')['ARR_DELAY'].mean().head()
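The claim that these spellings agree can be checked programmatically. This is a minimal sketch on an invented frame named toy, not the flights data, comparing the dictionary form, the string form, and the direct method call:

```python
import pandas as pd

# Hypothetical stand-in for the flights data: invented values, same column names.
toy = pd.DataFrame({
    'AIRLINE': ['AA', 'AA', 'DL', 'DL', 'AS'],
    'ARR_DELAY': [10.0, 20.0, -5.0, 5.0, 3.0],
})

grouped = toy.groupby('AIRLINE')['ARR_DELAY']

via_string = grouped.agg('mean')   # string name passed to agg
via_method = grouped.mean()        # mean called directly on the groupby object
# The dictionary form returns a DataFrame, so pull out its single column.
via_dict = toy.groupby('AIRLINE').agg({'ARR_DELAY': 'mean'})['ARR_DELAY']

print(via_string.equals(via_method))
print(via_string.equals(via_dict))
```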
