How it works...

Throughout this recipe, the query method is used to filter data instead of boolean indexing. Refer to the Improving readability of Boolean indexing with the query method recipe from Chapter 5, Boolean Indexing, for more information.

Our goal is to find the percentage weight loss for each person in each month. One way to accomplish this is to calculate each week's weight loss relative to the start of its month. This task is well suited to the transform groupby method. The transform method accepts a function as its one required parameter. This function is implicitly passed each non-grouping column, one group at a time (or only the columns specified in the indexing operator, as was done in this recipe with Weight). It must return a sequence of values the same length as the passed group (or a single value that pandas can broadcast to that length), or else an exception will be raised. In essence, all values from the original DataFrame are transformed. No aggregation or filtration takes place.
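As a minimal sketch of this length-preserving behavior, the following uses a small made-up DataFrame (the Name column and the numbers are assumptions for illustration, not the recipe's dataset) and contrasts transform with an aggregation:

```python
import pandas as pd

# Hypothetical toy data; not the dataset used in the recipe
df = pd.DataFrame({
    'Name': ['Amy', 'Amy', 'Bob', 'Bob'],
    'Weight': [200.0, 190.0, 300.0, 285.0],
})

# transform: one output value per input row, aligned to the original index
change = df.groupby('Name')['Weight'].transform(lambda s: s - s.iloc[0])

# agg: each group collapses to a single value, indexed by the group key
first = df.groupby('Name')['Weight'].agg('first')

print(len(change), len(first))
```

The transformed Series has as many rows as the original DataFrame, while the aggregated one has one row per group.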

Step 2 creates a function that subtracts the first value of the passed Series from all of its values and then divides this result by the first value. This calculates the percent loss (or gain) relative to the first value. In step 3 we test this function on one person during one month.
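A function of the kind described might look like the sketch below. The name find_perc_loss and the sample weights are assumptions made for illustration:

```python
import pandas as pd

def find_perc_loss(s):
    """Return each value's fractional change relative to the first value."""
    first = s.iloc[0]
    return (s - first) / first

# Hypothetical weights for one person during one month
weights = pd.Series([200.0, 198.0, 195.0, 190.0])
loss = find_perc_loss(weights)
print(loss)
```

The first value is always 0, and negative values indicate weight lost since the start of the period.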

In step 4, we use this function in the same manner over every combination of person and month. In a literal sense, we are transforming the Weight column into the percentage of weight lost relative to the start of the month. The first month of data is output for each person. Pandas returns the new data as a Series. This Series isn't all that useful by itself and makes more sense appended to the original DataFrame as a new column. We complete this operation in step 5.
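The grouping and assignment can be sketched as follows. The column names other than Weight, the new column's name, and all of the numbers are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical data: two people, two months, first and last week of each month
df = pd.DataFrame({
    'Name': ['Amy'] * 4 + ['Bob'] * 4,
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'] * 2,
    'Week': ['Week 1', 'Week 4'] * 4,
    'Weight': [200.0, 190.0, 190.0, 181.0, 300.0, 285.0, 285.0, 270.0],
})

# Change relative to the first weight within each (person, month) group
pct = (df.groupby(['Name', 'Month'])['Weight']
         .transform(lambda s: (s - s.iloc[0]) / s.iloc[0]))

# The result shares the original index, so it can be assigned as a new column
df['Perc Weight Loss'] = pct.round(3)
print(df)
```

Because the returned Series keeps the original index, the assignment lines up row by row without any merge.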

To determine the winner, only week 4 of each month is necessary. We could stop here and determine the winner manually, but pandas provides functionality to automate this. The pivot function in step 7 reshapes our dataset by pivoting the unique values of one column into new column names. The index parameter is used for the column that you do not want to pivot; its values become the row labels. The column passed to the values parameter gets tiled over each unique combination of the columns in the index and columns parameters.
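A small sketch of this reshaping, using hypothetical week-4 rows (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical week-4 results, one row per person per month
week4 = pd.DataFrame({
    'Name': ['Amy', 'Amy', 'Bob', 'Bob'],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'Perc Weight Loss': [-0.05, -0.047, -0.05, -0.053],
})

# Month stays as the row labels; each unique Name becomes a column
winner = week4.pivot(index='Month', columns='Name',
                     values='Perc Weight Loss')
print(winner)
```

Each cell of the result holds the value for one (Month, Name) pair, which puts the two people side by side for comparison.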

The pivot method only works if there is just a single occurrence of each unique combination of the columns in the index and columns parameters. If any combination occurs more than once, an exception will be raised. In that situation, you can use the pivot_table method, which allows you to aggregate multiple values together.
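The difference can be illustrated with a deliberately duplicated combination. The data below is made up, and the choice of mean as the aggregation is an assumption rather than something the recipe prescribes:

```python
import pandas as pd

# Hypothetical data in which (Jan, Amy) appears twice
dup = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Jan'],
    'Name': ['Amy', 'Amy', 'Bob'],
    'Perc Weight Loss': [-0.05, -0.03, -0.04],
})

# pivot cannot place two values in one cell, so it raises a ValueError
try:
    dup.pivot(index='Month', columns='Name', values='Perc Weight Loss')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicates by aggregating them
table = dup.pivot_table(index='Month', columns='Name',
                        values='Perc Weight Loss', aggfunc='mean')
print(pivot_failed)
print(table)
```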

After pivoting, we use the fast NumPy where function, whose first argument is a condition that produces a Series of booleans. True values get mapped to the second argument, Amy, and False values get mapped to the third, Bob. We highlight the winner of each month and tally the final score with the value_counts method.
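A minimal sketch of the comparison and tally, assuming a pivoted table with one column per person (the Winner column name and the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted results; more negative means more weight lost
winner = pd.DataFrame(
    {'Amy': [-0.06, -0.047, -0.05], 'Bob': [-0.05, -0.053, -0.04]},
    index=['Jan', 'Feb', 'Mar'],
)

# Where the condition is True pick 'Amy', otherwise pick 'Bob'
winner['Winner'] = np.where(winner['Amy'] < winner['Bob'], 'Amy', 'Bob')

# Count how many months each person won
score = winner['Winner'].value_counts()
print(score)
```

Note that with a strict less-than comparison, a tie would be awarded to the False branch, so a real contest might need an explicit tie rule.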
