How to do it...

Read in the raw weight_loss dataset and examine the first month of data for the two people, Amy and Bob. Each person has four weigh-ins per month:

>>> import numpy as np
>>> import pandas as pd
>>> weight_loss = pd.read_csv('data/weight_loss.csv')
>>> weight_loss.query('Month == "Jan"')
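The file data/weight_loss.csv is not reproduced here. To follow along without it, the sketch below builds a small stand-in DataFrame. The column names Name, Month, Week, and Weight are taken from the queries used in this recipe. The weights are made up, only two months are included, and the row order (Bob and Amy alternating) is an assumption based on the row labels shown in the outputs below.

```python
import pandas as pd

# Hypothetical stand-in for data/weight_loss.csv: made-up weights,
# four weigh-ins per person per month, rows alternating Bob and Amy
weeks = ["Week 1", "Week 2", "Week 3", "Week 4"]
bob = {"Jan": [200, 198, 195, 195], "Feb": [195, 193, 192, 190]}
amy = {"Jan": [180, 176, 174, 172], "Feb": [172, 171, 170, 169]}

rows = []
for month in ["Jan", "Feb"]:
    for i, week in enumerate(weeks):
        rows.append({"Name": "Bob", "Month": month, "Week": week, "Weight": bob[month][i]})
        rows.append({"Name": "Amy", "Month": month, "Week": week, "Weight": amy[month][i]})

weight_loss = pd.DataFrame(rows)
print(weight_loss.query('Month == "Jan"'))
```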

To determine the winner of each month, we only need to compare the weight at the first weigh-in with the weight at the last weigh-in of that month. If we also want weekly updates, we can calculate the weight loss from the first week of the month up to the current week. Let's create a function that provides these weekly updates:

>>> def find_perc_loss(s):
...     # Change of each weight relative to the first weight in s, as a fraction
...     return (s - s.iloc[0]) / s.iloc[0]
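Written as a formula, for a month with weigh-ins $W_1, \dots, W_4$, find_perc_loss returns the following value for weigh-in $k$. The numbers in the example are hypothetical; they only show that a negative result means weight was lost (here, 1%).

```latex
p_k = \frac{W_k - W_1}{W_1}, \qquad k = 1, \dots, 4,
\qquad \text{e.g.} \quad p_2 = \frac{198 - 200}{200} = -0.01
```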

Let's test out this function for Bob during the month of January.

>>> bob_jan = weight_loss.query('Name=="Bob" and Month=="Jan"')
>>> find_perc_loss(bob_jan['Weight'])
0    0.000000
2   -0.010309
4   -0.027491
6   -0.027491
Name: Weight, dtype: float64

Ignore the index values in the last output: 0, 2, 4, and 6 are simply the original row labels of the DataFrame and have no relation to the week.
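If you would rather see each percentage labelled by its weigh-in, one option (not part of the original recipe) is to move Week into the index before calling find_perc_loss. The sketch below uses a hypothetical four-row stand-in for bob_jan with made-up weights. It assumes a Week column holding labels like "Week 1", as the later query in this recipe suggests.

```python
import pandas as pd

def find_perc_loss(s):
    # Change of each value relative to the first value of s, as a fraction
    return (s - s.iloc[0]) / s.iloc[0]

# Hypothetical stand-in for bob_jan, keeping the original row labels 0, 2, 4, 6
bob_jan = pd.DataFrame(
    {"Week": ["Week 1", "Week 2", "Week 3", "Week 4"],
     "Weight": [200, 198, 195, 195]},
    index=[0, 2, 4, 6],
)

# Index the weights by Week so the output labels match the weigh-ins
result = find_perc_loss(bob_jan.set_index("Week")["Weight"])
print(result)
```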

After the first week, Bob lost 1% of his body weight. He continued to lose weight during the second week but made no progress during the last week. We can apply this function to every combination of person and month to get each week's weight loss relative to the first week of that month. To do this, we group the data by Name and Month and then use the transform method to apply the custom function:

>>> pcnt_loss = (weight_loss.groupby(['Name', 'Month'])['Weight']
...                         .transform(find_perc_loss))
>>> pcnt_loss.head(8)
0    0.000000
1    0.000000
2   -0.010309
3   -0.040609
4   -0.027491
5   -0.040609
6   -0.027491
7   -0.035533
Name: Weight, dtype: float64
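To see why transform, rather than an aggregation, fits this step, the sketch below runs the same function on a small hypothetical frame with made-up weights. transform returns one value per original row, aligned on the original index, and the percentage restarts at 0 for every (Name, Month) group. An aggregation such as agg('first') instead collapses each group to a single row.

```python
import pandas as pd

def find_perc_loss(s):
    # Change of each value relative to the group's first value, as a fraction
    return (s - s.iloc[0]) / s.iloc[0]

# Hypothetical data: two people, two months, two weigh-ins each
df = pd.DataFrame({
    "Name":   ["Bob", "Amy", "Bob", "Amy", "Bob", "Amy", "Bob", "Amy"],
    "Month":  ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "Weight": [200, 180, 190, 171, 190, 171, 187, 171],
})

grouped = df.groupby(["Name", "Month"])["Weight"]

# transform: same length and index as df, one value per original row
pcnt = grouped.transform(find_perc_loss)

# agg: one row per (Name, Month) group, so only 4 rows here
first = grouped.agg("first")

print(pcnt)
print(first)
```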

The transform method returns an object with the same number of rows as the original data, aligned on the original index, so we can append the result to our DataFrame as a new column. To keep the output short, we select only Bob's first two months of data:

>>> weight_loss['Perc Weight Loss'] = pcnt_loss.round(3)
>>> weight_loss.query('Name=="Bob" and Month in ["Jan", "Feb"]')

Notice that the percentage weight loss resets at the start of each new month. With this new column we could determine the winner manually, but let's find a way to do it automatically. Since only the last week of each month matters, let's select Week 4:

>>> week4 = weight_loss.query('Week == "Week 4"')
>>> week4
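The query above relies on every month having a row labelled "Week 4". A different approach, which is not from the original recipe, takes the last row of each (Name, Month) group with groupby(...).tail(1). It assumes rows are already in chronological order within each month. The frame below is hypothetical, with only two weigh-ins per month to keep it short.

```python
import pandas as pd

# Hypothetical frame that already has the 'Perc Weight Loss' column
df = pd.DataFrame({
    "Name":  ["Bob", "Amy", "Bob", "Amy", "Bob", "Amy", "Bob", "Amy"],
    "Month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"],
    "Week":  ["Week 1", "Week 1", "Week 2", "Week 2"] * 2,
    "Perc Weight Loss": [0.0, 0.0, -0.05, -0.04, 0.0, 0.0, -0.016, -0.02],
})

# Keep the last row of each (Name, Month) group; assumes chronological order
last_week = df.groupby(["Name", "Month"]).tail(1)
print(last_week)
```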

This narrows the data down to the final week, but it still doesn't identify the winner of each month. Let's reshape the data with the pivot method so that Bob's and Amy's percentage weight loss appear side by side for each month:

>>> winner = week4.pivot(index='Month', columns='Name',
                         values='Perc Weight Loss')
>>> winner
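pivot takes one column for the new row labels (index), one for the new column labels (columns), and one for the cell values (values). Each (index, columns) pair must be unique, otherwise pivot raises a ValueError. That condition holds here because week4 has exactly one row per person per month. The sketch below uses hypothetical month-end values.

```python
import pandas as pd

# Hypothetical month-end results in long format: one row per person per month
week4 = pd.DataFrame({
    "Name": ["Bob", "Amy", "Bob", "Amy"],
    "Month": ["Jan", "Jan", "Feb", "Feb"],
    "Perc Weight Loss": [-0.025, -0.044, -0.026, -0.018],
})

# Wide format: one row per Month, one column per Name
winner = week4.pivot(index="Month", columns="Name", values="Perc Weight Loss")
print(winner)
```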

This output makes it clearer who won each month, but we can go a couple of steps further. Because weight loss is stored as a negative percentage, the person with the smaller (more negative) value wins. NumPy has a vectorized if-then-else function called where, which maps a Series or array of booleans to other values. Let's create a column with the name of each month's winner and highlight the winning percentage:

>>> winner['Winner'] = np.where(winner['Amy'] < winner['Bob'],
                                'Amy', 'Bob')
>>> winner.style.highlight_min(subset=['Amy', 'Bob'], axis=1)
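np.where(condition, x, y) picks x wherever condition is True and y everywhere else. Because the condition here is a strict <, a month in which Amy and Bob lose exactly the same percentage would be credited to Bob. The sketch below shows this on made-up values, plus one possible way (not from the recipe) to label ties explicitly with np.select.

```python
import numpy as np
import pandas as pd

# Hypothetical month-end percentages (negative = weight lost)
winner = pd.DataFrame(
    {"Amy": [-0.04, -0.02, -0.03], "Bob": [-0.03, -0.05, -0.03]},
    index=["Jan", "Feb", "Mar"],
)

# Recipe's rule: the smaller (more negative) value wins; ties go to Bob
winner["Winner"] = np.where(winner["Amy"] < winner["Bob"], "Amy", "Bob")

# Alternative: mark ties explicitly instead of defaulting to Bob
conditions = [winner["Amy"] < winner["Bob"], winner["Bob"] < winner["Amy"]]
winner["Winner (ties)"] = np.select(conditions, ["Amy", "Bob"], default="Tie")
print(winner)
```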

Use the value_counts method to return the final score as the number of months won:

>>> winner.Winner.value_counts()
Amy    3
Bob    1
Name: Winner, dtype: int64
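Putting the steps together, a condensed version of the whole recipe might look like the sketch below. It runs on a small made-up DataFrame rather than data/weight_loss.csv, assumes the same column names, keeps only the first and last weigh-in of each month, and skips the styling step.

```python
import numpy as np
import pandas as pd

def find_perc_loss(s):
    # Change of each weigh-in relative to the month's first weigh-in, as a fraction
    return (s - s.iloc[0]) / s.iloc[0]

# Hypothetical data: first and last weigh-in per person per month
weight_loss = pd.DataFrame({
    "Name":   ["Bob", "Amy"] * 6,
    "Month":  ["Jan"] * 4 + ["Feb"] * 4 + ["Mar"] * 4,
    "Week":   ["Week 1", "Week 1", "Week 4", "Week 4"] * 3,
    "Weight": [200, 180, 194, 171, 194, 171, 190, 169, 190, 169, 188, 166],
})

# Percent loss within each (Name, Month) group, one value per row
weight_loss["Perc Weight Loss"] = (
    weight_loss.groupby(["Name", "Month"])["Weight"]
    .transform(find_perc_loss)
    .round(3)
)

# Month-end rows, reshaped so Amy and Bob sit side by side
week4 = weight_loss.query('Week == "Week 4"')
winner = week4.pivot(index="Month", columns="Name", values="Perc Weight Loss")

# Smaller (more negative) percentage wins the month
winner["Winner"] = np.where(winner["Amy"] < winner["Bob"], "Amy", "Bob")
score = winner["Winner"].value_counts()
print(score)
```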
