How to do it...

  1. Read in the raw weight_loss dataset, and examine the first month of data from the two people, Amy and Bob. There are a total of four weigh-ins per month:
>>> weight_loss = pd.read_csv('data/weight_loss.csv')
>>> weight_loss.query('Month == "Jan"')
  1. To determine the winner for each month, we only need to compare the weight at the first week with the weight at the last week of each month. But if we wanted weekly updates, we could also calculate the weight loss from the first week of each month to the current week. Let's create a function that is capable of providing weekly updates:
>>> def find_perc_loss(s):
...     return (s - s.iloc[0]) / s.iloc[0]
  1. Let's test out this function for Bob during the month of January.
>>> bob_jan = weight_loss.query('Name=="Bob" and Month=="Jan"')
>>> find_perc_loss(bob_jan['Weight'])
0    0.000000
2   -0.010309
4   -0.027491
6   -0.027491
Name: Weight, dtype: float64
You should ignore the index values in the last output. 0, 2, 4 and 6 simply refer to the original row labels of the DataFrame and have no relation to the week.
  1. After the first week, Bob lost 1% of his body weight. He continued losing weight during the second week but made no progress during the last week. We can apply this function to every combination of person and month to get the weight loss per week in relation to the first week of the month. To do this, we need to group our data by Name and Month, and then use the transform method to apply this custom function:
>>> pcnt_loss = weight_loss.groupby(['Name', 'Month'])['Weight'] \
...                        .transform(find_perc_loss)
>>> pcnt_loss.head(8)
0    0.000000
1    0.000000
2   -0.010309
3   -0.040609
4   -0.027491
5   -0.040609
6   -0.027491
7   -0.035533
Name: Weight, dtype: float64
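The grouping step can also be tried without the CSV file. The following is a minimal sketch on a small made-up table: the names, months, and weights in toy are hypothetical and are not taken from weight_loss.csv. It applies the same function with transform and shows that the result keeps the original row labels and restarts at zero for each person and month:

```python
import pandas as pd

# Hypothetical data: two people, two months, two weigh-ins per month.
toy = pd.DataFrame({
    'Name':   ['Amy', 'Bob', 'Amy', 'Bob', 'Amy', 'Bob', 'Amy', 'Bob'],
    'Month':  ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb'],
    'Weight': [200.0, 300.0, 190.0, 297.0, 180.0, 290.0, 171.0, 261.0],
})

def find_perc_loss(s):
    # Change relative to the first weigh-in of the group
    return (s - s.iloc[0]) / s.iloc[0]

# Each (Name, Month) group is passed to the function as its own Series;
# the pieces are then stitched back together in the original row order.
toy_loss = toy.groupby(['Name', 'Month'])['Weight'].transform(find_perc_loss)
print(toy_loss)
```

Rows 0 and 1 are the first January weigh-ins and rows 4 and 5 are the first February weigh-ins, so all four come out as zero.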
  1. The transform method must return an object with the same number of rows as the calling DataFrame. Let's append this result to our original DataFrame as a new column. To help shorten the output, we will select Bob's first two months of data:
>>> weight_loss['Perc Weight Loss'] = pcnt_loss.round(3)
>>> weight_loss.query('Name=="Bob" and Month in ["Jan", "Feb"]')
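To see why the result of transform can be assigned straight back as a new column, it may help to contrast it with agg. The sketch below uses a made-up four-row table (the toy data and the column name First Weight are hypothetical): agg collapses each group to a single value, while transform returns one value per original row, aligned on the original index:

```python
import pandas as pd

# Hypothetical data: two weigh-ins each for two people.
toy = pd.DataFrame({
    'Name':   ['Amy', 'Amy', 'Bob', 'Bob'],
    'Weight': [200.0, 190.0, 300.0, 297.0],
})
grouped = toy.groupby('Name')['Weight']

first_per_group = grouped.agg('first')      # one value per group (2 rows)
first_per_row = grouped.transform('first')  # one value per original row (4 rows)

# Because the transformed Series shares the DataFrame's index,
# it can be added directly as a column.
toy['First Weight'] = first_per_row
print(toy)
```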
  1. Notice that the percentage weight loss resets at the start of each new month. With this new column, we can manually determine a winner, but let's see if we can find a way to do this automatically. As the only week that matters is the last week, let's select week 4:
>>> week4 = weight_loss.query('Week == "Week 4"')
>>> week4
  1. This narrows down the weeks but still doesn't automatically identify the winner of each month. Let's reshape this data with the pivot method so that Bob's and Amy's percent weight losses are side by side for each month:
>>> winner = week4.pivot(index='Month', columns='Name',
...                      values='Perc Weight Loss')
>>> winner
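The reshaping can be sketched on a small made-up table as well. The values in toy_week4 below are hypothetical and only stand in for the week 4 rows; pivot turns the unique values of Name into columns and the unique values of Month into the index:

```python
import pandas as pd

# Hypothetical week-4 rows: one row per person per month.
toy_week4 = pd.DataFrame({
    'Name':  ['Amy', 'Bob', 'Amy', 'Bob'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Perc Weight Loss': [-0.05, -0.01, -0.05, -0.10],
})

# One row per month, one column per person.
toy_winner = toy_week4.pivot(index='Month', columns='Name',
                             values='Perc Weight Loss')
print(toy_winner)
```

Note that pivot requires each combination of index and column values to appear only once; selecting a single week beforehand is what guarantees that here.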
  1. This output makes it clearer who has won each month, but we can still go a couple of steps further. NumPy has a vectorized if-then-else function called where, which can map a Series or array of booleans to other values. Let's create a column for the name of the winner and highlight the winning percentage for each month:
>>> winner['Winner'] = np.where(winner['Amy'] < winner['Bob'],
...                             'Amy', 'Bob')
>>> winner.style.highlight_min(axis=1)
  1. Use the value_counts method to return the final score as the number of months won:
>>> winner.Winner.value_counts()
Amy    3
Bob    1
Name: Winner, dtype: int64
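The last two steps can be reproduced in isolation as a rough sketch. The table toy_winner below is made up for illustration (three hypothetical months of final percentages) and does not reflect the real results; the logic mirrors the np.where and value_counts calls above:

```python
import numpy as np
import pandas as pd

# Hypothetical end-of-month percentages; more negative means more weight lost.
toy_winner = pd.DataFrame(
    {'Amy': [-0.05, -0.05, -0.02], 'Bob': [-0.01, -0.10, -0.01]},
    index=pd.Index(['Jan', 'Feb', 'Mar'], name='Month'),
)

# Where the condition is True pick 'Amy', otherwise pick 'Bob'.
toy_winner['Winner'] = np.where(toy_winner['Amy'] < toy_winner['Bob'],
                                'Amy', 'Bob')

# Count how many months each person won.
score = toy_winner['Winner'].value_counts()
print(score)
```

Because the comparison is a strict less-than, a month in which both percentages are equal would be credited to Bob. If ties matter, a separate check for equality would be needed.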