How to do it...

  1. First, take note that the state names are in the index of the DataFrame. These states are correctly placed vertically and do not need to be restructured. It is the column names that are the problem. The stack method takes all of the column names and reshapes them to be vertical as a single index level:
>>> state_fruit.stack()
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64
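The `state_fruit` DataFrame is loaded earlier in the recipe and is not shown in this section. The sketch below rebuilds it by hand from the values printed above. That construction is an assumption: it treats the original as a 3x3 frame with states as the row index and fruits as the columns. It then checks what `stack` returns.

```python
import pandas as pd

# Hand-built stand-in for state_fruit (assumption: values copied from the
# stacked output above; the recipe itself loads the data from a file).
state_fruit = pd.DataFrame(
    {"Apple": [12, 9, 0], "Orange": [10, 7, 14], "Banana": [40, 12, 190]},
    index=["Texas", "Arizona", "Florida"],
)

stacked = state_fruit.stack()

# stack moves the column labels into a new, innermost index level,
# turning the 3x3 DataFrame into a 9-element Series.
print(type(stacked).__name__)              # Series
print(stacked.index.nlevels)               # 2
print(stacked.loc[("Florida", "Banana")])  # 190
```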
  2. Notice that we now have a Series with a MultiIndex. There are now two levels in the index. The original index has been pushed to the left to make room for the old column names. With this one command, we now essentially have tidy data: each variable (state, fruit, and weight) is vertical. Let's use the reset_index method to turn the result into a DataFrame:
>>> state_fruit_tidy = state_fruit.stack().reset_index()
>>> state_fruit_tidy
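The output of this step is not reproduced here. The sketch below uses the same hand-built stand-in for `state_fruit` (an assumption, as before) to check what `reset_index` produces. Neither index level nor the Series itself has a name, so pandas falls back to default labels. That is why the column names are described as meaningless.

```python
import pandas as pd

# Hand-built stand-in for state_fruit (assumption: values from step 1's output).
state_fruit = pd.DataFrame(
    {"Apple": [12, 9, 0], "Orange": [10, 7, 14], "Banana": [40, 12, 190]},
    index=["Texas", "Arizona", "Florida"],
)

state_fruit_tidy = state_fruit.stack().reset_index()

# Unnamed index levels become columns level_0 and level_1; the unnamed
# Series values land in a column labeled with the integer 0.
print(list(state_fruit_tidy.columns))  # ['level_0', 'level_1', 0]
print(state_fruit_tidy.shape)          # (9, 3)
```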
  3. Our structure is now correct, but the column names are meaningless. Let's replace them with proper identifiers:
>>> state_fruit_tidy.columns = ['state', 'fruit', 'weight']
>>> state_fruit_tidy
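Assigning to `columns` modifies `state_fruit_tidy` in place. The recipe does not use `DataFrame.set_axis`, but it offers an alternative: it returns a relabeled copy, so the whole reshaping can be written as one expression. Here is a minimal sketch with the same hand-built stand-in (an assumption).

```python
import pandas as pd

# Hand-built stand-in for state_fruit (assumption: values from step 1's output).
state_fruit = pd.DataFrame(
    {"Apple": [12, 9, 0], "Orange": [10, 7, 14], "Banana": [40, 12, 190]},
    index=["Texas", "Arizona", "Florida"],
)

# set_axis(..., axis=1) replaces the column labels and returns a new
# DataFrame; the number of labels must match the number of columns.
state_fruit_tidy = (
    state_fruit.stack()
    .reset_index()
    .set_axis(["state", "fruit", "weight"], axis=1)
)
print(state_fruit_tidy.head(3))
```

Both versions give the same columns. The chained form avoids an intermediate object with throwaway labels.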
  4. Instead of directly changing the columns attribute, it's possible to use the lesser-known Series method rename_axis to set the names of the index levels before using reset_index:
>>> (state_fruit.stack()
...             .rename_axis(['state', 'fruit']))
state    fruit
Texas    Apple      12
         Orange     10
         Banana     40
Arizona  Apple       9
         Orange      7
         Banana     12
Florida  Apple       0
         Orange     14
         Banana    190
dtype: int64
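`rename_axis` returns a new Series whose index levels carry the given names, leaving the original index untouched. Here is a quick check with the hand-built stand-in for `state_fruit` (an assumption, as in the earlier sketches).

```python
import pandas as pd

# Hand-built stand-in for state_fruit (assumption: values from step 1's output).
state_fruit = pd.DataFrame(
    {"Apple": [12, 9, 0], "Orange": [10, 7, 14], "Banana": [40, 12, 190]},
    index=["Texas", "Arizona", "Florida"],
)

stacked = state_fruit.stack()
named = stacked.rename_axis(["state", "fruit"])

# The list of names is matched to the index levels in order, outermost first.
print(list(named.index.names))    # ['state', 'fruit']
print(list(stacked.index.names))  # [None, None]
```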
  5. From here, we can simply chain the reset_index method with the name parameter to reproduce the output from step 3:
>>> (state_fruit.stack()
...             .rename_axis(['state', 'fruit'])
...             .reset_index(name='weight'))
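The `name` argument of `Series.reset_index` labels the column built from the Series values. The chain therefore yields the same table as renaming the columns by hand. The sketch below checks this with the hand-built stand-in for `state_fruit` (an assumption).

```python
import pandas as pd

# Hand-built stand-in for state_fruit (assumption: values from step 1's output).
state_fruit = pd.DataFrame(
    {"Apple": [12, 9, 0], "Orange": [10, 7, 14], "Banana": [40, 12, 190]},
    index=["Texas", "Arizona", "Florida"],
)

# Approach from step 3: reset the index, then overwrite the column labels.
manual = state_fruit.stack().reset_index()
manual.columns = ["state", "fruit", "weight"]

# Approach from step 5: name the index levels, then name the values column.
chained = (
    state_fruit.stack()
    .rename_axis(["state", "fruit"])
    .reset_index(name="weight")
)

print(chained.equals(manual))  # True
```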