How to do it...

  1. Read in the men's weightlifting dataset, and identify the variables:
>>> import pandas as pd
>>> weightlifting = pd.read_csv('data/weightlifting_men.csv')
>>> weightlifting
  1. The variables are the weight category, the sex/age category, and the qualifying total. Sex and age group are concatenated together in each column name. Before we can separate them, let's use the melt method to transpose those column names into a single vertical column:
>>> wl_melt = weightlifting.melt(id_vars='Weight Category',
...                              var_name='sex_age',
...                              value_name='Qual Total')
>>> wl_melt.head()
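As a rough illustration of what melt does here, the following sketch runs the same call on a tiny stand-in table. The frame `wide`, its column labels, and its numbers are invented for the example and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the weightlifting table; all values are invented.
wide = pd.DataFrame({
    'Weight Category': [56, 62],
    'M35 35-39': [137, 152],
    'M40 40-44': [130, 145],
})

# Keep 'Weight Category' as the identifier column. Every other column
# name is stacked into 'sex_age' and its values into 'Qual Total'.
melted = wide.melt(id_vars='Weight Category',
                   var_name='sex_age',
                   value_name='Qual Total')
print(melted)
```

Each of the two value columns contributes one row per weight category, so the two-row wide table becomes a four-row long table.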
  1. Select the sex_age column, and use the split method available from the str accessor, which splits on whitespace by default, with expand=True to return the pieces as two separate columns:
>>> sex_age = wl_melt['sex_age'].str.split(expand=True)
>>> sex_age.head()
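The behavior of split with expand=True can be checked in isolation. This is a minimal sketch on a hypothetical Series `labels`; the label strings are assumed to follow the same "sex-code, space, age range" pattern as the melted column.

```python
import pandas as pd

# Hypothetical labels in the assumed "<sex code> <age range>" pattern.
labels = pd.Series(['M35 35-39', 'M40 40-44'])

# With no separator given, str.split splits on whitespace.
# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists.
parts = labels.str.split(expand=True)
print(parts)
print(parts.columns.tolist())
```

The resulting columns carry the default integer labels, which is why the recipe renames them in the next step.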
  1. This operation returned a completely separate DataFrame whose columns carry only the default integer labels. Let's rename the columns so that we can explicitly access them:
>>> sex_age.columns = ['Sex', 'Age Group']
>>> sex_age.head()
  1. Use the indexing operator directly after the str accessor to select the first character of each value in the Sex column:
>>> sex_age['Sex'] = sex_age['Sex'].str[0]
>>> sex_age.head()
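Indexing on the str accessor applies ordinary Python string indexing to every element. A small sketch, assuming values shaped like the ones in the Sex column (the Series `sex` below is hypothetical):

```python
import pandas as pd

# Hypothetical values resembling the Sex column before trimming.
sex = pd.Series(['M35', 'M40', 'M45'])

# .str[0] takes the first character of each string, element by element.
first_char = sex.str[0]

# .str.get(0) is the method form of the same element-wise lookup.
same_thing = sex.str.get(0)
print(first_char.tolist(), same_thing.tolist())
```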
  1. Use the pd.concat function to concatenate this DataFrame with the Weight Category and Qual Total columns of wl_melt to produce a tidy dataset:
>>> wl_cat_total = wl_melt[['Weight Category', 'Qual Total']]
>>> wl_tidy = pd.concat([sex_age, wl_cat_total], axis='columns')
>>> wl_tidy.head()
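Concatenating along the columns works here because both DataFrames share the same row index, and pd.concat aligns rows on that index rather than on position. The sketch below uses invented frames (`left`, `right`, `shifted`) to show the aligned case and what happens when the indexes differ.

```python
import pandas as pd

# Hypothetical pieces with the same default index (0, 1).
left = pd.DataFrame({'Sex': ['M', 'M'], 'Age Group': ['35-39', '40-44']})
right = pd.DataFrame({'Weight Category': [56, 56], 'Qual Total': [137, 130]})

# Matching indexes: rows line up one to one.
tidy = pd.concat([left, right], axis='columns')
print(tidy)

# Same data, but with index labels 1 and 2 instead of 0 and 1.
shifted = pd.DataFrame({'Weight Category': [56, 56], 'Qual Total': [137, 130]},
                       index=[1, 2])

# Mismatched indexes: the result covers the union of the labels,
# and missing values fill the cells that have no partner row.
misaligned = pd.concat([left, shifted], axis='columns')
print(misaligned)
```

In the recipe, sex_age was derived from wl_melt without reordering or filtering, so the indexes match and no missing values appear.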
  1. The same result could have been produced by assigning the two columns directly to sex_age:
>>> cols = ['Weight Category', 'Qual Total']
>>> sex_age[cols] = wl_melt[cols]
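To see the whole recipe in one place, here is a self-contained sketch that runs every step on a small invented table and compares the two ways of building the final result. The frame `wide` and its numbers are assumptions for illustration only; the real file is not used.

```python
import pandas as pd

# Hypothetical stand-in for the weightlifting table; all values are invented.
wide = pd.DataFrame({
    'Weight Category': [56, 62],
    'M35 35-39': [137, 152],
    'M40 40-44': [130, 145],
})

# Melt the sex/age column names into a single column.
wl_melt = wide.melt(id_vars='Weight Category',
                    var_name='sex_age',
                    value_name='Qual Total')

# Split the combined label, name the pieces, and trim Sex to one letter.
sex_age = wl_melt['sex_age'].str.split(expand=True)
sex_age.columns = ['Sex', 'Age Group']
sex_age['Sex'] = sex_age['Sex'].str[0]

cols = ['Weight Category', 'Qual Total']

# Approach 1: build a new DataFrame with pd.concat.
via_concat = pd.concat([sex_age, wl_melt[cols]], axis='columns')

# Approach 2: assign the columns onto a copy of sex_age.
# Without the copy, the assignment would modify sex_age itself.
via_assign = sex_age.copy()
via_assign[cols] = wl_melt[cols]

print(via_concat)
print(via_concat.equals(via_assign))
```

The practical difference is that pd.concat returns a new object, while the assignment adds columns to an existing DataFrame in place.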