How it works...

The weightlifting dataset, like many datasets, is easy to read in its raw form, but technically it is messy: all but one of the column names encode values for sex and age. Once the variables are identified, we can begin to tidy the dataset. Whenever column names contain variable values, you will need to use the melt (or stack) method. The Weight Category variable is already in the correct position, so we keep it as an identifying variable by passing it to the id_vars parameter. Note that we don't need to explicitly name all the columns that we are melting with value_vars. By default, all the columns not present in id_vars get melted.
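As a minimal sketch of this melt step, the following uses a small made-up frame in place of the real weightlifting data. The column labels other than Weight Category, the numbers, and the var_name and value_name choices are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the weightlifting table; all values are made up.
wl = pd.DataFrame({
    "Weight Category": [56, 62],
    "M35 35-39": [137, 152],
    "F35 35-39": [90, 100],
})

# Only id_vars is passed, so every remaining column is melted.
# var_name and value_name label the two new columns (names assumed here).
wl_melt = wl.melt(
    id_vars="Weight Category",
    var_name="sex_age",
    value_name="Qual Total",
)

print(wl_melt)
```

Each original column label now appears as a value in sex_age, repeated once per weight category.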

The sex_age column needs to be parsed and split into two variables. For this, we turn to the extra functionality provided by the str accessor, which is available on a Series (a single DataFrame column) but not on a DataFrame itself. The split method is one of the more common methods in this situation, as it can separate different parts of the string into their own columns. By default, it splits on whitespace, but you may also specify a string or regular expression with the pat parameter. When the expand parameter is set to True, each segment produced by the split is placed in its own column. When it is False, a single column is returned, with each value holding a list of all the segments.
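The behavior of split can be sketched on a short Series. The two sample strings below are assumed stand-ins for values in the sex_age column, not rows from the actual dataset.

```python
import pandas as pd

# Assumed sample values shaped like the melted column labels.
s = pd.Series(["M35 35-39", "F40 40-44"], name="sex_age")

# Default separator is whitespace; expand=True returns a DataFrame
# with one column per segment.
as_columns = s.str.split(expand=True)

# expand=False (the default) returns a Series whose values are lists.
as_lists = s.str.split(expand=False)

# pat sets a different separator, here a literal hyphen.
by_hyphen = s.str.split(pat="-", expand=True)

print(as_columns)
print(as_lists)
print(by_hyphen)
```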

After renaming the columns in step 4, we need to use the str accessor again. Interestingly enough, the str accessor supports the indexing operator, which selects or slices segments of each string. Here, we select the first character, which is the value for sex. We could go further and split the ages into two separate columns for minimum and maximum age, but it is common to refer to the entire age group in this manner, so we leave it as is.
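A brief sketch of indexing through the str accessor follows. The frame and its Sex and Age Group column names are assumptions made for illustration, filled with made-up values.

```python
import pandas as pd

# Hypothetical result of the split and rename steps.
sex_age = pd.DataFrame({
    "Sex": ["M35", "F40"],
    "Age Group": ["35-39", "40-44"],
})

# .str[0] selects the first character of every string in the column.
first_char = sex_age["Sex"].str[0]

# Slices work the same way; this takes everything after the first character.
remainder = sex_age["Sex"].str[1:]

# Overwrite the column so that it holds only the sex code.
sex_age["Sex"] = first_char

print(sex_age)
```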

Step 6 shows one of two different methods to join all the data together. The concat function accepts a collection of DataFrames and concatenates them either vertically (axis='index', the default) or horizontally (axis='columns'). Because the two DataFrames are indexed identically, it is also possible to assign the values of one DataFrame to new columns in the other, as done in step 7.
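The two joining approaches can be compared in a minimal sketch. Both frames below are made-up stand-ins that share the same default index, and the column names are assumed for illustration.

```python
import pandas as pd

# Hypothetical frames with identical indexes (0 and 1).
wl_melt = pd.DataFrame({
    "Weight Category": [56, 62],
    "Qual Total": [137, 152],
})
sex_age = pd.DataFrame({
    "Sex": ["M", "M"],
    "Age Group": ["35-39", "35-39"],
})

# Approach 1: concatenate side by side; rows are aligned on the index.
combined = pd.concat([sex_age, wl_melt], axis="columns")

# Approach 2: assign the columns one at a time; this also aligns on the index.
assigned = wl_melt.copy()
assigned["Sex"] = sex_age["Sex"]
assigned["Age Group"] = sex_age["Age Group"]

print(combined)
print(assigned)
```

Both results hold the same data; only the column order differs, since concat follows the order of the list while assignment appends new columns at the end.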
