How it works...

The read_csv function in step 1 can both convert columns into Timestamps and place them in the index at the same time, creating a DatetimeIndex. Step 2 does a simple groupby operation with a single grouping column, GENDER. Step 3 uses the resample method with the offset alias 10AS to form groups in 10-year increments of time. The A is the alias for year, and the S tells us that the beginning of the period is used as the label. For instance, the data for the label 1988-01-01 spans from that date until December 31, 1997.
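As a rough illustration of steps 1 and 3, the following sketch reads a tiny in-memory CSV and resamples it into 10-year bins. The four rows are made-up stand-ins for the employee data, and the alias is written as 10YS on the assumption that a recent pandas release is in use, where YS is the preferred spelling of AS; on older versions, 10AS should behave the same way.

```python
import io
import pandas as pd

# Hypothetical toy rows standing in for the employee dataset
csv_text = """HIRE_DATE,GENDER,BASE_SALARY
1988-06-01,Female,30000
1995-03-15,Male,40000
1998-01-01,Female,50000
2005-07-04,Male,60000
"""

# Parse HIRE_DATE as Timestamps and place it in the index in one call
toy = pd.read_csv(io.StringIO(csv_text),
                  parse_dates=["HIRE_DATE"],
                  index_col="HIRE_DATE")

# Group into 10-year bins, each labeled by the first day of the bin
decade_means = toy.resample("10YS")["BASE_SALARY"].mean()
print(decade_means)
```

The first bin is labeled 1988-01-01 and covers hire dates up to, but not including, 1998-01-01, so the row dated 1998-01-01 falls into the second bin.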

Interestingly, the object returned from a call to the groupby method has its own resample method, but the reverse is not true:

>>> 'resample' in dir(employee.groupby('GENDER'))
True

>>> 'groupby' in dir(employee.resample('10AS'))
False

In step 4, the 10-year periods for each gender start at completely different dates, because each set of periods is calculated from that gender's earliest hired employee. Step 5 shows how this causes misalignment when we try to compare the salaries of females to those of males: the two groups don't share the same 10-year periods. Step 6 verifies that the year of the earliest hired employee for each gender matches the output from step 4.
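The misalignment can be reproduced on a small scale. This is only a sketch with four invented rows, not the recipe's data, and it again assumes the YS spelling of the yearly alias used by recent pandas releases.

```python
import pandas as pd

# Hypothetical toy rows; dates and salaries are invented for illustration
emp = pd.DataFrame(
    {"GENDER": ["Male", "Female", "Male", "Female"],
     "BASE_SALARY": [40000, 50000, 60000, 70000]},
    index=pd.to_datetime(["1958-02-10", "1975-05-01",
                          "1979-03-03", "1990-09-09"]),
)
emp.index.name = "HIRE_DATE"

# Resampling within each gender starts the bins at that gender's earliest hire
per_gender = emp.groupby("GENDER")["BASE_SALARY"].resample("10YS").mean()

female_years = list(per_gender.loc["Female"].index.year)
male_years = list(per_gender.loc["Male"].index.year)
print(female_years, male_years)
```

Here the female bins begin in 1975 and the male bins in 1958, so the two sets of labels never coincide and the salaries cannot be compared period by period.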

To alleviate this issue, we must group by both gender and Timestamp together. The resample method is only capable of grouping by a single column of Timestamps, so we can only complete this operation with the groupby method. With pd.Grouper, we can replicate the functionality of resample: we simply pass the offset alias to the freq parameter and then place the object in a list with all the other columns that we wish to group by, as done in step 7. As both males and females now have the same starting dates for the 10-year periods, the reshaped data in step 8 aligns for each gender, making comparisons much easier. It appears that male salaries tend to be higher given a longer length of employment, though both genders have the same average salary with under 10 years of employment.
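A minimal sketch of the pd.Grouper approach follows, using four invented rows rather than the recipe's data and assuming the YS spelling of the yearly alias found in recent pandas releases. Because no key is passed to pd.Grouper, it bins on the DatetimeIndex.

```python
import pandas as pd

# Hypothetical toy rows; dates and salaries are invented for illustration
emp = pd.DataFrame(
    {"GENDER": ["Male", "Female", "Male", "Female"],
     "BASE_SALARY": [40000, 50000, 60000, 70000]},
    index=pd.to_datetime(["1958-02-10", "1975-05-01",
                          "1979-03-03", "1990-09-09"]),
)
emp.index.name = "HIRE_DATE"

# Group by gender and by shared 10-year bins computed over the whole index
by_both = emp.groupby(["GENDER", pd.Grouper(freq="10YS")])["BASE_SALARY"].mean()

# Move gender into the columns so both genders share one set of bin labels
wide = by_both.unstack("GENDER")
print(wide)
```

Both genders now use bins that start on 1958-01-01, so each row of the reshaped result refers to the same 10-year period for females and males; combinations with no hires show up as missing values.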
