How to do it...

Read in the employee dataset as a DataFrame:

>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')

Before filtering the data, inspect each column that will be filtered so that you know the exact values to use in the filter:

>>> employee.DEPARTMENT.value_counts().head()
Houston Police Department-HPD     638
Houston Fire Department (HFD)     384
Public Works & Engineering-PWE    343
Health & Human Services           110
Houston Airport System (HAS)      106
Name: DEPARTMENT, dtype: int64

>>> employee.GENDER.value_counts()
Male      1397
Female     603
Name: GENDER, dtype: int64

>>> employee.BASE_SALARY.describe().astype(int)
count      1886
mean      55767
std       21693
min       24960
25%       40170
50%       54461
75%       66614
max      275000
Name: BASE_SALARY, dtype: int64
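
The count of 1886 reported by describe is lower than the 2,000 values counted for GENDER, which suggests that BASE_SALARY contains missing values. The following is a minimal sketch of how missing values behave in a comparison. It uses a small hypothetical Series rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries; the NaN stands in for a missing BASE_SALARY value
salary = pd.Series([50000, np.nan, 95000], name='BASE_SALARY')

# describe() counts only the non-missing values
non_missing = int(salary.describe()['count'])

# Any comparison against NaN evaluates to False,
# so a row with a missing salary never passes a range filter
in_range = (salary >= 80000) & (salary <= 120000)

print(non_missing)
print(in_range.tolist())
```

Rows with a missing salary are therefore silently excluded by a salary filter, which is worth keeping in mind when reading the filtered result.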

Write one statement for each criterion. Use the isin method to test whether a value equals any one of several values:

>>> depts = ['Houston Police Department-HPD', 
             'Houston Fire Department (HFD)']
>>> criteria_dept = employee.DEPARTMENT.isin(depts)
>>> criteria_gender = employee.GENDER == 'Female'
>>> criteria_sal = ((employee.BASE_SALARY >= 80000) &
                    (employee.BASE_SALARY <= 120000))
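
As a rough illustration of what these criteria compute, the sketch below checks on small hypothetical Series that isin matches a chain of equality tests joined with |, and that the two salary comparisons match the between method, which includes both endpoints by default. The sample values are made up for the example:

```python
import pandas as pd

# Hypothetical department values, not taken from the real dataset
dept = pd.Series(['Houston Police Department-HPD',
                  'Houston Fire Department (HFD)',
                  'Health & Human Services'])
depts = ['Houston Police Department-HPD',
         'Houston Fire Department (HFD)']

# isin is shorthand for several equality tests joined with |
with_isin = dept.isin(depts)
with_or = (dept == depts[0]) | (dept == depts[1])

# Hypothetical salaries chosen to sit on and around the range boundaries
salary = pd.Series([79999, 80000, 100000, 120000, 120001])

# Two comparisons joined with & versus a single call to between
explicit = (salary >= 80000) & (salary <= 120000)
with_between = salary.between(80000, 120000)

print(with_isin.equals(with_or))
print(explicit.equals(with_between))
```

Either form of the salary test can be used; the explicit comparisons make the boundaries visible, while between is shorter.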

Combine all the boolean Series together:

>>> criteria_final = (criteria_dept & 
                      criteria_gender & 
                      criteria_sal)
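
The & operator combines boolean Series element by element, so a row is kept only when every criterion is True for that row. A minimal sketch with hypothetical boolean Series, which also shows that the Python keyword and cannot be used in its place:

```python
import pandas as pd

# Hypothetical criteria for four rows
is_dept = pd.Series([True, True, False, True])
is_female = pd.Series([True, False, True, True])
in_range = pd.Series([True, True, True, False])

# Element-wise AND: only the first row satisfies all three criteria
combined = is_dept & is_female & in_range
print(combined.tolist())

# The keyword `and` asks for the truth value of a whole Series,
# which pandas refuses to guess
try:
    is_dept and is_female
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)
```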

Use boolean indexing to select only the rows that meet the final criteria:

>>> select_columns = ['UNIQUE_ID', 'DEPARTMENT',
                     'GENDER', 'BASE_SALARY']
>>> employee.loc[criteria_final, select_columns].head()
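
To try the whole recipe without the CSV file, the sketch below runs the same steps on a small DataFrame. The rows are hypothetical and only mimic the four columns used above; they are not taken from the employee dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the employee columns used in this recipe
toy = pd.DataFrame({
    'UNIQUE_ID': [0, 1, 2, 3, 4],
    'DEPARTMENT': ['Houston Police Department-HPD',
                   'Houston Fire Department (HFD)',
                   'Health & Human Services',
                   'Houston Police Department-HPD',
                   'Houston Fire Department (HFD)'],
    'GENDER': ['Female', 'Male', 'Female', 'Female', 'Female'],
    'BASE_SALARY': [95000, 100000, 90000, 60000, np.nan],
})

# One boolean Series per criterion
depts = ['Houston Police Department-HPD',
         'Houston Fire Department (HFD)']
criteria_dept = toy.DEPARTMENT.isin(depts)
criteria_gender = toy.GENDER == 'Female'
criteria_sal = ((toy.BASE_SALARY >= 80000) &
                (toy.BASE_SALARY <= 120000))

# Combine the criteria, then select matching rows and chosen columns
criteria_final = criteria_dept & criteria_gender & criteria_sal
select_columns = ['UNIQUE_ID', 'DEPARTMENT', 'GENDER', 'BASE_SALARY']
result = toy.loc[criteria_final, select_columns]
print(result)
```

Only the row with UNIQUE_ID 0 passes all three tests here: the others fail on gender, department, salary range, or a missing salary.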
