How to do it...

  1. Read in the employee dataset as a DataFrame:
>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
  2. Before filtering, inspect each column that the filter will use so that you know the exact values to test against:
>>> employee.DEPARTMENT.value_counts().head()
Houston Police Department-HPD     638
Houston Fire Department (HFD)     384
Public Works & Engineering-PWE    343
Health & Human Services           110
Houston Airport System (HAS)      106
Name: DEPARTMENT, dtype: int64

>>> employee.GENDER.value_counts()
Male      1397
Female     603
Name: GENDER, dtype: int64

>>> employee.BASE_SALARY.describe().astype(int)
count      1886
mean      55767
std       21693
min       24960
25%       40170
50%       54461
75%       66614
max      275000
Name: BASE_SALARY, dtype: int64
  3. Write a single statement for each criterion. Use the isin method to test for equality with any one of several values:
>>> depts = ['Houston Police Department-HPD',
             'Houston Fire Department (HFD)']
>>> criteria_dept = employee.DEPARTMENT.isin(depts)
>>> criteria_gender = employee.GENDER == 'Female'
>>> criteria_sal = ((employee.BASE_SALARY >= 80000) &
                    (employee.BASE_SALARY <= 120000))
  4. Combine all the boolean Series with the & (and) operator:
>>> criteria_final = (criteria_dept &
                      criteria_gender &
                      criteria_sal)
  5. Use boolean indexing to select only the rows that meet the final criteria:
>>> select_columns = ['UNIQUE_ID', 'DEPARTMENT',
                      'GENDER', 'BASE_SALARY']
>>> employee.loc[criteria_final, select_columns].head()
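
The same steps can be tried without the CSV file. The following is a minimal sketch on a small, invented DataFrame: the name toy and every value in it are hypothetical, and only the column names mirror the recipe. It also swaps the two salary comparisons for Series.between, which includes both endpoints by default and so should be equivalent to the >= and <= pair used above.

```python
import pandas as pd

# Hypothetical toy data: column names follow the recipe, values are invented.
toy = pd.DataFrame({
    'UNIQUE_ID': [0, 1, 2, 3, 4],
    'DEPARTMENT': ['Houston Police Department-HPD',
                   'Houston Fire Department (HFD)',
                   'Health & Human Services',
                   'Houston Police Department-HPD',
                   'Houston Fire Department (HFD)'],
    'GENDER': ['Female', 'Male', 'Female', 'Female', 'Female'],
    'BASE_SALARY': [85000.0, 90000.0, 100000.0, 60000.0, None],
})

# One boolean Series per criterion, as in step 3.
depts = ['Houston Police Department-HPD', 'Houston Fire Department (HFD)']
criteria_dept = toy['DEPARTMENT'].isin(depts)
criteria_gender = toy['GENDER'] == 'Female'
# between() is inclusive on both ends by default; a missing salary yields False.
criteria_sal = toy['BASE_SALARY'].between(80000, 120000)

# Combine the criteria with & and select rows and columns together with .loc.
criteria_final = criteria_dept & criteria_gender & criteria_sal
select_columns = ['UNIQUE_ID', 'DEPARTMENT', 'GENDER', 'BASE_SALARY']
result = toy.loc[criteria_final, select_columns]
print(result)
```

Only the first row of the toy data satisfies all three conditions. The row with a missing salary is excluded because comparisons against a missing value evaluate to False.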