How to do it...

  1. Read in the employee dataset, and output the first five rows:
>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv',
...                        parse_dates=['HIRE_DATE', 'JOB_DATE'])
>>> employee.head()
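The `parse_dates` argument is what turns `HIRE_DATE` and `JOB_DATE` into real datetime columns. Without it they remain plain strings. The sketch below shows the difference on a small in-memory CSV. The two rows and the department labels are hypothetical and stand in for `data/employee.csv`.

```python
import io

import pandas as pd

# Hypothetical two-row extract standing in for data/employee.csv.
csv_text = """DEPARTMENT,HIRE_DATE,JOB_DATE
Police,2006-06-12,2012-10-13
Fire,2000-07-19,2010-09-18
"""

# Without parse_dates, the date columns are read as ordinary strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, both columns are converted to datetime64 on load.
parsed = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['HIRE_DATE', 'JOB_DATE'])

print(raw.dtypes)
print(parsed.dtypes)
# The .dt accessor only works on the parsed (datetime) columns.
print(parsed['HIRE_DATE'].dt.year.tolist())
```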
  1. Import the seaborn library and alias it as sns:
>>> import seaborn as sns
  1. Let's make a bar chart of the count of each department with seaborn:
>>> sns.countplot(y='DEPARTMENT', data=employee)
  1. To reproduce this plot with pandas, we will need to aggregate the data beforehand:
>>> employee['DEPARTMENT'].value_counts().plot(kind='barh')
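The aggregation that `sns.countplot` performs internally is a simple count per category, which `value_counts` reproduces. One difference is ordering: `value_counts` sorts from most to least frequent, while seaborn, by default, appears to draw string categories in the order they first occur. The sketch below uses hypothetical department labels in place of `employee['DEPARTMENT']`.

```python
import pandas as pd

# Hypothetical labels standing in for employee['DEPARTMENT'].
dept = pd.Series(['Fire', 'Police', 'Police', 'Parks', 'Police', 'Fire'])

# Count rows per department, sorted from most to least frequent.
counts = dept.value_counts()

# Reindex by unique() to restore first-appearance order instead.
counts_in_order = counts.reindex(dept.unique())

print(counts)
print(counts_in_order)
```

Either Series can then be passed to `.plot(kind='barh')`. Note that a horizontal bar chart from matplotlib places the first index label at the bottom, so reversing the Series with `.iloc[::-1]` may be needed to list the first category at the top.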
  1. Now, let's find the average salary for each race with seaborn:
>>> ax = sns.barplot(x='RACE', y='BASE_SALARY', data=employee)
>>> ax.figure.set_size_inches(16, 4)
  1. To replicate this with pandas, we will need to group by each race first:
>>> avg_sal = employee.groupby('RACE', sort=False)['BASE_SALARY'].mean()
>>> ax = avg_sal.plot(kind='bar', rot=0, figsize=(16,4), width=.8)
>>> ax.set_xlim(-.5, 5.5)
>>> ax.set_ylabel('Mean Salary')
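Two details explain the pandas version. First, `sort=False` keeps the races in first-appearance order rather than alphabetical order, which is intended to match the left-to-right order seaborn uses when no `order` is given. Second, seaborn's bars come with error bars, which the pandas plot lacks. A minimal sketch, assuming hypothetical salary rows, computes the group means alongside the standard error of the mean (`sem`) as a simpler stand-in for seaborn's error bars:

```python
import pandas as pd

# Hypothetical rows standing in for the employee DataFrame.
employee = pd.DataFrame({
    'RACE': ['White', 'Hispanic/Latino', 'White',
             'Asian/Pacific Islander', 'Hispanic/Latino',
             'Asian/Pacific Islander'],
    'BASE_SALARY': [50000, 45000, 70000, 65000, 55000, 75000],
})

# Default groupby sorts the race labels alphabetically.
sorted_means = employee.groupby('RACE')['BASE_SALARY'].mean()

# sort=False keeps the groups in the order they first appear.
summary = (employee
           .groupby('RACE', sort=False)['BASE_SALARY']
           .agg(['mean', 'sem']))

print(list(sorted_means.index))
print(summary)
```

The result could be drawn with `summary['mean'].plot(kind='bar', yerr=summary['sem'], rot=0)`. The bar heights match seaborn's, but the error bars will not be identical, because seaborn's default interval is a bootstrapped 95% confidence interval rather than the standard error.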
  1. Most seaborn plotting functions can also split the data into groups by a third variable, passed through the hue parameter. Let's find the mean salary by race and gender:
>>> ax = sns.barplot(x='RACE', y='BASE_SALARY', hue='GENDER',
...                  data=employee, palette='Greys')
>>> ax.figure.set_size_inches(16, 4)
  1. With pandas, we will have to group by both race and gender and then unstack the genders as column names:
>>> (employee.groupby(['RACE', 'GENDER'], sort=False)['BASE_SALARY']
...          .mean()
...          .unstack('GENDER')
...          .plot(kind='bar', figsize=(16, 4), rot=0,
...                width=.8, cmap='Greys'))
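The reason for `unstack` is the shape pandas plotting expects. Grouping by two keys yields a long Series with one row per (race, gender) pair. `unstack('GENDER')` turns it into a wide DataFrame with one column per gender, and `DataFrame.plot(kind='bar')` draws one bar per column within each index label. The sketch below shows the reshaping step alone on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows standing in for the employee DataFrame.
employee = pd.DataFrame({
    'RACE': ['White', 'White', 'Asian', 'Asian', 'White'],
    'GENDER': ['Female', 'Male', 'Female', 'Male', 'Male'],
    'BASE_SALARY': [50000, 60000, 55000, 65000, 70000],
})

# Long result: a Series indexed by (RACE, GENDER) pairs.
long_means = (employee
              .groupby(['RACE', 'GENDER'], sort=False)['BASE_SALARY']
              .mean())

# Wide result: RACE remains the index and each GENDER becomes a column.
wide_means = long_means.unstack('GENDER')

print(long_means)
print(wide_means)
```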
  1. A box plot is another type of plot that seaborn and pandas have in common. Let's create a box plot of salary by race and gender with seaborn:
>>> ax = sns.boxplot(x='GENDER', y='BASE_SALARY', data=employee,
...                  hue='RACE', palette='Greys')
>>> ax.figure.set_size_inches(14, 4)
  1. Pandas cannot easily reproduce this box plot exactly. Instead, we can create two separate Axes, one for each gender, and draw a box plot of salary by race on each:
>>> import matplotlib.pyplot as plt
>>> fig, ax_array = plt.subplots(1, 2, figsize=(14, 4), sharey=True)
>>> for g, ax in zip(['Female', 'Male'], ax_array):
...     (employee.query('GENDER == @g')
...              .boxplot(by='RACE', column='BASE_SALARY',
...                       ax=ax, rot=20))
...     ax.set_title(g + ' Salary')
...     ax.set_xlabel('')
...
>>> fig.suptitle('')  # clear the "Boxplot grouped by RACE" title pandas adds
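Both libraries draw the same underlying statistics: each box spans the first to third quartile with a line at the median, and by default the whiskers reach the most extreme points within 1.5 times the interquartile range (IQR). The sketch below computes those numbers directly, assuming hypothetical salary rows. It also shows the `@g` syntax used in the loop, where `@` inside a `query` string refers to a local Python variable.

```python
import pandas as pd

# Hypothetical rows standing in for the employee DataFrame.
employee = pd.DataFrame({
    'GENDER': ['Female'] * 5 + ['Male'] * 5,
    'RACE': ['White'] * 10,
    'BASE_SALARY': [40000, 45000, 50000, 55000, 60000,
                    50000, 60000, 70000, 80000, 90000],
})

# Quartiles per (GENDER, RACE); unstack moves the quantile level to columns.
quartiles = (employee
             .groupby(['GENDER', 'RACE'])['BASE_SALARY']
             .quantile([0.25, 0.5, 0.75])
             .unstack())

# Interquartile range: the height of each box.
iqr = quartiles[0.75] - quartiles[0.25]

print(quartiles)
print(iqr)

# Inside query(), @g refers to the Python variable g, as in the loop above.
medians = {}
for g in ['Female', 'Male']:
    subset = employee.query('GENDER == @g')
    medians[g] = subset['BASE_SALARY'].median()
print(medians)
```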