How to do it...

  1. Read in the employee dataset, and create a column for years of experience:
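>>> import pandas as pd
>>> import seaborn as sns
>>> employee = pd.read_csv('data/employee.csv',
                           parse_dates=['HIRE_DATE', 'JOB_DATE'])
>>> days_hired = pd.to_datetime('12-1-2016') - employee['HIRE_DATE']
>>> # Recent pandas rejects unit='Y', so spell out the average year in days
>>> one_year = pd.Timedelta(days=365.2425)
>>> employee['YEARS_EXPERIENCE'] = days_hired / one_year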
>>> employee[['HIRE_DATE', 'YEARS_EXPERIENCE']].head()
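To see the date arithmetic from this step in isolation, here is a minimal sketch on a toy DataFrame. The three hire dates are made up for illustration and are not taken from employee.csv; the reference date and the 365.2425-day year mirror the step above.

```python
import pandas as pd

# Hypothetical hire dates (illustration only, not from employee.csv)
toy = pd.DataFrame({
    'HIRE_DATE': pd.to_datetime(['2006-12-01', '2015-06-01', '2016-12-01'])
})

# Subtracting a datetime column from a Timestamp yields a Series of Timedeltas
reference_date = pd.Timestamp('2016-12-01')
days_hired = reference_date - toy['HIRE_DATE']

# Dividing a Timedelta Series by a single Timedelta returns plain floats
one_year = pd.Timedelta(days=365.2425)
toy['YEARS_EXPERIENCE'] = days_hired / one_year

print(toy)
```

The division is what turns a duration into a unitless number of years, which is why the new column can be used directly as the x variable of a plot.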
  2. Let's create a basic scatter plot with a fitted regression line to represent the relationship between years of experience and salary:
>>> ax = sns.regplot(x='YEARS_EXPERIENCE', y='BASE_SALARY',
                     data=employee)
>>> ax.figure.set_size_inches(14,4)
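By default, the line drawn in this step is a straight-line least-squares fit. As a rough sketch of what that fit computes, the following uses np.polyfit as a stand-in (it is not seaborn's own code) on synthetic numbers that are assumed for illustration only.

```python
import numpy as np

# Synthetic data, not from the employee dataset:
# salary grows by exactly 1500 per year from a base of 40000
years = np.array([0.0, 2.0, 5.0, 10.0, 20.0])
salary = 40000 + 1500 * years

# A degree-1 polynomial fit returns the slope first, then the intercept
slope, intercept = np.polyfit(years, salary, deg=1)

print(slope, intercept)
```

With real salary data the points scatter around the line, and the slope estimates the average change in salary per additional year of experience.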
  3. The regplot function cannot plot multiple regression lines for different levels of a third variable. Let's use the figure-level function lmplot, which draws regplot onto a seaborn Grid, to fit a separate regression line for males and females:
>>> g = sns.lmplot(x='YEARS_EXPERIENCE', y='BASE_SALARY',
                   hue='GENDER', palette='Greys',
                   scatter_kws={'s': 10}, data=employee)
>>> g.fig.set_size_inches(14, 4)
>>> type(g)
seaborn.axisgrid.FacetGrid
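Conceptually, the hue parameter splits the data by group and fits each group on its own. A minimal sketch of that idea, using groupby and np.polyfit rather than seaborn itself, on made-up rows that only borrow the recipe's column names:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names match the recipe
toy = pd.DataFrame({
    'GENDER': ['Female', 'Female', 'Female', 'Male', 'Male', 'Male'],
    'YEARS_EXPERIENCE': [1.0, 5.0, 10.0, 1.0, 5.0, 10.0],
    'BASE_SALARY': [41000.0, 45000.0, 50000.0, 42000.0, 50000.0, 60000.0],
})

# Fit one straight line per gender, as hue='GENDER' does in the plot
fits = {}
for gender, group in toy.groupby('GENDER'):
    slope, intercept = np.polyfit(group['YEARS_EXPERIENCE'],
                                  group['BASE_SALARY'], deg=1)
    fits[gender] = (slope, intercept)

for gender, (slope, intercept) in fits.items():
    print(gender, slope, intercept)
```

Comparing the per-group slopes numerically like this can complement the plot when the lines are hard to tell apart by eye.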
  4. The real power of the seaborn Grid functions is their ability to add more Axes based on another variable. Each seaborn Grid has the col and row parameters available to divide the data further into different groups. For instance, we can create a separate plot for each unique race in the dataset and still fit the regression lines by gender:
>>> grid = sns.lmplot(x='YEARS_EXPERIENCE', y='BASE_SALARY',
                      hue='GENDER', col='RACE', col_wrap=3,
                      palette='Greys', sharex=False,
                      line_kws={'linewidth': 5},
                      data=employee)
>>> grid.set(ylim=(20000, 120000))
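The col_wrap parameter decides how the per-race plots are arranged: facets fill a row from left to right and wrap to a new row after col_wrap plots. A small sketch of that layout arithmetic, assuming five placeholder facet labels (the real labels come from the RACE column):

```python
import math

# Placeholder facet labels; the real levels come from employee['RACE']
levels = ['A', 'B', 'C', 'D', 'E']
col_wrap = 3

# Number of rows needed when each row holds at most col_wrap plots
n_rows = math.ceil(len(levels) / col_wrap)

# Row-major position of each facet in the wrapped grid
positions = {level: divmod(i, col_wrap) for i, level in enumerate(levels)}

print(n_rows)
for level, (row, col) in positions.items():
    print(f'{level}: row {row}, column {col}')
```

Picking col_wrap is mostly a readability choice: a smaller value gives larger individual plots at the cost of a taller figure.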