Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Previous Chapter

How it works...

How to do it...

Read in the employee dataset, and create a column for years of experience:

>>> employee = pd.read_csv('data/employee.csv', 
                       parse_dates=['HIRE_DATE', 'JOB_DATE'])
>>> days_hired = pd.to_datetime('12-1-2016') - employee['HIRE_DATE']

>>> one_year = pd.Timedelta(1, unit='Y')
>>> employee['YEARS_EXPERIENCE'] = days_hired / one_year
>>> employee[['HIRE_DATE', 'YEARS_EXPERIENCE']].head()

Let's create a basic scatter plot with a fitted regression line to represent the relationship between years of experience and salary:

>>> ax = sns.regplot(x='YEARS_EXPERIENCE', y='BASE_SALARY',
                     data=employee)
>>> ax.figure.set_size_inches(14,4)

The regplot function cannot plot multiple regression lines for different levels of a third variable. Let's use its parent function, lmplot, to plot a seaborn Grid that adds the same regression lines for males and females:

>>> g = sns.lmplot('YEARS_EXPERIENCE', 'BASE_SALARY',
                    hue='GENDER', palette='Greys',
                    scatter_kws={'s':10}, data=employee)
>>> g.fig.set_size_inches(14, 4)
>>> type(g)
seaborn.axisgrid.FacetGrid

The real power of the seaborn Grid functions is their ability to add more Axes based on another variable. Each seaborn Grid has the col and row parameters available to divide the data further into different groups. For instance, we can create a separate plot for each unique race in the dataset and still fit the regression lines by gender:

>>> grid = sns.lmplot(x='YEARS_EXPERIENCE', y='BASE_SALARY',
                      hue='GENDER', col='RACE', col_wrap=3,
                      palette='Greys', sharex=False,
                      line_kws = {'linewidth':5},
                      data=employee)
>>> grid.set(ylim=(20000, 120000))

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

18.189.180.43