3. Plotting Basics

Data visualization is as much a part of the data processing step as the data presentation step. It is much easier to compare plotted values than to compare numerical values. By visualizing data we can get a better intuitive sense of the data than would be possible by looking at tables of values alone. Additionally, visualizations can bring to light hidden patterns in the data that you, the analyst, can use for model selection.

Learning Objectives

The concept map for this chapter can be found in Figure A.3.

Explain why visualizing data is important
Create various statistical plots for exploratory data analysis
Use plotting functions from the matplotlib, seaborn, and pandas libraries
Identify when to use univariate, bivariate, and multivariate plots
Use different color palettes to make plots more accessible

3.1 Why Visualize Data?

The quintessential example for creating visualizations of data is Anscombe’s quartet. This data set was created by English statistician Frank Anscombe to show the importance of statistical graphs.

The Anscombe data set contains four sets of data, each of which contains two continuous variables. Each set has the same mean, variance, correlation, and regression line. However, only when the data are visualized does it become obvious that each set does not follow the same pattern. This goes to show the benefits of visualizations and the pitfalls of looking at only summary statistics.
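As a minimal sketch of the point above, the summary statistics for the four sets can be computed with nothing but the Python standard library. The values below are transcribed from the commonly published version of Anscombe's quartet and are not taken from this chapter. The names `ANSCOMBE`, `pearson_r`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet as commonly published (assumption: transcribed by hand,
# not loaded from the data set used elsewhere in this chapter).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Mean and sample variance of each variable, plus their correlation."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "corr": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

# Print one row per set; the rows agree to about two decimal places.
for name, s in summaries.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

The four rows are nearly indistinguishable, which is why the numbers alone give no hint that the sets follow different patterns. Plotting each set's x values against its y values is what reveals the differences.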
