Chapter 11. Visualization

Humans are visual creatures, and we have evolved to grasp meaning quickly when information is presented in ways that trigger a flash of insight. With the right tools, this "aha" moment often arrives far faster from a picture than from tedious numerical analysis.

Data analysis tools such as pandas let you take data, process it, and visualize its meaning quickly and iteratively. Much of what you will do with pandas is massaging your data into a form that can be rendered in one or more visual patterns, in an attempt to reach that "aha" simply by glancing at a visual representation of the information.

This chapter will cover common patterns in visualizing data with pandas. It is not meant to be exhaustive in coverage. The goal is to give you the required knowledge to create beautiful data visualizations on pandas data quickly and with very few lines of code.

This chapter is presented in three sections. The first introduces the general concepts of programming visualizations with pandas, emphasizing the process of creating time-series charts. We will also dive into techniques for labeling axes, creating legends, and specifying colors, line styles, and markers.

The second part of the chapter focuses on the many types of data visualizations commonly used in pandas programs and data science, including:

  • Bar plots
  • Histograms
  • Box and whisker charts
  • Area plots
  • Scatter plots
  • Density plots
  • Scatter plot matrices
  • Heatmaps

The final section will briefly look at creating composite plots by dividing plots into subparts and drawing multiple plots within a single graphical canvas.

Setting up the IPython notebook

The first step in plotting pandas data is to import the appropriate libraries, primarily matplotlib. The examples in this chapter are all based on the following imports, where the plotting capabilities come from matplotlib's pyplot module, aliased as plt:

In [1]:
   # import numpy and pandas
   import numpy as np
   import pandas as pd

   # needed for representing dates and times
   from datetime import datetime

   # Set some pandas options for controlling output
   pd.set_option('display.notebook_repr_html', False)
   pd.set_option('display.max_columns', 10)
   pd.set_option('display.max_rows', 10)

   # used for seeding random number sequences
   seedval = 111111

   # matplotlib 
   import matplotlib as mpl
   # matplotlib plotting functions
   import matplotlib.pyplot as plt
   # we want our plots inline
   %matplotlib inline

The %matplotlib inline line is an IPython magic command that tells matplotlib to produce inline graphics. The resulting graphs then appear directly inside your IPython notebook or IPython session.

All examples will seed the random number generator with 111111, so that the graphs remain the same every time they run, and so that the reader can reproduce the same charts as in the book.
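To illustrate why seeding matters, here is a minimal sketch of a reproducible random walk. It assumes the seed is applied through NumPy's global generator with np.random.seed; the make_walk helper, the start date, and the number of periods are hypothetical names and values chosen only for this illustration.

```python
import numpy as np
import pandas as pd

seedval = 111111  # the seed value used throughout the chapter

def make_walk(seed, periods=100):
    """Return a random-walk Series indexed by consecutive days.

    Hypothetical helper for illustration; not part of pandas.
    """
    # reset NumPy's global random number generator (assumed seeding method)
    np.random.seed(seed)
    steps = np.random.randn(periods)
    # arbitrary start date, chosen only for this sketch
    index = pd.date_range('2014-01-01', periods=periods)
    return pd.Series(steps, index=index).cumsum()

first = make_walk(seedval)
second = make_walk(seedval)

# re-seeding before each draw yields identical data,
# and therefore identical charts on every run
print(first.equals(second))
```

With the imports from In [1] in place, calling first.plot() in the notebook would draw the same line chart each time the cell is run.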
