Humans are visual creatures. We have evolved to grasp meaning quickly when information is presented in certain ways, ways that make the light bulb of insight turn on. Given the right tools, this "aha" moment often arrives far faster than it would through tedious numerical analysis.
Data analysis tools such as pandas let the user quickly and iteratively take data, process it, and visualize its meaning. Much of what you will do with pandas is massaging your data so that it can be shown in one or more visual patterns, in an attempt to reach "aha" simply by glancing at the visual representation of the information.
This chapter covers common patterns for visualizing data with pandas. It is not meant to be exhaustive. The goal is to give you the knowledge required to create beautiful visualizations of pandas data quickly and with very few lines of code.
This chapter is presented in three sections. The first introduces the general concepts of programming visualizations with pandas, emphasizing the process of creating time-series charts. We will also dive into techniques for labeling axes, creating legends, and specifying colors, line styles, and markers.
The second part of the chapter then focuses on the many types of data visualizations commonly used in pandas programs and in data science, including:
The final section will briefly look at creating composite plots by dividing plots into subparts and drawing multiple plots within a single graphical canvas.
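As a rough preview of composite plots, here is a minimal sketch (not the chapter's own example) that divides one figure into a 1x2 grid with `plt.subplots` and directs each pandas plot to a specific subplot through the `ax=` argument. The random data, column names, titles, and bin count are arbitrary choices for illustration, and the Agg backend is selected only so the sketch also runs outside a notebook.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

np.random.seed(111111)
# illustrative data: 100 rows of standard normal values in two columns
df = pd.DataFrame(np.random.randn(100, 2), columns=["A", "B"])

# one figure (the canvas) divided into a 1x2 grid of subplots
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# each pandas plot is drawn into the subplot passed via ax=
df["A"].cumsum().plot(ax=axes[0], title="Cumulative A")
df["B"].plot(kind="hist", bins=20, ax=axes[1], title="Histogram of B")

fig.tight_layout()  # keep titles and tick labels from overlapping
```

Passing `ax=` explicitly keeps full control over which pandas plot lands in which part of the canvas, rather than letting each call create its own figure.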
The first step in plotting pandas data is to import the appropriate libraries, primarily matplotlib. All examples in this chapter are based on the following imports, where the plotting functions come from matplotlib's pyplot module, which is aliased as plt:
In [1]:
# import pandas, numpy and datetime
import numpy as np
import pandas as pd

# needed for representing dates and times
from datetime import datetime

# Set some pandas options for controlling output
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 10)

# used for seeding random number sequences
seedval = 111111

# matplotlib
import matplotlib as mpl
# matplotlib plotting functions
import matplotlib.pyplot as plt
# we want our plots inline
%matplotlib inline
The %matplotlib inline line tells matplotlib to render graphics inline, so the resulting charts appear directly inside your IPython notebook (or another IPython front end that supports inline figures) rather than in a separate window.
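`%matplotlib inline` is an IPython magic, so it is not valid syntax in a plain Python script. A minimal sketch of one common alternative, assuming you want image files rather than on-screen windows: select the non-interactive Agg backend before importing pyplot, then write the figure with `savefig`. The file name `squares.png` and the temporary directory are arbitrary choices; an interactive script could call `plt.show()` instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # choose a file-only backend before importing pyplot
import matplotlib.pyplot as plt

# a simple series to chart
squares = pd.Series(np.arange(10) ** 2, name="squares")
ax = squares.plot(title="Squares of 0-9")

# write the figure to disk instead of displaying it inline
out_path = os.path.join(tempfile.gettempdir(), "squares.png")
ax.figure.savefig(out_path)
plt.close(ax.figure)  # release the figure's resources once saved
```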
All examples seed the random number generator with 111111 so that the graphs are identical every time they run and readers can reproduce the charts shown in the book.
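The sketch below shows why the fixed seed makes charts reproducible, and previews the labeling and styling techniques mentioned in the chapter overview (axis labels, legends, colors, line styles, markers). It assumes the seed is applied with `np.random.seed(seedval)`; the helper `make_walks`, the start date, and the style strings are hypothetical choices for illustration, not the book's own code.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # file-only backend; lets the sketch run outside a notebook
import matplotlib.pyplot as plt

seedval = 111111

def make_walks(n=30):
    """Hypothetical helper: two seeded random walks indexed by calendar day."""
    np.random.seed(seedval)  # reseed so every call draws the same numbers
    dates = pd.date_range("2014-01-01", periods=n, freq="D")
    return pd.DataFrame({"A": np.random.randn(n).cumsum(),
                         "B": np.random.randn(n).cumsum()},
                        index=dates)

walks = make_walks()
print(walks.equals(make_walks()))  # True: same seed, same data, same chart

# each style string combines color, line style, and marker:
# red solid line with circles for A, blue dashed line with squares for B
ax = walks.plot(style=["r-o", "b--s"], figsize=(8, 4))
ax.set_title("Two seeded random walks")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.legend(loc="upper left")
```

Reseeding inside the helper, rather than once at the top, means the data does not depend on how many random numbers earlier cells have already consumed.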