Defining plot types – bar, line, and stacked charts

In this recipe, we present several basic plots and what they are used for. Most of the plots described here are used daily, and some of them form the basis for understanding more advanced concepts in data visualization.

Getting ready

We start with some common charts from the matplotlib.pyplot library, using small sample datasets. This basic charting lays the foundation for the recipes that follow.

How to do it...

We start by creating a simple plot in IPython. IPython is great because it allows us to interactively change plots and see the results immediately. You need to follow these steps for that:

  1. Start IPython by typing the following code at the command prompt:
    $ ipython
    
  2. Import the necessary functions:
    In [1]: from matplotlib.pyplot import *
    
  3. Then type the matplotlib plot code:
    In [2]: plot([1,2,3,2,3,2,2,1])
    Out[2]: [<matplotlib.lines.Line2D at 0x412fb50>]

The plot should open in a new window displaying the default look of the plot and some supporting information as shown here:

The basic plot in matplotlib contains the following elements:

  • x and y axes: These are the horizontal and vertical axes, respectively.
  • x and y ticks: These are small marks denoting the segments of the axes. There can be major and minor ticks.
  • x and y tick labels: These represent values on a particular axis.
  • Plotting area: This is where the actual plots are drawn.

Notice that the values we passed to plot() are used as the y axis values. plot() supplies default values for the x axis: the integers from 0 to 7 (the number of y values minus 1).
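The default x values can be checked by inspecting the line object that plot() returns. This is a minimal sketch rather than part of the recipe: it assumes the non-interactive Agg backend (so it runs without a display) and uses the `plt` alias instead of the star import used in this recipe.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt

y_values = [1, 2, 3, 2, 3, 2, 2, 1]

fig, ax = plt.subplots()
(line,) = ax.plot(y_values)  # only y values are supplied

# The x data that matplotlib generated for us: one index per y value
x_values = [int(v) for v in line.get_xdata()]
print(x_values)
```

The printed list should run from 0 up to the number of y values minus 1.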

Now, try adding values for the x axis as the first argument to plot(). In the same IPython session, type the following:

In [3]: plot([4,3,2,1],[1,2,3,4])
Out[3]: [<matplotlib.lines.Line2D at 0x31444d0>]

Tip

Note how IPython numbers input and output lines (In [2] and Out[2]). This helps us remember where we are in the current session and enables more advanced features, such as saving part of the session to a Python file. During data analysis, prototyping in IPython is the fastest way to reach a satisfying solution; we can then save particular sessions to a file and execute them later to reproduce the same plot.

This will update the plot to look like this image:

Here we see how matplotlib expands the y axis to accommodate the new value range and automatically changes the color of the second plot line so that we can distinguish the new plot.

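In the matplotlib versions this recipe was written for, all subsequent plots draw over the same axes unless we turn off the hold property (by calling hold(False)). Note that hold() was deprecated in matplotlib 2.0 and removed in 3.0; in current releases, plots always accumulate on the active axes until we clear them, for example with cla() or clf().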
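How lines accumulate on the same axes, and how to start over, can be sketched as follows. This assumes a current matplotlib release, where hold() is not available and clearing the axes with cla() is the way to discard earlier plots; the Agg backend and the `plt` alias are likewise assumptions made so the sketch runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Two successive plot calls draw on the same axes
ax.plot([1, 2, 3, 2, 3, 2, 2, 1])
ax.plot([4, 3, 2, 1], [1, 2, 3, 4])
lines_before = len(ax.lines)

# Clearing the axes removes everything drawn so far
ax.cla()
lines_after = len(ax.lines)

print(lines_before, lines_after)
```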

Let's put some more common plots together and compare them over the same dataset. You can type this in IPython or run it from a separate Python script:

from matplotlib.pyplot import *

# some simple data
x = [1,2,3,4]
y = [5,4,3,2]
# create new figure
figure()

# divide subplots into 2 x 3 grid
# and select #1
subplot(231)
plot(x, y)

# select #2
subplot(232)
bar(x, y)

# horizontal bar charts
subplot(233)
barh(x, y)

# create stacked bar charts
subplot(234)
bar(x, y)

# we need more data for stacked bar charts
y1 = [7,8,5,3]
bar(x, y1, bottom=y, color = 'r')

# box plot
subplot(235)
boxplot(x)

# scatter plot
subplot(236)
scatter(x,y)

show()

The resulting figure should look like this:

How it works...

With figure(), we create a new figure. If we supply a string argument, such as 'sample charts', it becomes the title of the figure window. If we call figure() again with the same argument (which can also be a number), the corresponding figure becomes active and all subsequent plotting is performed on it.
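The way figure() reuses an existing figure can be sketched like this. The label 'another figure' is made up for illustration, and the Agg backend and `plt` alias are assumptions so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt

# Create a labeled figure, then a second one that becomes active
first = plt.figure("sample charts")
plt.figure("another figure")  # hypothetical second label

# Calling figure() with the first label again does not create a new figure;
# it returns the existing one and makes it active
again = plt.figure("sample charts")

print(again is first)
print(plt.gcf() is first)
print(plt.get_figlabels())
```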

Next, we divide the figure into a 2 x 3 grid using a subplot(231) call. We could equally write this as subplot(2, 3, 1), where the first parameter is the number of rows, the second is the number of columns, and the third is the plot number, counted row by row from the top left.
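The equivalence of the two call forms can be checked by comparing the grid geometry of the axes they create. This is a sketch under the same assumptions as before (Agg backend, `plt` alias); the two calls are made on separate figures so that each one creates fresh axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt

# Three-digit shorthand: rows, columns, plot number
plt.figure()
ax_short = plt.subplot(231)
geom_short = ax_short.get_subplotspec().get_geometry()

# The same request with separate arguments, on a new figure
plt.figure()
ax_long = plt.subplot(2, 3, 1)
geom_long = ax_long.get_subplotspec().get_geometry()

# Each geometry is (rows, columns, first cell index, last cell index)
print(geom_short, geom_long)
```

The separate-argument form is the only option once a grid has more than nine rows, columns, or cells, because the shorthand allows just one digit for each.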

We continue with simple calls that create vertical bar charts (bar()) and horizontal bar charts (barh()). For stacked bar charts, we need to tie two bar() calls together. We do that by passing bottom=y to the second call, so that each of its bars starts where the corresponding bar from the first call ends.
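What bottom does can be verified on the rectangles that bar() returns. The sketch below reuses the x, y, and y1 values from the recipe; the Agg backend and the `plt` alias are assumptions so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [5, 4, 3, 2]
y1 = [7, 8, 5, 3]

fig, ax = plt.subplots()
lower = ax.bar(x, y)
upper = ax.bar(x, y1, bottom=y, color="r")

# Each upper bar starts at the height of the lower bar beneath it ...
upper_starts = [rect.get_y() for rect in upper]
# ... and ends at the sum of both values
stack_tops = [rect.get_y() + rect.get_height() for rect in upper]

print(upper_starts)
print(stack_tops)
```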

Box plots are created using the boxplot() call, where the box extends from lower to upper quartiles with the line at the median value. We will return to box plots shortly.

We finally create a scatter plot to give you an idea of a point-based dataset. This is probably more appropriately used when we have thousands of data points in a dataset, but here we wanted to illustrate the difference in representations of the same dataset.

There's more...

We can return to box plots now as we need to explain the characteristics of this kind of plot.

A box plot presents, by default, the following elements:

  • Box: This is a rectangle that covers the interquartile range
  • Median: This is presented as a line inside each box
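  • Whiskers: These are lines (vertical by default) extending to the most extreme values that are not outliers; by default, these are the data points within 1.5 times the interquartile range of the box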
  • Fliers: These are points beyond the whiskers, which are considered outliers

To illustrate this behavior, we will demonstrate plotting the same dataset in a box plot and a histogram as shown in the following code:

from pylab import *

dataset = [113, 115, 119, 121, 124,
           124, 125, 126, 126, 126,
           127, 127, 128, 129, 130,
           130, 131, 132, 133, 136]


subplot(121)
boxplot(dataset, vert=False)

subplot(122)
hist(dataset)

show()

That will give us the following plots:

In the preceding comparison, we can observe how the same dataset is represented differently in two charts. The box plot on the left summarizes the dataset through the statistical values just described (the median, the lower and upper quartiles, and the whisker ends, with outliers drawn as separate points), while the histogram on the right displays how the values group into ranges.
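The numbers behind both charts can also be inspected without drawing anything. The following sketch assumes matplotlib's cbook.boxplot_stats() helper and NumPy's histogram() as stand-ins for what boxplot() and hist() compute with their default settings (whiskers at 1.5 times the interquartile range, 10 bins).

```python
import numpy as np
from matplotlib import cbook

dataset = [113, 115, 119, 121, 124,
           124, 125, 126, 126, 126,
           127, 127, 128, 129, 130,
           130, 131, 132, 133, 136]

# Statistics behind the box plot: one dict per dataset
stats = cbook.boxplot_stats(dataset)[0]
for key in ("whislo", "q1", "med", "q3", "whishi"):
    print(key, stats[key])
print("fliers", list(stats["fliers"]))

# Grouping behind the histogram: counts per bin and the bin edges
counts, edges = np.histogram(dataset, bins=10)
print(counts)
print(edges)
```

For this dataset, the lowest value (113) lies more than 1.5 times the interquartile range below the box, so it is reported as a flier rather than as the end of a whisker.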
