© Valentina Porcu 2018
Valentina Porcu, Python for Data Mining Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4113-4_10

10. Matplotlib

Valentina Porcu, Nuoro, Italy

Creating graphs is an important step in exploratory analysis, and one of the first stages in data analysis. We can use Matplotlib to construct a variety of analytical graphs that display our data in different ways.

Basic Plots

To represent our data graphically, we can use the Matplotlib library, one of the most widely used plotting packages for Python. Its documentation can be found at https://matplotlib.org. The gallery section of the Matplotlib site (at https://matplotlib.org/gallery.html ) features a series of example charts with their code.

Matplotlib is installed with Anaconda, so we only need to import it.
# we import the Matplotlib library
>>> import matplotlib as mpl
>>> import matplotlib.pyplot as plt
>>> %matplotlib inline
# this last line of code allows us to view the charts directly in Jupyter
We can plot values by passing them directly to the plot() function in the form of a list. The following code produces the plot in Figure 10-1.
Figure 10-1. A plotted list

>>> plt.plot([5,7,2,4])
>>> plt.plot([5,7,2,4], [4,6,9,2], 'ro')
# 'ro' is a format string: 'r' sets the color to red and 'o' draws circle markers
The plot of red circles is shown in Figure 10-2.
Figure 10-2. A plotted list of red circle markers

As shown in Figure 10-1, plot() draws a line by default. We can also customize the color and the type of representation by modifying its arguments.
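The 'ro' argument used earlier is a format string that combines a color code with a marker or line-style code. The following is a minimal sketch of three such strings applied to the same points; the variable names and the vertical offsets are illustrative assumptions, and the Agg backend line is only there so that the code runs without a display (it is not needed in Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

xs = [5, 7, 2, 4]
ys = [4, 6, 9, 2]

fig, ax = plt.subplots()
# 'ro': red ('r') circle markers ('o'), with no connecting line
red_circles, = ax.plot(xs, ys, 'ro')
# 'g--': green ('g') dashed line ('--'), with no markers
green_dashed, = ax.plot(xs, [v + 1 for v in ys], 'g--')
# 'k^-': black ('k') triangle markers ('^') joined by a solid line ('-')
black_triangles, = ax.plot(xs, [v + 2 for v in ys], 'k^-')
```

Each call returns a list of line objects, so the trailing comma unpacks the single line that was drawn.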

Let’s create two objects and represent them.
# we create two objects
>>> x = [ 50, 70, 90, 65]
>>> y = [129, 192, 163, 172]
>>> plt.plot(x, y, linewidth = 4.0)
The plot is shown in Figure 10-3.
Figure 10-3. A custom plot

As you can see, we changed the line thickness with the linewidth argument. We can also modify the line with the linestyle argument, or its short form ls:
>>> plt.plot(x, y, linewidth = 2.0, linestyle = '--')
The plot is shown in Figure 10-4.
Figure 10-4. Modified line style

We can add markers to highlight data better:
>>> plt.plot(x, y, linewidth = 1.0, ls = '-', marker = "o", markersize = 10)
The plot is shown in Figure 10-5.
Figure 10-5. Plot with markers

We can customize the markers even further by editing, for example, the inner color:
>>> plt.plot(x, y, linewidth = 1.0, ls = '-', marker = "o", markersize = 10, markerfacecolor = 'white')
The plot is shown in Figure 10-6.
Figure 10-6. Plot with altered marker color

More information on how to customize a chart can be found using
>>> help(plt.plot)
Next, we can call additional functions to add a title and axis labels.
>>> plt.plot(x, y)
>>> plt.title("TITLE")
>>> plt.xlabel("Axis X")
>>> plt.ylabel("Axis Y")
The plot is shown in Figure 10-7.
Figure 10-7. Plot with a title and axes labels

We can customize a chart even further by changing colors for all chart elements:
>>> plt.plot(x, y, color = "yellow")
>>> plt.title("TITLE", color = "blue")
>>> plt.xlabel("Axis X", color = "purple")
>>> plt.ylabel("Axis Y", color = "green")
The plot is shown in Figure 10-8.
Figure 10-8. Altering plot element colors

We can also add a grid by using the plt.grid() function, and a legend by using the plt.legend() function.
>>> plt.plot(x, y)
>>> plt.title("TITLE", color = "blue")
>>> plt.xlabel("Axis X", color = "purple")
>>> plt.ylabel("Axis Y", color = "green")
>>> plt.grid(True)
>>> plt.legend(['Legend1'])
The plot is shown in Figure 10-9.
Figure 10-9. A plot with a grid and a legend

We can move the legend to a different position within the chart by changing the loc parameter:
>>> plt.plot(x, y)
>>> plt.title("TITLE", color = "blue")
>>> plt.xlabel("Axis X", color = "purple")
>>> plt.ylabel("Axis Y", color = "green")
>>> plt.grid(True)
>>> plt.legend(['Legend2'], loc = 2)
The plot is shown in Figure 10-10.
Figure 10-10. A plot with a repositioned legend

These are the possible positions of the legend:
  • 0 = best (Matplotlib chooses the position)

  • 1 = upper right

  • 2 = upper left

  • 3 = lower left

  • 4 = lower right

  • 5 = right

  • 6 = center left

  • 7 = center right

  • 8 = lower center

  • 9 = upper center

  • 10 = center
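The numeric codes also have string equivalents, which are often easier to read. The following is a minimal sketch that reuses the x and y lists from earlier; passing the legend text through the label argument is one possible approach, not the only one, and the Agg backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

x = [50, 70, 90, 65]
y = [129, 192, 163, 172]

fig, ax = plt.subplots()
ax.plot(x, y, label="Legend2")  # the label is picked up by legend()
ax.grid(True)
# "upper left" is the string form of loc = 2
legend = ax.legend(loc="upper left")
```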

The codes for color are as follows:
  • b = blue

  • c = cyan

  • g = green

  • m = magenta

  • r = red

  • y = yellow

  • k = black

  • w = white
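Each single-letter code is shorthand for a fixed RGB color. As a sketch, the to_hex() function from matplotlib.colors can show the hexadecimal value behind each letter; the dictionary name here is an illustrative assumption.

```python
from matplotlib import colors as mcolors

short_codes = ['b', 'c', 'g', 'm', 'r', 'y', 'k', 'w']

# map each single-letter code to its hexadecimal color string
hex_codes = {code: mcolors.to_hex(code) for code in short_codes}

for code, hex_value in hex_codes.items():
    print(code, hex_value)
```

The resulting hexadecimal strings can be passed anywhere a color is accepted, for example color = '#ff0000'.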

We can also change the shapes used in a plot:
>>> plt.plot([1,2,3,4],[1,4,8,15],'b*')
>>> plt.plot([1,3,5,7],[1,4,8,12],'g^')
>>> plt.plot([1,2,3,5],[2,5,4,12],'ro')
>>> plt.legend(['First','Second','Third'],loc=0)
The plot is shown in Figure 10-11.
Figure 10-11. A plot with points of different shape

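Now let’s create subcharts using the subplot() function. Its three arguments are the number of rows, the number of columns, and the position of the current chart in the grid: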
>>> plt.subplot(2,2,1)
>>> plt.plot([1,2,3,4],[1,4,8,15],'b*')
>>> plt.subplot(2,2,2)
>>> plt.plot([1,3,5,7],[1,4,8,12],'g^')
>>> plt.subplot(2,2,3)
>>> plt.plot([1,2,3,5],[2,5,4,12],'ro')
>>> plt.subplot(2,2,4)
>>> plt.plot([1,2,3,5],[2,5,4,12],'b')
The plot is shown in Figure 10-12.
Figure 10-12. Creation of subplots

We can indicate how many charts we want and how they are placed. In this case, one row and two columns give two charts set side by side:
>>> plt.subplot(1,2,1)
>>> plt.plot([1,2,3,4],[1,4,8,15],'b*')
>>> plt.subplot(1,2,2)
>>> plt.plot([1,3,5,7],[1,4,8,12],'g^')
The plot is shown in Figure 10-13.
Figure 10-13. Creation of two subplots set side by side
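An alternative to repeated subplot() calls is the plt.subplots() function, which creates the figure and all of its axes in one step. The following sketch draws the same two charts side by side; the titles and the figure size are illustrative assumptions, and the Agg backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# one row, two columns: axes is an array holding the two charts
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot([1, 2, 3, 4], [1, 4, 8, 15], 'b*')
axes[0].set_title("Left")

axes[1].plot([1, 3, 5, 7], [1, 4, 8, 12], 'g^')
axes[1].set_title("Right")

fig.tight_layout()  # keep titles and tick labels from overlapping
```

Working with the axes objects directly makes it explicit which chart each command modifies.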

Pie Charts

Now let’s see how to create pie charts. A pie chart can be used to show the composition of something (like a market). To plot a pie, we can pass the x list we created earlier to the plt.pie() function:
>>> plt.pie(x)
The plot is shown in Figure 10-14.
Figure 10-14. A basic pie chart

We can customize it by editing its colors:
# we create a palette of colors
>>> col1 = ["yellow", "red", "purple", "orange"]
# we apply the new colors to the chart
>>> plt.pie(x, colors = col1)
The plot is shown in Figure 10-15.
Figure 10-15. A pie chart with custom colors

To modify colors even further, we can use hex codes. A list of the codes can be found at http://cloford.com/resources/colours/500col.htm .

Let’s add some labels:
>>> lab1 = ['A','B','C','D']
>>> plt.pie(x, colors = col1, labels = lab1)
The plot is shown in Figure 10-16.
Figure 10-16. A pie chart with labels

We can separate sections of the pie by using the explode parameter. Each value in the list sets how far the corresponding section is offset from the center, expressed as a fraction of the radius:
>>> ex1 = [0.5,0,0,1]
>>> lab1 = ['A','B','C','D']
>>> plt.pie(x, colors = col1, labels = lab1, explode = ex1)
The plot is shown in Figure 10-17.
Figure 10-17. An exploded pie chart
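Pie sections can also display the percentage they represent. The following is a minimal sketch that reuses the x values and labels from above and adds the autopct parameter of pie(); the smaller explode offset is an illustrative assumption, and the Agg backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

x = [50, 70, 90, 65]
lab1 = ['A', 'B', 'C', 'D']
ex1 = [0.1, 0, 0, 0]  # offset only the first section, by 10% of the radius

fig, ax = plt.subplots()
# autopct formats each section's share of the total as a percentage
wedges, texts, autotexts = ax.pie(x, labels=lab1, explode=ex1, autopct="%1.1f%%")
ax.axis("equal")  # equal aspect ratio so the pie is drawn as a circle
```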

Other Plots and Charts

We can create yet other types of plots and charts. For example, we can build a scatterplot. A scatterplot is very useful to see the relationship between two variables.
>>> plt.scatter(x, y)
The plot is shown in Figure 10-18.
Figure 10-18. A scatterplot

We can create bar charts with the plt.bar() function. Bar charts and histograms are very useful to compare our data and also to display categorical variables:
>>> plt.bar(x, y)
The plot is shown in Figure 10-19.
Figure 10-19. A bar chart

We can change the orientation of a bar chart:
>>> plt.barh(x, y)
The plot is shown in Figure 10-20.
Figure 10-20. A reoriented bar chart
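In both functions, the first argument gives the bar positions and the second gives the bar lengths. Because bar charts are often used for categorical variables, the positions can also be category names. The following is a sketch with illustrative labels (an assumption, not data from the chapter); the Agg backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']  # illustrative category names
y = [129, 192, 163, 172]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
bars_v = ax_v.bar(categories, y)   # vertical bars, one per category
bars_h = ax_h.barh(categories, y)  # the same data drawn horizontally
fig.tight_layout()
```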

We can create a chart from a data frame. To do this, we import pandas to manage the dataset and NumPy to generate the values. Let’s generate a random set of ten cases and four variables.
>>> import pandas as pd
>>> import numpy as np
>>> df1 = pd.DataFrame(np.random.rand(10, 4), columns = ['var1', 'var2', 'var3', 'var4'])
>>> df1.plot(kind = "bar")
The plot is shown in Figure 10-21.
Figure 10-21. A bar chart created using a random data frame

To create stacked bars, we use the parameter ‘stacked’:
>>> df1.plot(kind = "bar", stacked = True)
The plot is shown in Figure 10-22.
Figure 10-22. A chart with stacked bars
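The plot() method of a data frame returns a Matplotlib axes object, so the customizations from the first section of this chapter still apply. The following is a minimal sketch; the seed, the title, and the axis labels are illustrative assumptions, and the Agg backend line is only needed outside Jupyter.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # seeded so that the example is reproducible
df1 = pd.DataFrame(rng.random((10, 4)),
                   columns=['var1', 'var2', 'var3', 'var4'])

# df1.plot returns the axes it drew on, which we can then customize
ax = df1.plot(kind="bar", stacked=True)
ax.set_title("Stacked values per case")
ax.set_xlabel("Case")
ax.set_ylabel("Sum of var1 to var4")
ax.legend(loc="upper right")
```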

We can create a histogram that represents the variables of the dataset. (Histograms are discussed in more detail at the end of the chapter.)
>>> df1.hist()
The plot is shown in Figure 10-23.
Figure 10-23. Multiple histograms for each variable in the dataset

Or, we can represent a single variable:
>>> df1['var1'].hist()
The plot is shown in Figure 10-24.
Figure 10-24. A histogram of one variable

We can also select a column by position rather than by name, using the .iloc indexer. (Note that df1.loc[1] would select the row labeled 1, not a column.)
>>> df1.iloc[:, 1].hist()
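Label-based and position-based selection return different things, so it is worth checking what an indexer returns before plotting it. The following sketch compares the two on a seeded random data frame; the seed and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so that the example is reproducible
df1 = pd.DataFrame(rng.random((10, 4)),
                   columns=['var1', 'var2', 'var3', 'var4'])

row = df1.loc[1]             # the row labeled 1: one value per variable
col_by_name = df1['var2']    # a column selected by name: one value per case
col_by_pos = df1.iloc[:, 1]  # the same column selected by position

print(row.shape, col_by_name.shape, col_by_pos.shape)
```

A histogram of row would therefore summarize four values from a single case, whereas a histogram of either column summarizes all ten cases of one variable.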
We create box plots by using the boxplot() function. This kind of visualization can be used to show the shape of the distribution, its central value, and its variability:
>>> df1.boxplot(return_type = "axes")
The plot is shown in Figure 10-25.
Figure 10-25. A boxplot

We can build area charts:
>>> df1.plot(kind = "area")
The plot is shown in Figure 10-26.
Figure 10-26. An area chart

Each function that we use has its own parameters, which we can change, as we saw in the first section of this chapter. For instance, we can change the colors of the area chart by applying the palette we already created:
>>> df1.plot(kind = "area", color = col1)
The plot is shown in Figure 10-27.
Figure 10-27. An area chart with an altered color palette

Saving Plots and Charts

We can save our plots and charts with the plt.savefig() function. We can choose the file name and set the resolution (in dots per inch):
>>> df1.plot(kind = "scatter", x = "var3", y = "var4")
# we save the image in the working directory in the following way
>>> plt.savefig('graph1.png', dpi = 600)

Let’s check whether the chart has been saved successfully to our working directory. We can use the saved image, for example, in a presentation or in a report that follows the data analysis, or to explain our data during the exploratory phase.
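The check can also be done in code. The following is a minimal sketch that saves a scatterplot and then confirms that the file exists; the file name and the use of the system temporary directory are assumptions made so that the example does not write into the working directory, and the Agg backend line is only needed outside Jupyter.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([50, 70, 90, 65], [129, 192, 163, 172])

# hypothetical file name, placed in the temporary directory
out_path = os.path.join(tempfile.gettempdir(), "graph1_check.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")

# the file should now exist and contain data
saved_ok = os.path.exists(out_path) and os.path.getsize(out_path) > 0
print(saved_ok)
```

Calling savefig() on the figure object saves that specific figure, whereas plt.savefig() saves whichever figure is currently active.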

Selecting Plot and Chart Styles

Matplotlib also includes a set of styles that can be applied to charts. We can view these styles by typing:
>>> plt.style.available
['bmh',
 'classic',
 'dark_background',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn-bright',
 'seaborn-colorblind',
 'seaborn-dark-palette',
 'seaborn-dark',
 'seaborn-darkgrid',
 'seaborn-deep',
 'seaborn-muted',
 'seaborn-notebook',
 'seaborn-paper',
 'seaborn-pastel',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-ticks',
 'seaborn-white',
 'seaborn-whitegrid',
 'seaborn']
To apply a style, we pass its name to the plt.style.use() function before creating the chart:
>>> plt.style.use('dark_background')
>>> df1.plot(kind = "area")
The plot is shown in Figure 10-28.
Figure 10-28. A custom area chart that uses a Matplotlib “dark background” theme

Here is another example:
>>> plt.style.use('seaborn-darkgrid')
>>> df1.plot(kind = "area")
The plot is shown in Figure 10-29.
Figure 10-29. A custom area chart with a “seaborn” theme
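The list of available styles depends on the installed Matplotlib version. In more recent releases, the seaborn styles are listed with a seaborn-v0_8- prefix, so a style name that works on one installation may not exist on another. The following sketch picks the first available name from a list of candidates and applies it only temporarily through plt.style.context(); the candidate list and the fallback to 'default' are assumptions, and the Agg backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# candidate names, in order of preference (an illustrative choice)
candidates = ['seaborn-darkgrid', 'seaborn-v0_8-darkgrid', 'ggplot']

# use the first candidate that this installation provides
chosen = next((s for s in candidates if s in plt.style.available), 'default')

# the style applies only inside the with block
with plt.style.context(chosen):
    fig, ax = plt.subplots()
    ax.plot([50, 70, 90, 65], [129, 192, 163, 172])

print(chosen)
```

Unlike plt.style.use(), the context manager restores the previous settings afterward, so later charts are not affected.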

More on Histograms

We can create two random objects with NumPy, represent each of them graphically, and then combine their data into one chart:
>>> df2 = np.random.randn(100)
>>> df3 = np.random.randn(100)
>>> plt.hist(df2)
The first plot is shown in Figure 10-30.
Figure 10-30. The first histogram

Now let’s display the second histogram.
>>> plt.hist(df3)
The plot is shown in Figure 10-31.
Figure 10-31. The second histogram

Now let’s combine the two datasets:
>>> plt.hist(df2, color = "red", alpha = 0.3, bins = 15)
>>> plt.hist(df3, alpha = 0.6, bins = 15)
# we draw the two datasets together, setting the color, the transparency (through the alpha parameter), and the number of bins into which the data are divided
The plot is shown in Figure 10-32.
Figure 10-32. A combined histogram
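When each hist() call receives only a number of bins, the bin edges are computed separately for each dataset, so the bars of the two histograms may not line up. One possible refinement, sketched here under the assumption that aligned bins are wanted, is to compute a single set of edges with np.histogram_bin_edges() and pass it to both calls. The seed and the labels are illustrative, and the Agg backend line is only needed outside Jupyter.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # seeded so that the example is reproducible
df2 = rng.standard_normal(100)
df3 = rng.standard_normal(100)

# 15 bins spanning the combined range of both datasets (16 edges)
edges = np.histogram_bin_edges(np.concatenate([df2, df3]), bins=15)

fig, ax = plt.subplots()
counts2, _, _ = ax.hist(df2, bins=edges, color="red", alpha=0.3, label="df2")
counts3, _, _ = ax.hist(df3, bins=edges, alpha=0.6, label="df3")
ax.legend()
```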

Matplotlib is just one of many Python packages that can be used to display data. Other chart creation packages can be found at http://pbpython.com/visualization-tools-1.html . One of the most widely used packages for data mining charts, for example, is seaborn.

Summary

Matplotlib is one of the most basic libraries for plotting data. Plotting datasets for data analysis is crucial to understanding the relationships among variables.
