Creating graphs

You will learn how to create graphs with two Python libraries: matplotlib and seaborn. Matplotlib is a mature, well-tested, cross-platform graphics engine. To work with it, you import the library itself and also an interface to it. Matplotlib is the whole library, and matplotlib.pyplot is a module within it. Pyplot is the interface to the underlying plotting library; it knows how to automatically create the figure, the axes, and the other elements needed for the desired plot. Seaborn is a visualization library built on top of matplotlib that adds enhanced graphing options and makes working with pandas data frames easy.
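The relationship between pyplot and the figure and axes objects it manages can be seen in a minimal sketch. It assumes a headless run (the Agg backend is selected so that no window opens), and the data and variable names are made up for illustration only:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

# pyplot creates the figure and the axes behind the scenes...
plt.plot([0, 1, 2], [0, 1, 4])
implicit_fig = plt.gcf()   # the figure pyplot created implicitly
implicit_ax = plt.gca()    # the axes pyplot created implicitly

# ...or you can ask for them explicitly and call methods on the objects.
explicit_fig, explicit_ax = plt.subplots()
explicit_ax.plot([0, 1, 2], [0, 1, 4])

print(type(implicit_fig).__name__, type(explicit_ax).__name__)
```

Both routes end up with the same kinds of objects; pyplot simply keeps track of the current figure and axes for you.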

Without further ado, let's start developing. First, import all the necessary packages for this section, using the following code:

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

The next step is to create sample data. An array of 100 evenly spaced numbers between 0.01 and 10 is the data for the independent variable. The following code then creates two dependent variables, one as the sine of the independent one, and the second as the natural logarithm of the independent one:

x = np.linspace(0.01, 10, 100)
y = np.sin(x)
z = np.log(x)
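As a quick check of the sample data, the following sketch rebuilds the same arrays and inspects them. The text does not say why the range starts at 0.01 rather than 0; a likely reason, assumed here, is that the logarithm of 0 is not finite:

```python
import numpy as np

# Same sample data as in the text: 100 evenly spaced points from 0.01 to 10.
x = np.linspace(0.01, 10, 100)
y = np.sin(x)
z = np.log(x)

# The step between neighbouring points is constant.
step = x[1] - x[0]

# Starting at 0.01 rather than 0 keeps the logarithm finite;
# np.log(0) would evaluate to -inf (with a runtime warning).
print(x.shape, step, z.min())
```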

The following code defines the style to use for the graph and then plots two lines, one for each function. The plt.show() command is needed to show the graph interactively:

plt.style.use('classic')
plt.plot(x, y)
plt.plot(x, z)
plt.show()
  

If you execute the previous code in Visual Studio 2017, you should get a pop-up window with the desired graph. I am not showing the graph yet; before showing it, I want to make some enhancements and, besides showing it, also save it to a file. The following code uses the plt.figure() function to create an object that will store the graph. Then, for each function, it defines the line style, line width, line color, and label. The plt.axis() line redefines the axes range. The next three lines define the axes titles and the title of the graph, together with the font size for the text. The plt.legend() line draws the legend. The last two lines show the graph interactively and save it to a file:

f = plt.figure()
plt.plot(x, y, color = 'blue', linestyle = 'solid',
         linewidth = 4, label = 'sin')
plt.plot(x, z, color = 'red', linestyle = 'dashdot',
         linewidth = 4, label = 'log')
plt.axis([-1, 11, -2, 3.5])
plt.xlabel("X", fontsize = 16)
plt.ylabel("sin(x) & log(x)", fontsize = 16)
plt.title("Enhanced Line Plot", fontsize = 25)
plt.legend(fontsize = 16)
plt.show()
# Raw string, so the backslashes in the Windows path are taken literally
f.savefig(r'C:\SQL2017DevGuide\B08539_15_04.png')
  

Here is the result of the previous code, the first nice graph:

Line chart

If you are interested in which graphical formats are supported, use the following code:

f.canvas.get_supported_filetypes() 

You will find out that all of the most popular formats are supported.
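The following sketch puts the list of supported formats to use. It assumes a headless run with the Agg backend and writes to a temporary folder, which is a hypothetical stand-in for the book's output folder; the file name line_plot is made up as well:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.01, 10, 100)
f = plt.figure()
plt.plot(x, np.sin(x), label="sin")
plt.legend()

# Dictionary mapping a file extension to a format description.
formats = f.canvas.get_supported_filetypes()

# Hypothetical output location: a temporary folder.
out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "svg", "pdf"):
    if ext in formats:
        path = os.path.join(out_dir, "line_plot." + ext)
        f.savefig(path)  # the format is inferred from the extension
        saved.append(path)

print(sorted(formats))
print(saved)
```

Because savefig() infers the format from the file extension, switching the output format is only a matter of changing the file name.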

Now it's time to switch to some more realistic examples. First, let's import the target mail data from a CSV file into a pandas DataFrame and get some basic info about it:

TM = pd.read_csv(r"C:\SQL2017DevGuide\Chapter15_TM.csv")
# N of rows and cols
print(TM.shape)
# First 10 rows
print(TM.head(10))
# Some statistics (the mean is calculated for numeric columns only)
print(TM.mean(numeric_only=True))
print(TM.max())
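If you do not have the CSV file at hand, you can try the same calls on a small stand-in data frame. In the following sketch, the rows are made up; only the column names (Age, YearlyIncome, Education, and BikeBuyer) come from the text:

```python
import pandas as pd

# Made-up rows standing in for the CSV file; only the column names
# (Age, YearlyIncome, Education, BikeBuyer) come from the text.
toy = pd.DataFrame({
    "Age": [31, 45, 52, 28],
    "YearlyIncome": [40000, 90000, 60000, 30000],
    "Education": ["Bachelors", "High School", "Graduate Degree", "Bachelors"],
    "BikeBuyer": [1, 0, 1, 0],
})

print(toy.shape)        # (rows, columns)
print(toy.head(2))      # first two rows
# numeric_only=True skips text columns such as Education.
means = toy.mean(numeric_only=True)
maxima = toy.max(numeric_only=True)
print(means)
print(maxima)
```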

The next graph you can create is a scatterplot. The following code plots YearlyIncome over Age. Note that the code creates a smaller data frame with the first hundred rows only, in order to get a less cluttered graph for the demo. Again, for the sake of brevity, I am not showing this graph:

TM1 = TM.head(100) 
plt.scatter(TM1['Age'], TM1['YearlyIncome']) 
plt.xlabel("Age", fontsize = 16) 
plt.ylabel("YearlyIncome", fontsize = 16) 
plt.title("YearlyIncome over Age", fontsize = 25) 
plt.show() 

For categorical variables, you usually create bar charts for a quick overview of the distribution. You can do it with the countplot() function from the seaborn package. Let's try to plot counts for the BikeBuyer variable in the classes of the Education variable, with the help of the following code:

sns.countplot(x="Education", hue="BikeBuyer", data=TM); 
plt.show() 

If you executed the previous code, you will have noticed that the Education variable is not sorted correctly. As in R, you need to inform Python about the intrinsic order of an ordinal variable. The following code defines the Education variable as categorical and then shows the categories:

TM['Education'] = TM['Education'].astype('category') 
TM['Education'] 
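How pandas treats category order can be tried on a toy series first. In the following sketch, the values are made up and only the level names come from the text; the ordered=True argument is an optional addition that also marks the variable as ordinal:

```python
import pandas as pd

levels = ["Partial High School", "High School", "Partial College",
          "Bachelors", "Graduate Degree"]

# Made-up values; only the level names come from the text.
edu = pd.Series(["Bachelors", "High School", "Graduate Degree",
                 "Partial College", "Partial High School"]).astype("category")

# Without further information, the categories are sorted alphabetically.
print(list(edu.cat.categories))

# reorder_categories returns a new Series; ordered=True also marks the
# variable as ordinal, so comparisons and sorting follow the given order.
edu_ordered = edu.cat.reorder_categories(levels, ordered=True)
print(list(edu_ordered.cat.categories))
print(edu_ordered.min(), "<", edu_ordered.max())
```

Note that reorder_categories() does not change the series in place, so the result has to be assigned to a variable or back to the column.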

In the next step, the code defines the correct order, as shown here:

TM['Education'] = TM['Education'].cat.reorder_categories(
    ["Partial High School",
     "High School", "Partial College",
     "Bachelors", "Graduate Degree"])
TM['Education']

Now it is time to create the bar chart again. This time, I am also saving it to a file and showing it in the book:

f = plt.figure()
sns.countplot(x="Education", hue="BikeBuyer", data=TM)
plt.show()
f.savefig(r'C:\SQL2017DevGuide\B08539_15_05.png')

So, here is the resulting bar chart:

Bar chart

You can also create a chart with small sub-charts, the trellis chart, with the FacetGrid() function. Note that the following code uses the set() function to set the font_scale for all text in the graph at once:

sns.set(font_scale = 3) 
grid = sns.FacetGrid(TM, row = 'HouseOwnerFlag', col = 'BikeBuyer',
                     margin_titles = True, height = 10)
grid.map(plt.hist, 'YearlyIncome',  
         bins = np.linspace(0, np.max(TM['YearlyIncome']), 7)) 
plt.show() 

The following figure shows the result:

Trellis chart
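The bins argument in the trellis chart code deserves a short look. The following sketch uses made-up income values in place of the YearlyIncome column and NumPy's histogram() function in place of plt.hist() to show that seven evenly spaced edges define six equal-width bins:

```python
import numpy as np

# Made-up incomes; the real values come from the YearlyIncome column.
income = np.array([10000, 30000, 40000, 60000, 90000, 120000])

# Seven evenly spaced edges from 0 to the maximum define six bins.
edges = np.linspace(0, np.max(income), 7)
counts, _ = np.histogram(income, bins=edges)

print(edges)
print(counts)
```

Passing the same edges to every sub-chart keeps the histograms comparable across the grid.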

Finally, let me also show you a nice violinplot, similar to the one created with the ggplot library in R in Chapter 14, Data Exploration and Predictive Modeling with R in SQL Server R. The code analyzes the distribution of the income in classes of education:

sns.violinplot(x = 'Education', y = 'YearlyIncome',
               data = TM, inner = 'box')
plt.show() 

Here is the resultant graph:

Violin plot