Chapter 11
IN THIS CHAPTER
Selecting the right graph for the job
Working with advanced scatterplots
Exploring time-related and geographical data
Creating graphs
Chapter 10 helped you understand the mechanics of working with MatPlotLib, which is an important first step toward using it. This chapter takes the next step in helping you use MatPlotLib to perform useful work. The main goal of this chapter is to help you visualize your data in various ways. Creating a graphic presentation of your data is essential if you want to help other people understand what you’re trying to say. Even though you can see what the numbers mean in your mind, other people will likely need graphics to see what point you’re trying to make by manipulating data in various ways.
The chapter starts by looking at some basic graph types that MatPlotLib supports. You don’t find the full list of graphs and plots listed in this chapter — it could take an entire book to explore them all in detail. However, you do find the most common types.
In the remainder of the chapter, you begin exploring specific sorts of plotting as it relates to data science. Of course, no book on data science would be complete without exploring scatterplots, which are used to help people see patterns in seemingly unrelated data points. Because much of the data that you work with today is time related or geographic in nature, the chapter devotes two special sections to these topics. You also get to work with both directed and undirected graphs, which are useful for social media analysis.
The kind of graph you choose determines how people view the associated data, so choosing the right graph from the outset is important. For example, if you want to show how various data elements contribute toward a whole, you really need to use a pie chart. On the other hand, when you want people to form opinions on how data elements compare, you use a bar chart. The idea is to choose a graph that naturally leads people to draw the conclusion that you need them to draw about the data that you’ve carefully massaged from various data sources. (You also have the option of using line graphs — a technique demonstrated in Chapter 10.) The following sections describe the various graph types and provide you with basic examples of how to use them.
Pie charts focus on showing parts of a whole. The entire pie would be 100 percent. The question is how much of that percentage each value occupies. The following example shows how to create a pie chart with many of the special features in place:
import matplotlib.pyplot as plt
%matplotlib inline
values = [5, 8, 9, 10, 4, 7]
colors = ['b', 'g', 'r', 'c', 'm', 'y']
labels = ['A', 'B', 'C', 'D', 'E', 'F']
explode = (0, 0.2, 0, 0, 0, 0)
plt.pie(values, colors=colors, labels=labels,
explode=explode, autopct='%1.1f%%',
counterclock=False, shadow=True)
plt.title('Values')
plt.show()
The essential part of a pie chart is the values. You could create a basic pie chart using just the values as input.
The colors parameter lets you choose custom colors for each pie wedge. You use the labels parameter to identify each wedge. In many cases, you need to make one wedge stand out from the others, so you add the explode parameter with a list of explode values. A value of 0 keeps the wedge in place; any other value moves the wedge out from the center of the pie.
Each pie wedge can show various kinds of information. This example shows the percentage occupied by each wedge with the autopct parameter. You must provide a format string to format the percentages.
In most cases, you also want to give your pie chart a title so that others know what it represents. You do this using the title() function. Figure 11-1 shows the output from this example.
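To see where the wedge labels come from, you can reproduce the arithmetic by hand. The following is a minimal sketch, not part of the original example: it computes each value's share of the total and applies the same format string passed to autopct. The names total, percentages, and wedge_text are made up for illustration.

```python
# Reproduce the percentage labels that autopct='%1.1f%%' places on each wedge.
values = [5, 8, 9, 10, 4, 7]
total = sum(values)

# Each wedge's share of the whole pie, expressed as a percentage.
percentages = [100 * v / total for v in values]

# Apply the same format string that the example passes to autopct.
wedge_text = ['%1.1f%%' % pct for pct in percentages]

print(wedge_text)
```

The doubled percent sign in the format string is how you ask for a literal % character after the number.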
Bar charts make comparing values easy. The wide bars and segregated measurements emphasize the differences between values, rather than the flow of one value to another as a line graph would do. Fortunately, you have all sorts of methods at your disposal for emphasizing specific values and performing other tricks. The following example shows just some of the things you can do with a vertical bar chart.
import matplotlib.pyplot as plt
%matplotlib inline
values = [5, 8, 9, 10, 4, 7]
widths = [0.7, 0.8, 0.7, 0.7, 0.7, 0.7]
colors = ['b', 'r', 'b', 'b', 'b', 'b']
plt.bar(range(0, 6), values, width=widths,
color=colors, align='center')
plt.show()
To create even a basic bar chart, you must provide a series of x coordinates and the heights of the bars. The example uses the range() function to create the x coordinates, and values contains the heights.
Of course, you may want more than a basic bar chart, and MatPlotLib provides a number of ways to get the job done. In this case, the example uses the width parameter to control the width of each bar, emphasizing the second bar by making it slightly larger. The larger width would show up even in a black-and-white printout. It also uses the color parameter to change the color of the target bar to red (the rest are blue).
As with other chart types, the bar chart provides some special features that you can use to make your presentation stand out. The example uses the align parameter to center the data on the x coordinate (centering is the default in MatPlotLib 2.0 and later; earlier versions placed the left edge of each bar on the x coordinate). You can also use other parameters, such as hatch, to enhance the visual appearance of your bar chart. Figure 11-2 shows the output of this example.
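The effect of align is easier to see in numbers than in a picture. The following sketch, which is an illustration rather than part of the original example, draws the same bars twice and reads back where the second bar starts under each setting. It assumes an off-screen backend (Agg) so that it runs outside Notebook; inside Notebook you would use %matplotlib inline instead. The hatch pattern '//' is just one of the available patterns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

values = [5, 8, 9, 10, 4, 7]
positions = range(len(values))

fig, (ax_center, ax_edge) = plt.subplots(1, 2)

# align='center' puts the middle of each bar on its x coordinate.
centered = ax_center.bar(positions, values, width=0.8,
                         align='center', hatch='//')

# align='edge' puts the left edge of each bar on its x coordinate.
edged = ax_edge.bar(positions, values, width=0.8,
                    align='edge', hatch='//')

# Left edge of the second bar (x coordinate 1) under each setting.
left_centered = centered[1].get_x()
left_edged = edged[1].get_x()
print(left_centered, left_edged)

plt.close(fig)
```

With a width of 0.8, the centered bar starts 0.4 units to the left of its x coordinate, whereas the edge-aligned bar starts exactly on it.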
Histograms categorize data by breaking it into bins, where each bin contains a subset of the data range. A histogram then displays the number of items in each bin so that you can see the distribution of data and the progression of data from bin to bin. In most cases, you see a curve of some type, such as a bell curve. The following example shows how to create a histogram with randomized data:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x = 20 * np.random.randn(10000)
plt.hist(x, 25, range=(-50, 50), histtype='stepfilled',
align='mid', color='g', label='Test Data')
plt.legend()
plt.title('Step Filled Histogram')
plt.show()
In this case, the input values are a series of random numbers. The distribution of these numbers should show a type of bell curve. At a minimum, you must provide a series of values, x in this case, to plot. The second argument contains the number of bins to use when creating the data intervals. The default value is 10. Using the range parameter helps you focus the histogram on the relevant data and exclude any outliers.
You can create multiple histogram types. The default setting creates a bar chart. You can also create a stacked bar chart, stepped graph, or filled stepped graph (the type shown in the example). In addition, it’s possible to control the orientation of the output, with vertical as the default.
As with most other charts and graphs in this chapter, you can add special features to the output. For example, the align parameter determines the alignment of each bar along the baseline. Use the color parameter to control the colors of the bars. The text you supply with the label parameter doesn’t appear unless you also create a legend (as shown in this example). Figure 11-3 shows typical output from this example.
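If you want to inspect the bins without drawing anything, NumPy's histogram() function performs the same kind of binning. The sketch below is illustrative only: it uses the bin count and range from the example, fixes the random seed (an assumption, so that the run is repeatable), and counts how many values the range setting leaves out.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable
x = 20 * np.random.randn(10000)

# Same binning as the example: 25 bins spanning -50 to 50.
counts, edges = np.histogram(x, bins=25, range=(-50, 50))

bin_width = edges[1] - edges[0]
excluded = len(x) - counts.sum()  # values that fell outside the range

print("bin width:", bin_width)
print("values outside the range:", excluded)
```

Twenty-five bins produce 26 edges, and any value below -50 or above 50 is simply not counted.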
Boxplots provide a means of depicting groups of numbers through their quartiles (three points dividing a group into four equal parts). A boxplot may also have lines, called whiskers, indicating data outside the upper and lower quartiles. The spacing shown within a boxplot helps indicate the skew and dispersion of the data. The following example shows how to create a boxplot with randomized data.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
spread = 100 * np.random.rand(100)
center = np.ones(50) * 50
flier_high = 100 * np.random.rand(10) + 100
flier_low = -100 * np.random.rand(10)
data = np.concatenate((spread, center,
flier_high, flier_low))
plt.boxplot(data, sym='gx', widths=.75, notch=True)
plt.show()
To create a usable dataset, you need to combine several different number-generation techniques, as shown at the beginning of the example. Here is how these techniques work:
spread: Contains a set of random numbers between 0 and 100
center: Provides 50 values directly in the center of the range, at 50
flier_high: Simulates outliers between 100 and 200
flier_low: Simulates outliers between 0 and –100
The code combines all these values into a single dataset using concatenate(). Because the data is randomly generated with specific characteristics (such as a large number of points in the middle), the output shows those characteristics, which works fine for the example.
The call to boxplot() requires only data as input. All other parameters have default settings. In this case, the code sets the presentation of outliers to green Xs by setting the sym parameter. You use widths to modify the size of the box (made extra large in this case to make the box easier to see). Finally, you can create a square box or a box with a notch using the notch parameter (which normally defaults to False). Figure 11-4 shows typical output from this example.
The box spans the first through third quartiles, with the red line in the middle being the median (the second quartile). The two black horizontal lines connected to the box by whiskers show the upper and lower limits. The outliers appear above and below the upper and lower limit lines as green Xs.
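The numbers behind the box, whiskers, and outliers can be computed directly. The following sketch is an illustration under two stated assumptions: a fixed random seed, and MatPlotLib's default whisker rule, which reaches at most 1.5 times the interquartile range beyond the box. The names lower_limit, upper_limit, and fliers are made up for this sketch.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable
spread = 100 * np.random.rand(100)
center = np.ones(50) * 50
flier_high = 100 * np.random.rand(10) + 100
flier_low = -100 * np.random.rand(10)
data = np.concatenate((spread, center, flier_high, flier_low))

# The three quartile points that define the box and its middle line.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Default whisker rule: reach at most 1.5 * IQR beyond the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Points beyond the limits are the ones drawn as outliers.
fliers = data[(data < lower_limit) | (data > upper_limit)]
print(q1, median, q3, len(fliers))
```

Because 50 of the 170 values sit at exactly 50, the median lands on 50 as well, which is why the middle line appears at that height.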
Scatterplots show clusters of data rather than trends (as with line graphs) or discrete values (as with bar charts). The purpose of a scatterplot is to help you see data patterns. The following example shows how to create a scatterplot using randomized data:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x1 = 5 * np.random.rand(40)
x2 = 5 * np.random.rand(40) + 25
x3 = 25 * np.random.rand(20)
x = np.concatenate((x1, x2, x3))
y1 = 5 * np.random.rand(40)
y2 = 5 * np.random.rand(40) + 25
y3 = 25 * np.random.rand(20)
y = np.concatenate((y1, y2, y3))
plt.scatter(x, y, s=[100], marker='^', c='m')
plt.show()
The example begins by generating random x and y coordinates. For each x coordinate, you must have a corresponding y coordinate. It’s possible to create a scatterplot using just the x and y coordinates.
It’s possible to dress up a scatterplot in a number of ways. In this case, the s parameter determines the size of each data point. The marker parameter determines the data point shape. You use the c parameter to define the colors for all the data points, or you can define a separate color for individual data points. Figure 11-5 shows the output from this example.
Scatterplots are especially important for data science because they can show data patterns that aren’t obvious when viewed in other ways. You can see data groupings with relative ease and help the viewer understand when data belongs to a particular group. You can also show overlaps between groups and even demonstrate when certain data is outside the expected range. Showing these various kinds of relationships in the data is an advanced technique that you need to know in order to make the best use of MatPlotLib. The following sections demonstrate how to perform these advanced techniques on the scatterplot you created earlier in the chapter.
Color is the third axis when working with a scatterplot. Using color lets you highlight groups so that others can see them with greater ease. The following example shows how you can use color to show groups within a scatterplot:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x1 = 5 * np.random.rand(50)
x2 = 5 * np.random.rand(50) + 25
x3 = 30 * np.random.rand(25)
x = np.concatenate((x1, x2, x3))
y1 = 5 * np.random.rand(50)
y2 = 5 * np.random.rand(50) + 25
y3 = 30 * np.random.rand(25)
y = np.concatenate((y1, y2, y3))
color_array = ['b'] * 50 + ['g'] * 50 + ['r'] * 25
plt.scatter(x, y, s=[50], marker='D', c=color_array)
plt.show()
The example works essentially the same as the scatterplot example in the previous section, except that this example uses an array for the colors. Unfortunately, if you’re seeing this in the printed book, the differences between the shades of gray in Figure 11-6 will be hard to see. However, the first group is blue, followed by green for the second group. Any outliers appear in red.
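The color array works only when it has one entry per point, in the same order as the concatenated coordinates. One way to keep the two in step is to build the points and the colors in the same loop. This is a sketch of that idea, not the book's method; the groups list and its layout (count, scale, offset, color) are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the sketch is repeatable

# Hypothetical description of each group: (point count, scale, offset, color).
groups = [
    (50, 5, 0, 'b'),    # first cluster
    (50, 5, 25, 'g'),   # second cluster
    (25, 30, 0, 'r'),   # scattered outliers
]

xs, ys, color_array = [], [], []
for count, scale, offset, color in groups:
    xs.append(scale * np.random.rand(count) + offset)
    ys.append(scale * np.random.rand(count) + offset)
    color_array += [color] * count  # one color entry per point

x = np.concatenate(xs)
y = np.concatenate(ys)
print(len(x), len(y), len(color_array))
```

You can then pass x, y, and color_array to scatter() exactly as the example does.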
In some cases, you need to know the general direction that your data is taking when looking at a scatterplot. Even if you create a clear depiction of the groups, the actual direction that the data is taking as a whole may not be clear. In this case, you add a trendline to the output. Here’s an example of adding a trendline to a scatterplot that includes groups but isn’t quite as clear as the scatterplot shown previously in Figure 11-6.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.pylab as plb
%matplotlib inline
x1 = 15 * np.random.rand(50)
x2 = 15 * np.random.rand(50) + 15
x3 = 30 * np.random.rand(25)
x = np.concatenate((x1, x2, x3))
y1 = 15 * np.random.rand(50)
y2 = 15 * np.random.rand(50) + 15
y3 = 30 * np.random.rand(25)
y = np.concatenate((y1, y2, y3))
color_array = ['b'] * 50 + ['g'] * 50 + ['r'] * 25
plt.scatter(x, y, s=[90], marker='*', c=color_array)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
plb.plot(x, p(x), 'm-')
plt.show()
The code for creating the scatterplot is essentially the same as in the example in the “Depicting groups” section, earlier in the chapter, but the plot doesn’t define the groups as clearly. Adding a trendline means calling the NumPy polyfit() function with the data, which returns a vector of coefficients, z, that minimizes the least-squares error. (Least-squares regression is a method for finding a line that summarizes the relationship between two variables, x and y in this case, at least within the domain of the explanatory variable x. The third polyfit() parameter expresses the degree of the polynomial fit.)
The vector output of polyfit() is used as input to poly1d(), which creates a polynomial function, p. Calling p(x) calculates the actual y axis data points. The call to plot() creates the trendline on the scatterplot. You can see a typical result of this example in Figure 11-7.
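A quick way to convince yourself of what polyfit() and poly1d() return is to feed them points that lie exactly on a known line. The sketch below is illustrative; the line y = 2x + 1 is made up for the purpose.

```python
import numpy as np

x = np.arange(10)
y = 2 * x + 1            # points that lie exactly on y = 2x + 1

z = np.polyfit(x, y, 1)  # degree 1: coefficients are [slope, intercept]
p = np.poly1d(z)         # callable polynomial built from the coefficients

print(z)
print(p(10))
```

The coefficients come back with the highest power first, so for a degree 1 fit the slope precedes the intercept.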
Nothing is truly static. When you view most data, you see an instant of time — a snapshot of how the data appeared at one particular moment. Of course, such views are both common and useful. However, sometimes you need to view data as it moves through time — to see it as it changes. Only by viewing the data as it changes can you expect to understand the underlying forces that shape it. The following sections describe how to work with data on a time-related basis.
Many times, you need to present data over time. The data could come in many forms, but generally you have some type of time tick (one unit of time), followed by one or more features that describe what happens during that particular tick. The following example shows a simple set of days and sales on those days for a particular item in whole (integer) amounts.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
%matplotlib inline
start_date = dt.datetime(2018, 7, 30)
end_date = dt.datetime(2018, 8, 5)
daterange = pd.date_range(start_date, end_date)
sales = (np.random.rand(len(daterange)) * 50).astype(int)
df = pd.DataFrame(sales, index=daterange,
columns=['Sales'])
df.loc['Jul 30 2018':'Aug 05 2018'].plot()
plt.ylim(0, 50)
plt.xlabel('Sales Date')
plt.ylabel('Sale Value')
plt.title('Plotting Time')
plt.show()
The example begins by creating a DataFrame to hold the information. The source of the information could be anything, but the example generates it randomly. Notice that the example creates a date_range to hold the starting and ending date time frame, which becomes the index of the DataFrame.
An essential part of this example is the index. Each row has an actual time value so that you don’t lose information, and pandas uses these values to label the x axis of the plot.
Using loc[] lets you select a range of dates from the total number of entries available. Notice that this example selects the full range of generated data for output; narrowing the dates in the slice plots only part of it. The code then adds some amplifying information about the plot and displays it onscreen. The call to plot() needs no x and y arguments in this case because it takes the x values from the index and the y values from the Sales column. Figure 11-8 shows typical output from the randomly generated data.
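Date-based selection is easier to follow with predictable numbers. The following sketch is an illustration only: it replaces the random sales with made-up values that rise by 5 each day and then selects three days from the middle of the week.

```python
import numpy as np
import pandas as pd

daterange = pd.date_range('2018-07-30', '2018-08-05')  # one entry per day
sales = np.arange(len(daterange)) * 5                   # made-up, predictable values
df = pd.DataFrame(sales, index=daterange, columns=['Sales'])

# Slicing by date strings includes both endpoints.
subset = df.loc['2018-08-01':'2018-08-03']
print(subset)
```

Unlike ordinary Python slices, a date slice with loc[] keeps the ending date, so the selection holds three rows rather than two.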
As with any other data presentation, sometimes you really can’t see what direction the data is headed in without help. The following example starts with the plot from the previous section and adds a trendline to it:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
%matplotlib inline
start_date = dt.datetime(2018, 7, 29)
end_date = dt.datetime(2018, 8, 7)
daterange = pd.date_range(start_date, end_date)
sales = (np.random.rand(len(daterange)) * 50).astype(int)
df = pd.DataFrame(sales, index=daterange,
columns=['Sales'])
lr_coef = np.polyfit(range(0, len(df)), df['Sales'], 1)
lr_func = np.poly1d(lr_coef)
trend = lr_func(range(0, len(df)))
df['trend'] = trend
df.loc['Jul 30 2018':'Aug 05 2018'].plot()
plt.xlabel('Sales Date')
plt.ylabel('Sale Value')
plt.title('Plotting Time')
plt.legend(['Sales', 'Trend'])
plt.show()
            Sales      trend
2018-07-29      6  18.890909
2018-07-30     13  20.715152
2018-07-31     38  22.539394
2018-08-01     22  24.363636
2018-08-02     40  26.187879
2018-08-03     39  28.012121
2018-08-04     36  29.836364
2018-08-05     21  31.660606
2018-08-06      7  33.484848
2018-08-07     49  35.309091
Using this approach makes it ultimately easier to plot the data. You call plot() only once and avoid relying on the MatPlotLib pylab module, as shown in the example in the “Showing correlations” section. The resulting code is simpler and less likely to cause the issues you see online.
When you plot the initial data, the call to plot() automatically generates a legend for you. MatPlotLib doesn’t automatically add the trendline, so you must also create a new legend for the plot. Figure 11-9 shows typical output from this example using randomly generated data.
Knowing where data comes from or how it applies to a specific place can be important. For example, if you want to know where food shortages have occurred and plan how to deal with them, you need to match the data you have to geographical locations. The same holds true for predicting where future sales will occur. You may find that you need to use existing data to determine where to put new stores. Otherwise, you could put a store in a location that won’t receive much in the way of sales, and the effort will lose money rather than make it. The following sections describe how to work with Basemap to interact with geographical data.
Some of the packages you install have a tendency to also change your Notebook environment by installing other packages that may not work well with your baseline setup. Consequently, you see problems with code that functioned earlier. Normally, these problems consist mostly of warning messages, such as deprecation warnings as discussed in the “Dealing with deprecated library issues” section, later in this chapter. In some cases, however, the changed packages can also tweak the output you obtain from code. Perhaps a newer package uses an updated algorithm or interacts with the code differently. When you have a package, such as Basemap, that makes changes to the overall baseline configuration and you want to maintain your current configuration, you need to set up an environment for it. An environment keeps your baseline configuration intact but also allows the new package to create the environment it needs to execute properly. The following steps help you create the Basemap environment used for this chapter:
Open an Anaconda Prompt.
Notice that the prompt shows the location of your folder on your system, but that it’s preceded by (base). The (base) indicator tells you that you’re in your baseline environment, the one you want to preserve.
Type conda create -n Basemap python=3 anaconda=5.2.0 and press Enter.
This action creates a new Basemap environment. This new environment will use Python 3.6 and Anaconda 5.2.0. You get precisely the same baseline as you’ve been using so far.
Type source activate Basemap if you’re using OS X or Linux or activate Basemap if you’re using Windows and press Enter.
You have now changed over to the Basemap environment. Notice that the prompt no longer says (base); it says (Basemap) instead.
Type jupyter notebook and press Enter.
You see Notebook start, but it uses the Basemap environment, rather than the baseline environment. This copy of Notebook works precisely the same as any other copy of Notebook that you’ve used. The only difference is the environment in which it operates.
After you have finished using the Basemap environment, type deactivate at the prompt and press Enter. You see the prompt change back to (base).
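When you're inside Notebook, you can't see the prompt, so it helps to confirm from code which environment is running. The following sketch assumes that conda sets the CONDA_DEFAULT_ENV environment variable when an environment is active; if Python was started some other way, the variable may be missing, which is why the code supplies a fallback.

```python
import os
import sys

# conda sets CONDA_DEFAULT_ENV when an environment is active; it may be
# missing if Python was started some other way.
env_name = os.environ.get('CONDA_DEFAULT_ENV', '(not set)')

print('Active conda environment:', env_name)
print('Interpreter location:', sys.prefix)
```

In the Basemap environment, you would expect the interpreter location to point into that environment's folder rather than the baseline installation.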
Before you can work with mapping data, you need a library that supports the required mapping functionality. A number of such packages are available, but the easiest to work with and install is the Basemap Toolkit. You can obtain this toolkit from https://matplotlib.org/basemap/users/intro.html. (Make sure you close Notebook and stop the server before you proceed in this section to avoid file access errors.) However, the easiest method is to use the conda tool from the Anaconda Prompt to enter the following commands:
conda install -c conda-forge basemap=1.1.0
conda install -c conda-forge basemap-data-hires
conda install -c conda-forge proj4=5.2.0
The site does include supplementary information about the toolkit, so you may want to visit it anyway. Unlike some other packages, this one does include instructions for Mac, Windows, and Linux users. In addition, you can obtain a Windows-specific installer. Make sure to also check out the usage video at http://nbviewer.ipython.org/github/mqlaql/geospatial-data/blob/master/Geospatial-Data-with-Python.ipynb.
You need the following code to use the toolkit once you have it installed:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
%matplotlib inline
One of the major advantages of working with Python is the huge number of packages that it supports. Unfortunately, not every package receives updates quickly enough to avoid using deprecated features in other packages. A deprecated feature is one that still exists in the target package, but the developers of that package plan to remove it in an upcoming update. Consequently, you receive a deprecated package warning when you run your code. Even though the deprecation warning doesn’t keep your code from running, it does tend to make people leery of your application. After all, no one wants to see what appears to be an error message as part of the output. The fact that Notebook displays these messages in light red by default doesn’t help matters.
C:\Users\Luca\Anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py:1708: MatplotlibDeprecationWarning:
The axesPatch function was deprecated in version 2.1.
Use Axes.patch instead.
limb = ax.axesPatch
C:\Users\Luca\Anaconda3\lib\site-packages\mpl_toolkits\basemap\__init__.py:1711: MatplotlibDeprecationWarning:
The axesPatch function was deprecated in version 2.1. Use
Axes.patch instead.
if limb is not ax.axesPatch:
That looks like a lot of really terrifying text, but these messages point out two issues. The first is that the problem is in MatPlotLib and it revolves around the axesPatch call. The messages also tell you that this particular call is deprecated starting with version 2.1. Use this code to check your version of MatPlotLib:
import matplotlib
print(matplotlib.__version__)
If you installed Anaconda using the instructions in Chapter 3, you see that you have MatPlotLib 2.2.2 as a minimum. Consequently, one way to deal with this problem is to downgrade your copy of MatPlotLib by using the following command at the Anaconda Prompt:
conda install -c conda-forge matplotlib=2.0.2
The problem with this approach is that it can also cause problems for any code that uses the newer features found in MatPlotLib 2.2.2. It’s not optimal, but if you use Basemap in your application a lot, it might be a practical solution. A second option is to suppress warning messages altogether by using the following code:
import warnings
warnings.filterwarnings("ignore")
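Ignoring every warning also hides messages you may want to see. A narrower approach is to ignore a single warning category and let everything else through. The sketch below demonstrates the mechanism using only the standard library; LegacyCallWarning and old_api() are hypothetical stand-ins for a library's own warning class and deprecated call. In practice, you would pass the class that the messages name (MatplotlibDeprecationWarning in the preceding output) as the category.

```python
import warnings


class LegacyCallWarning(UserWarning):
    """Hypothetical stand-in for a library's deprecation warning class."""


def old_api():
    """Hypothetical function that warns about its own deprecation."""
    warnings.warn("old_api is deprecated", LegacyCallWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start by letting every warning through
    # Then hide just one category; later filters take precedence.
    warnings.filterwarnings("ignore", category=LegacyCallWarning)

    result = old_api()                               # warning is suppressed
    warnings.warn("something else", RuntimeWarning)  # warning still appears

print(result, [w.category.__name__ for w in caught])
```

Only the RuntimeWarning is recorded, which shows that the filter hides the one category without silencing everything else.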
Now that you have a good installation of Basemap, you can do something with it. The following example shows how to draw a map and place pointers to specific locations on it:
austin = (-97.75, 30.25)
hawaii = (-157.8, 21.3)
washington = (-77.01, 38.90)
chicago = (-87.68, 41.83)
losangeles = (-118.25, 34.05)
m = Basemap(projection='merc',llcrnrlat=10,urcrnrlat=50,
llcrnrlon=-160,urcrnrlon=-60)
m.drawcoastlines()
m.fillcontinents(color='lightgray',lake_color='lightblue')
m.drawparallels(np.arange(-90.,91.,30.))
m.drawmeridians(np.arange(-180.,181.,60.))
m.drawmapboundary(fill_color='aqua')
m.drawcountries()
x, y = m(*zip(*[hawaii, austin, washington,
chicago, losangeles]))
m.plot(x, y, marker='o', markersize=6,
markerfacecolor='red', linewidth=0)
plt.title("Mercator Projection")
plt.show()
The example begins by defining the longitude and latitude for various cities. It then creates the basic map. The projection parameter defines the basic map appearance. The next four parameters, llcrnrlat, urcrnrlat, llcrnrlon, and urcrnrlon, define the sides of the map. You can define other parameters, but these parameters generally create a useful map.
The next set of calls defines the map particulars. For example, drawcoastlines() draws the coastlines so that they’re easy to see. To make landmasses easy to discern from water, you want to call fillcontinents() with the colors of your choice. When working with specific locations, as the example does, you want to call drawcountries() to ensure that the country boundaries appear on the map. At this point, you have a map that’s ready to fill in with data.
In this case, the example creates x and y coordinates using the previously stored longitude and latitude values. It then plots these locations on the map in a contrasting color so that you can easily see them. The final step is to display the map, as shown in Figure 11-10.
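The expression zip(*[...]) in the map example can look cryptic. It regroups a list of (longitude, latitude) pairs into one sequence of longitudes and one of latitudes, which is the form the map object expects. The following sketch shows just that step in plain Python, using three of the cities from the example; the names points, lons, and lats are made up for illustration.

```python
austin = (-97.75, 30.25)
hawaii = (-157.8, 21.3)
washington = (-77.01, 38.90)

points = [hawaii, austin, washington]

# zip(*points) regroups the pairs: first all longitudes, then all latitudes.
lons, lats = zip(*points)

print(lons)
print(lats)
```

The outer asterisk in the example, m(*zip(...)), then hands those two sequences to the map object as two separate arguments.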
A graph is a depiction of data showing the connections between data points using lines. The purpose is to show that some data points relate to other data points, but not all the data points that appear on the graph. Think about a map of a subway system. Each of the stations connects to other stations, but no single station connects to all the stations in the subway system. Graphs are a popular data science topic because of their use in social media analysis. When performing social media analysis, you depict and analyze networks of relationships, such as friends or business connections, from social hubs such as Facebook, Google+, Twitter, or LinkedIn.
As previously stated, an undirected graph simply shows connections between nodes. The output doesn’t provide a direction from one node to the next. For example, when establishing connectivity between web pages, no direction is implied. The following example shows how to create an undirected graph:
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
G = nx.Graph()
H = nx.Graph()
G.add_node(1)
G.add_nodes_from([2, 3])
G.add_nodes_from(range(4, 7))
H.add_node(7)
G.add_nodes_from(H)
G.add_edge(1, 2)
G.add_edge(1, 1)
G.add_edges_from([(2,3), (3,6), (4,6), (5,6)])
H.add_edges_from([(4,7), (5,7), (6,7)])
G.add_edges_from(H.edges())
nx.draw_networkx(G)
plt.show()
In contrast to the canned example found in the “Using NetworkX basics” section of Chapter 8, this example builds the graph using a number of different techniques. It begins by importing the NetworkX package you use in Chapter 8. To create a new undirected graph, the code calls the Graph() constructor, which can take a number of input arguments to use as attributes. However, you can build a perfectly usable graph without using attributes, which is what this example does.
The easiest way to add a node is to call add_node() with a node number. You can also add a list, dictionary, or range() of nodes using add_nodes_from(). In fact, you can import nodes from other graphs if you want.
Nodes don’t have any connectivity at the outset. You must define connections (edges) between them. To add a single edge, you call add_edge() with the numbers of the nodes that you want to connect. As with nodes, you can use add_edges_from() to create more than one edge using a list, dictionary, or another graph as input. Figure 11-11 shows the output from this example (your output may differ slightly but should have the same connections).
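You can check the connections of an undirected graph without drawing it. The following sketch is a smaller, made-up graph rather than the one in the example; it shows that adding an edge creates its nodes, that nodes and edges can be copied from a second graph, and that an undirected edge works in both directions.

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, 2)                     # adding an edge also creates its nodes
G.add_edges_from([(2, 3), (3, 6)])

H = nx.Graph()
H.add_edge(6, 7)
G.add_nodes_from(H)                  # copy H's nodes into G
G.add_edges_from(H.edges())          # copy H's edges into G

print(sorted(G.nodes()))
print(G.has_edge(2, 1))              # undirected: order of the pair doesn't matter
```

Asking for the edge as (2, 1) succeeds even though it was added as (1, 2), which is exactly what undirected means.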
You use directed graphs when you need to show a direction, say from a start point to an end point. When you get a map that shows you how to get from one specific point to another, the starting node and ending node are marked as such, and the lines between these nodes (and all the intermediate nodes) show direction.
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
G = nx.DiGraph()
G.add_node(1)
G.add_nodes_from([2, 3])
G.add_nodes_from(range(4, 6))
# Newer NetworkX releases provide add_path() as a function, not a method.
nx.add_path(G, [6, 7, 8])
G.add_edge(1, 2)
G.add_edges_from([(1,4), (4,5), (2,3), (3,6), (5,6)])
colors = ['r', 'g', 'g', 'g', 'g', 'm', 'm', 'r']
labels = {1:'Start', 2:'2', 3:'3', 4:'4',
5:'5', 6:'6', 7:'7', 8:'End'}
sizes = [800, 300, 300, 300, 300, 600, 300, 800]
nx.draw_networkx(G, node_color=colors, node_shape='D',
with_labels=True, labels=labels,
node_size=sizes)
plt.show()
The example begins by creating a directional graph using the DiGraph() constructor. You should note that the NetworkX package also supports MultiGraph() and MultiDiGraph() graph types. You can see a listing of all the graph types at https://networkx.lanl.gov/reference/classes.html.
Adding nodes is much like working with an undirected graph. You can add single nodes using add_node() and multiple nodes using add_nodes_from(). The add_path() call lets you create nodes and edges at the same time. The order of nodes in the call is important. The flow from one node to another is from left to right in the list supplied to the call.
This example adds special node colors, labels, shape (only one shape is used), and sizes to the output. You still call on draw_networkx() to perform the task. However, adding the parameters shown changes the appearance of the graph. Note that you must set with_labels to True in order to see the labels provided by the labels parameter. Figure 11-12 shows the output from this example.
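To see how the order of the list passed to add_path() determines direction, you can build just that path and inspect its edges. This sketch assumes a NetworkX release in which add_path() is called as a function, nx.add_path(G, ...), rather than as a method of the graph.

```python
import networkx as nx

G = nx.DiGraph()

# Creates nodes 6, 7, and 8 plus the edges 6 -> 7 and 7 -> 8.
nx.add_path(G, [6, 7, 8])

print(list(G.edges()))
print(G.has_edge(6, 7), G.has_edge(7, 6))
```

The edge exists from 6 to 7 but not from 7 to 6, so reversing the list would reverse the flow.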