Chapter 13. Visualizing data

This chapter covers

Making quick plots of vector data using matplotlib
Plotting raster data with matplotlib
Creating maps with Mapnik

As you no doubt have noticed, the ability to view your data is essential. While you can use desktop GIS software, such as QGIS, sometimes it’s nice to see your data as you work, without needing to open it up in other software. This is the idea behind the VectorPlotter class in the ospybook module. Other times you might need to create a picture of your data, such as a quick-and-dirty plot to show a colleague, or perhaps a much nicer map to post online or give to a client. This isn’t a book on cartography (which is good, because I’m cartographically challenged), so this chapter will show you the basics of displaying data in a few different ways, but won’t focus on techniques for making the data look pretty. You’ll see how to use both the matplotlib and Mapnik modules to plot your data. If you want something pretty, you’ll want to go with Mapnik, but matplotlib is great for quick visualizations.

13.1. Matplotlib

Matplotlib is a general-purpose plotting library for Python and can be used for any kind of graphic you can think up. This module is extensive, and like NumPy and SciPy, entire books have been written on it. If you’re interested in seeing an overview of what can be done, check out the examples in the matplotlib gallery at http://matplotlib.org/gallery.html. The gallery contains many impressive examples for making charts and graphs, but we’re more interested in spatial data, so this section will concentrate on quick-and-crude plots of geographical datasets. In fact, the VectorPlotter class uses matplotlib, and you’ll learn the basics of how that class plots vector data.

Matplotlib has several parts, but the one that you interact with the most to plot data is pyplot, and that’s what we’ll use here. It’s convention to rename this as plt when importing it:

import matplotlib.pyplot as plt

You can use pyplot in interactive or non-interactive mode. Back in chapters 3 through 7, you used a VectorPlotter from an interactive console and saw the changes to your plots immediately. This was matplotlib at work in interactive mode. This mode is extremely handy for playing with matplotlib and learning how it works. It’s also useful for interactively exploring data.

Plotting isn’t interactive by default, however. This makes sense, because interactivity wouldn’t help a script that creates a graphic and saves it to disk with no input from the user. Every rule has exceptions, though, and you may find that interactive mode is on by default if you’re using IPython in pylab mode or an IDE such as Spyder. In interactive mode, the plot is shown to the user automatically. In non-interactive mode, you must call the plt.show() function after adding all of the graphics to your plot, and this call stops the script’s execution until the user closes the plot window. You might be tempted to use interactive mode in a script so that the user can see the plot as it’s created, but that rarely works well because the plot window disappears when the script ends. The user might see parts of the plot as it’s created, but if the script ends as soon as the plot is finished, then the user may never get a chance to see the final product.

If you want to turn interactive mode on, either from a script or the console, use this:

plt.ion()

You can turn interactive mode back off with plt.ioff().

13.1.1. Plotting vector data

You might be surprised to learn that plotting vector data isn’t that difficult. The data’s made up of x and y coordinates, after all. First you’ll see how to use the plot function to draw points, lines, and polygons in general, and then you’ll graduate to plotting shapefiles. Once you can do that, you’ll learn how to create holes in the special case of donut polygons, so that other data can show through if needed. The plot function has many options, most of which will be ignored here, but you can read about them all in the online documentation found at http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot. This function wants, at the minimum, lists of x and y coordinates. If that’s all you provide, then a line is plotted using those coordinates and a color from the matplotlib color cycle. For example, the following code plots the line y = x², shown in figure 13.1:

import matplotlib.pyplot as plt

x = range(10)
y = [i * i for i in x]
plt.plot(x, y)
plt.show()

Figure 13.1. A simple line plot

You can specify a color and change this from a line to a series of points simply by providing a marker specification, as in the following example. In this case, 'ro' means that it should draw red circles instead of the default line. The markersize parameter makes the points a bit larger than they would have been by default. (Don’t forget to call plt.show() to draw each of these plots.)

plt.plot(x, y, 'ro', markersize=10)

The result of this code is shown in figure 13.2. You can also plot a single point by passing in an x and a y value instead of lists of values. You might think that the coordinates would be enough, but you have to provide a marker symbol such as 'ro' or else it still tries to draw a line. Because one point isn’t enough information to draw a line, you end up with a blank plot.

Figure 13.2. A simple point plot

Because polygons are closed lines, you can draw a hollow polygon exactly the same way as a line. Make sure that the first and last sets of coordinates are the same so that the polygon is closed. For example, the following code snippet adds a 0 to the end of each list so that a line from figure 13.1 is drawn back to the origin. In addition, the lw named parameter is used to change the line thickness (lw is short for linewidth, which you could also use). The results are shown in figure 13.3.

Figure 13.3. A simple closed line plot

x = list(range(10))
y = [i * i for i in x]
x.append(0)
y.append(0)
plt.plot(x, y, lw=5)

Believe it or not, you now know pretty much everything you need to know to make simple plots of vector data, assuming you remember what you learned in earlier chapters. To draw the features in a layer, open it and, for each feature, get the geometry coordinates and plot them as you did here. Let’s try it out with the global landmass shapefile. This particular dataset is convenient because all of the geometries are simple polygons, and you don’t need to worry about multipolygons. You have one donut polygon in the mix, but you can ignore that for now and plot the outer ring. For each feature, get the first ring from its geometry, and then get the coordinates from that. Remember that the coordinates come in a list of pairs, so the zip function comes in handy because you can use it to create two separate lists of x and y coordinates. The following listing demonstrates this pattern and results in a plot like figure 13.4A.

Figure 13.4. Two plots of the continents using closed lines for polygons. Plot A sets the axes equal to each other and the proportions are correct, unlike plot B in which the default axis limits are used.

Listing 13.1. Plotting simple polygons

One little detail that the listing takes care of has not been mentioned yet. For your spatial plots to look right, you need to set the axis units equal to each other. If you comment this line out, you’ll end up with a plot more like figure 13.4B. By default the data are fitted into the available space so that the data fill it all up. The distance covered by a single unit might be different on each axis. If you look closely at part B of the figure, you’ll see that the horizontal axis ranges from -200 to 200 but the vertical one from -100 to 100, and yet they both use up the same amount of space on paper. Setting the axes units equal fixes this distortion.
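The listing itself is not reproduced in this text. As a minimal sketch of the coordinate-handling step it relies on, the following uses a made-up ring stored as a plain list of (x, y) pairs, standing in for the list of pairs you would get from a ring's coordinates, and splits it with zip:

```python
# Hypothetical ring: a closed rectangle as (x, y) pairs, standing in for
# the list of coordinate pairs retrieved from a polygon's outer ring.
ring_coords = [(0, 0), (0, 10), (20, 10), (20, 0), (0, 0)]

# zip(*pairs) transposes the pairs into one tuple of x values and one
# tuple of y values, which is the form that plt.plot(x, y) expects.
x, y = zip(*ring_coords)

print(x)  # (0, 0, 20, 20, 0)
print(y)  # (0, 10, 10, 0, 0)
```

From here you would pass x and y to plt.plot for each feature, then call plt.axis('equal') once before plt.show() so that one unit covers the same distance on both axes.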

As you’ve seen, drawing simple polygons isn’t difficult. Dealing with multipolygons and donut polygons adds a little more complexity to the code, but it’s still the exact same process. In the case of a multipolygon, you need to loop through each polygon in the multipolygon, and then for each polygon (whether from a multipolygon or not), loop through the rings and plot each one. The following listing shows this process for the countries shapefile, which gives you a plot like figure 13.5.

Figure 13.5. A plot of countries using closed lines but accounting for multipolygons and holes

Listing 13.2. Plotting polygons

This example breaks things up into a few functions to make things easier. The plot_polygon function loops through the rings in a polygon and plots each one. The other function, plot_layer, opens a data source, gets the layer indicated by the optional layer_index parameter, and loops through all of the features and plots their geometries. If the geometry is a polygon, it passes it along to plot_polygon, but if it’s a multipolygon, it passes each polygon part to plot_polygon separately. Both of these functions allow you to use **kwargs to pass optional parameters that are used by the matplotlib plot function (see the sidebar).

These functions make it easy to plot a shapefile, because all you have to do is pass the filename and symbol to plot_layer, set your axes to be equal, and then show the plot. This listing also shows you how to turn tick marks off if you don’t want them drawing alongside the axes.
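Because the listing is not reproduced in this text, here is a minimal sketch of the looping structure it describes. It is not the original plot_layer or plot_polygon code: the hypothetical iter_rings helper works on GeoJSON-like dictionaries rather than OGR geometries, purely so that the example runs without any data files.

```python
def iter_rings(geom):
    """Yields every ring (outer rings and holes) in a geometry.

    geom is a GeoJSON-like dict, an assumption made for this sketch;
    the chapter itself works with OGR geometry objects.
    """
    if geom['type'] == 'Polygon':
        polygons = [geom['coordinates']]
    elif geom['type'] == 'MultiPolygon':
        polygons = geom['coordinates']
    else:
        raise ValueError('unsupported geometry type: ' + geom['type'])
    for polygon in polygons:      # each part of a multipolygon
        for ring in polygon:      # outer ring first, then any holes
            yield ring

# Made-up rings for illustration.
square = [(0, 0), (0, 4), (4, 4), (4, 0), (0, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
island = [(6, 0), (6, 1), (7, 1), (7, 0), (6, 0)]

donut = {'type': 'Polygon', 'coordinates': [square, hole]}
multi = {'type': 'MultiPolygon', 'coordinates': [[square, hole], [island]]}

for ring in iter_rings(multi):
    x, y = zip(*ring)             # ready to hand to plt.plot(x, y)
    print(len(x), 'vertices')
```

A polygon is treated as a multipolygon with a single part, so the ring-plotting code only has to be written once.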

Using **kwargs in functions

The same way you’ve used a single asterisk to explode a list into individual values that can be passed as ordered arguments to a function, you can use double asterisks to explode a dictionary for use as named arguments. For example, if a function can accept a variety of optional parameters, you could create a dictionary containing the ones you want to use, with the parameter names as the keys, and then pass that to the function instead of each argument individually. This behavior is useful for passing arguments through your function to another one.

For example, the matplotlib plot function accepts a large number of optional parameters that control the output. It would be nice to use these with the plot_polygon and plot_layer functions in listing 13.2, but those functions have no reason to worry about the optional parameters. They only need to pass them along to plot when the time comes. To do this, add a variable prefixed with ** as the last parameter to your function. This variable is called kwargs by convention, but you can call it whatever you want. It does have to be the last parameter, however. Then you can pass it along to other functions, and the parameters that the user provided eventually arrive in the intended function.
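The pass-through pattern can be sketched without matplotlib at all. In the following example, draw and plot_ring are hypothetical stand-ins (draw plays the role of the plotting function with optional parameters), so treat this as an illustration of the mechanism rather than the code from listing 13.2:

```python
def draw(x, y, color='k', linewidth=1):
    """Stand-in for a plotting function with optional parameters."""
    return 'draw {} points, color={}, linewidth={}'.format(
        len(x), color, linewidth)

def plot_ring(coords, **kwargs):
    """Splits coordinate pairs and forwards any extra named arguments."""
    x, y = zip(*coords)
    # kwargs is a dictionary here; ** unpacks it back into named arguments.
    return draw(x, y, **kwargs)

ring = [(0, 0), (1, 1), (2, 0), (0, 0)]
print(plot_ring(ring))                          # defaults are used
print(plot_ring(ring, color='r', linewidth=3))  # options pass straight through

options = {'color': 'b'}
print(plot_ring(ring, **options))               # a dictionary works too
```

Notice that plot_ring never mentions color or linewidth by name, so it keeps working if draw gains new optional parameters later.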

You probably want to plot lines and points in addition to polygons, so create two more simple functions to plot those geometry types and add a few more conditional statements to plot_layer. This additional code is shown in the following listing, and an example of the output is shown in figure 13.6.

Figure 13.6. A plot of countries, rivers, and cities using basic lines and points

Listing 13.3. Plotting lines and points

This listing doesn’t contain new concepts, only new code. You extend the plot_layer function so it calls the correct functions for lines, multilines, points, and multipoints. Then at the end of the listing, you use the updated function to plot country outlines again, but you also add major rivers and large cities. You also take advantage of **kwargs to pass a marker size for the city points so that they don’t draw so big as to hide other features in the plot.

Until now you’ve treated polygons as closed lines when plotting them. What if you want to fill them with a color? You can do this by changing your plot_polygon function to use the matplotlib fill function instead of plot, like this:

def plot_polygon(poly, symbol='w', **kwargs):
    """Plots a polygon using the given symbol."""
    for i in range(poly.GetGeometryCount()):
        x, y = zip(*poly.GetGeometryRef(i).GetPoints())
        plt.fill(x, y, symbol, **kwargs)

Now the symbol parameter should be a color to use for the fill, so using 'y' for yellow would result in figure 13.7, with the countries filled in.

Figure 13.7. A repeat of figure 13.6, but the closed lines are filled with a color

The only problem with this method is that polygons with holes in them will be plotted incorrectly, because the holes will be plotted using the same fill color. You could fix this by only plotting the first ring with the fill color and using white for the later rings, but that wouldn’t create a hole because nothing underneath would show through. If you need real holes, you can use matplotlib PathPatches, but it’s a little more complicated than what you’ve done so far. To draw a polygon, you not only need the vertex coordinates, but also a set of codes denoting whether to draw a line or move the pen to that location. You use this information to create a Path, and then create a PathPatch from that. The PathPatch is the object that you add a fill color to. Once you have that, then you need to add it to the plot. For example, this bit of code draws the solid red triangle shown in figure 13.8:

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.path import Path

coords = [(0, 0), (0.5, 1), (1, 0), (0, 0)]
codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO]

path = Path(coords, codes)
patch = patches.PathPatch(path, facecolor='red')
plt.axes().add_patch(patch)
plt.show()

Figure 13.8. Simple patch polygons

The first code is MOVETO, meaning that the pen should move to the first set of coordinates without drawing anything. This makes sense if you’ve already drawn something else and don’t want a line connecting the last point in the previous path to the first point in this path. The LINETO code corresponds to the rest of your coordinates, meaning that the points will be connected. Once you’ve created the path, then you can use it to create a patch, which can be filled. You need to add the patch to the drawing area of the plot, which is called the axes (which in turn contains the x and y axis).

To put a hole in a patch, create a path as before, but use a MOVETO code to move to the first set of coordinates for the hole, and then add the vertices in the opposite direction as the outer set in order to indicate that this should create a hole. If the coordinates for the outer ring are in clockwise order, then the coordinates for the holes must be in counterclockwise order. For example, you can put a hole in your earlier triangle like this:
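The snippet for this step is not reproduced in this text. As a minimal sketch, assuming a small hole whose coordinates are made up for illustration, the two rings can be combined into one path as follows. The way the figure and axes are created here is one possible arrangement and may differ from the original.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.path import Path

# Outer ring from the earlier triangle, which is in clockwise order.
outer_coords = [(0, 0), (0.5, 1), (1, 0), (0, 0)]
outer_codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO]

# Hypothetical hole in counterclockwise order. Its MOVETO lifts the pen
# so that no line connects the outer ring to the hole.
inner_coords = [(0.4, 0.25), (0.6, 0.25), (0.5, 0.5), (0.4, 0.25)]
inner_codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO]

# One path holds both rings: coordinates and codes are simply concatenated.
path = Path(outer_coords + inner_coords, outer_codes + inner_codes)
patch = patches.PathPatch(path, facecolor='red')

fig, ax = plt.subplots()
ax.add_patch(patch)
```

As with the earlier snippets, call plt.show() afterward to display the result.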

Once you have all of your coordinates and codes in two lists or NumPy arrays, then you can use them as before to create the patch with a hole that is shown in figure 13.8. The following listing applies this process to spatial data to make a plot of world countries like that in figure 13.9.

Figure 13.9. Countries drawn with patches instead of lines

Listing 13.4. Draw world countries as patches

This listing contains a couple of useful functions. The first, order_coords, checks if coordinates are in the order requested and reorders them if not. Most of the code in the function implements an algorithm for determining order. Once the order is determined, it’s compared to the requested order, and if they differ, the coordinates are reversed.
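The listing is not reproduced here, and the text doesn't say which ordering algorithm it uses. One common choice, assumed for this sketch, is the shoelace formula: the sign of a ring's signed area tells you its direction. The signature below is also an assumption.

```python
def order_coords(coords, clockwise):
    """Returns coords in the requested order, reversing them if needed.

    coords    - list of (x, y) pairs forming a closed ring
                (first and last pairs are the same)
    clockwise - True for clockwise order, False for counterclockwise
    """
    # Shoelace formula: the sum is twice the signed area, which is
    # negative for clockwise rings and positive for counterclockwise ones.
    total = 0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        total += x1 * y2 - x2 * y1
    is_clockwise = total < 0
    if is_clockwise != clockwise:
        return coords[::-1]
    return coords

triangle = [(0, 0), (0.5, 1), (1, 0), (0, 0)]   # clockwise
print(order_coords(triangle, True))    # unchanged
print(order_coords(triangle, False))   # reversed
```

Reversing a closed ring keeps it closed, because the shared first and last vertex simply swap places.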

Also, a simple function called make_codes creates a list of LINETO codes of the appropriate length, with the first one changed to MOVETO so a new path can be started.
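A function matching that description could look something like the following sketch. The parameter name n is an assumption, and the original may build the list differently (for example, as a NumPy array).

```python
from matplotlib.path import Path

def make_codes(n):
    """Returns n path codes: one MOVETO followed by LINETO codes.

    n must be at least 1.
    """
    codes = [Path.LINETO] * n
    codes[0] = Path.MOVETO      # start a new path at the first vertex
    return codes

codes = make_codes(4)           # enough codes for a closed triangle
```

Because every ring gets its own MOVETO, the code lists for an outer ring and its holes can be concatenated directly.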

The last function plots polygons as patches. The first thing this function does is create a list of the outer ring coordinates in clockwise order, along with a corresponding code list. Then it loops through any inner rings that might exist, and for each one creates a list of coordinates in counterclockwise order and a list of codes. Then it appends the coordinates and codes for the inner ring to the end of the master lists. Once all rings have been processed, it creates a patch and adds it to the plot.

The main part of the code simply loops through the features in a shapefile and calls the plot_polygon_patch function on each polygon, including those inside multipolygons. Don’t forget to set the axis to equal before drawing the plot, because otherwise the x and y axis will probably only range from 0 to 1, and you’ll end up staring at a blank plot.

Animation

You can have even more fun by animating your plots. To see how it’s done, you’ll animate the movements of one of the albatrosses from chapter 7. Let’s start by configuring the plot’s extent based on the GPS data:

ds = ogr.Open(r'D:\osgeopy-data\Galapagos')
gps_lyr = ds.GetLayerByName('albatross_lambert')
extent = gps_lyr.GetExtent()
fig = plt.figure()
plt.axis('equal')
plt.xlim(extent[0] - 1000, extent[1] + 1000)
plt.ylim(extent[2] - 1000, extent[3] + 1000)
plt.gca().get_xaxis().set_ticks([])
plt.gca().get_yaxis().set_ticks([])

You get the extent of the GPS data layer and then use it to set the x and y limits for the plot, except that you add 1,000 meters in every direction to add a little buffer around the data you want to show. You also turn the tick marks off. You probably want to add the landmasses to your plot because the GPS locations aren’t too interesting without context. You can use your plot_polygon function to do this:

land_lyr = ds.GetLayerByName('land_lambert')
row = next(land_lyr)
geom = row.geometry()
for i in range(geom.GetGeometryCount()):
    plot_polygon(geom.GetGeometryRef(i))

Now you’re ready to add the animated data, but you need to store it somewhere so it’s accessible to the animation routines. You have many ways you could set this up, but for this example you’ll store the x,y coordinate pairs in a list, with the corresponding timestamps in another list:

timestamps, coordinates = [], []
gps_lyr.SetAttributeFilter("tag_id = '2131-2131'")
for row in gps_lyr:
    timestamps.append(row.GetField('timestamp'))
    coordinates.append((row.geometry().GetX(), row.geometry().GetY()))

You iterate through all of the features for the animal with tag '2131-2131' and add the timestamp to one list and a tuple containing the coordinates to another list. You’ll use the coordinates to animate a point and the timestamps to show the current time. You need to initialize both the point and the timestamp annotation, so let’s do that:

point = plt.plot([], [], 'o')[0]  # empty lists; recent matplotlib rejects None here
label = plt.gca().annotate('', (0.25, 0.95), xycoords='axes fraction')
label.set_animated(True)

Here you initialize the point by plotting it with no coordinates. The plot function returns a list of objects, but in this case you have only one item in the list because you only plotted one point. You grab that point graphic out of the list and store it in your point variable. Then you create an empty annotation object on the current axes (gca is short for “get current axes”). Setting the optional xycoords parameter to 'axes fraction' lets you specify the annotation’s location using percentages rather than pixels or map coordinates. The annotation will be a quarter of the way across the axes (0.25) and close to the top (0.95). You also tell the annotation that it’s going to be animated, which will make the text change much more smoothly.

Now you need to write a simple function that tells the animation what items are going to change, namely, your point and label. If you don’t clear the point coordinates in this function, then there is always a point at the first location in the animation, even while another point is moving around.

def init():
    point.set_data([], [])  # empty sequences clear the point
    return point, label

One last function you need to write is the one that moves the point and changes the label. The first parameter to this function is a counter that gets passed to it automatically, specifying which iteration of the animation is currently being processed. The rest of the parameters are up to you. It needs to accept the objects that will change and any data needed to change them. Like the init function, this function must return the objects that change.

def update(i, point, label, timestamps, coordinates):
    label.set_text(timestamps[i])
    # Recent matplotlib versions require sequences here, not bare numbers.
    point.set_data([coordinates[i][0]], [coordinates[i][1]])
    return point, label

The function uses the counter variable, i, to pull the correct timestamps and coordinates out of the lists. It changes the label’s text to the timestamp, and sets the point’s coordinates to the values you saved from the shapefile. Then it returns the point and the label because they’ve changed.

Now let’s run the animation using FuncAnimation from matplotlib. The two required parameters are the matplotlib figure object that the animation will run on and your function that tells things how to animate. The frames parameter supplies the counter values. It can be a list of values or, as in this case, the number of frames to draw, in which case the counter runs from 0 up to one less than that number. The init_func parameter is the initialization function that you wrote. If you don’t provide this, then the first result from the animation will be used for initialization, and it will stay there throughout the animation. If your animation function requires parameters other than the counter, you need to provide them using the fargs argument to FuncAnimation. If blit is True, then only the parts of the plot that have changed will be redrawn, which will speed things up. The interval parameter is the number of milliseconds between frames, and repeat tells it whether to repeat the animation or stop after one time.

import matplotlib.animation as animation
a = animation.FuncAnimation(
    fig, update, frames=len(timestamps), init_func=init,
    fargs=(point, label, timestamps, coordinates),
    interval=25, blit=True, repeat=False)
plt.show()

It would be nice if the animation could be embedded in paper, but it can’t, so you’ll have to run the code yourself to see it in action. One thing you should notice is that nothing in this code will force the elapsed time to stay at a constant speed. If two consecutive GPS fixes are three days apart, they’ll be treated the same as two that are only an hour apart. One way to fix that is to round the timestamps to the nearest hour and make sure entries are in the timestamps and coordinates lists for every hour. If there aren’t coordinates corresponding to a specific time, then put a bogus value in the list. When you update the animation, only update the point location if the coordinates are valid. Here’s a function that rounds timestamps:

from datetime import datetime, timedelta
def round_timestamp(ts, minutes=60):
    ts += timedelta(minutes=minutes/2.0)
    ts -= timedelta(
        minutes=ts.minute % minutes, seconds=ts.second,
        microseconds=ts.microsecond)
    return ts

If you use the default value of 60 for the minutes parameter, the function rounds to the nearest hour. In this case it adds 30 minutes to the timestamp, so if the original was 11:27:14.01, the new time is 11:57:14.01. Then it calculates the remainder of dividing the timestamp’s minutes value by the number of minutes you want to round to. In this case, that value is 57, because 60 goes into 57 zero times and the entire value is the remainder. Then the numbers of seconds and microseconds from the timestamp are added to this value, so you have 57:14.01, and the result is subtracted from the timestamp. Now the timestamp is 11:00 even, which is the closest hour to 11:27:14.01.
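To check the arithmetic yourself, you can run the function on a few values. The function is repeated here so the example runs on its own. The date is arbitrary, and the 15-minute case is an extra illustration that isn't discussed in the text.

```python
from datetime import datetime, timedelta

def round_timestamp(ts, minutes=60):
    """Same logic as the function above, repeated for a standalone run."""
    ts += timedelta(minutes=minutes / 2.0)
    ts -= timedelta(
        minutes=ts.minute % minutes, seconds=ts.second,
        microseconds=ts.microsecond)
    return ts

original = datetime(2008, 6, 1, 11, 27, 14, 10000)   # 11:27:14.01

print(round_timestamp(original))                          # 2008-06-01 11:00:00
print(round_timestamp(datetime(2008, 6, 1, 11, 32, 0)))   # 2008-06-01 12:00:00
print(round_timestamp(original, minutes=15))              # 2008-06-01 11:30:00
```

Because datetime objects are immutable, the += and -= inside the function create new objects, and the original timestamp you passed in is left unchanged.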

Now that you can round timestamps, let’s initialize the timestamps and coordinates lists with the first values from the dataset:

gps_lyr.SetAttributeFilter("tag_id = '2131-2131'")
time_format = '%Y-%m-%d %H:%M:%S.%f'
row = next(gps_lyr)
timestamp = datetime.strptime(row.GetField('timestamp'), time_format)
timestamp = round_timestamp(timestamp)
timestamps = [timestamp]
coordinates = [(row.geometry().GetX(), row.geometry().GetY())]

Now you can loop through the rest of the features and fill in your lists. Get the timestamp for each row and compare it to the last one in the timestamps list. Keep adding new timestamps to the list until the last one is equal to the one from the feature, and while you’re at it, append a bogus set of coordinates to that list, too. The loop will stop when the last timestamp in the list is equal to the row’s timestamp, so you can overwrite the last set of bogus coordinates with the feature’s coordinates and they’ll match up with the correct timestamp.

hour = timedelta(hours=1)
for row in gps_lyr:
    timestamp = datetime.strptime(row.GetField('timestamp'), time_format)
    timestamp = round_timestamp(timestamp)
    while timestamps[-1] < timestamp:
        timestamps.append(timestamps[-1] + hour)
        coordinates.append((None, None))
    coordinates[-1] = (row.geometry().GetX(), row.geometry().GetY())

The only other thing you need to do is change your update function so that it only moves the point if there are valid coordinates. If you don’t do this, the point will disappear when there aren’t coordinates for a specific time because they’ll be set to None.

def update(i, point, label, timestamps, coordinates):
    label.set_text(timestamps[i])
    if coordinates[i][0] is not None:
        # Recent matplotlib versions require sequences here, not bare numbers.
        point.set_data([coordinates[i][0]], [coordinates[i][1]])
    return point, label

Now you can run the animation as before, but the time increments will be constant, which makes much more sense.

If you have appropriate software installed, you can also save the animation as a video file. For example, I have FFmpeg (www.ffmpeg.org) installed, so as long as ffmpeg is in my PATH environment variable, I can save the animation like this:

a.save('d:/temp/albatross.mp4', 'ffmpeg')

If you don’t have the software to save it yourself but would still like to see the results, there’s a saved version in the Galapagos data folder.

13.1.2. Plotting raster data

You can also use matplotlib to draw raster data. Making a simple raster plot is extremely easy because you have no coordinates to worry about, and there happens to be a function for displaying data contained in a NumPy array. Let’s start with a small image and draw it using the default color ramp, as shown in figure 13.10A.

Figure 13.10. Two plots of the same digital elevation model of Mount St. Helens. Plot A uses the default color ramp (which morphs from blue to red), and plot B uses a grayscale color ramp.

ds = gdal.Open(r'D:\osgeopy-data\Washington\dem\sthelens_utm.tif')
data = ds.GetRasterBand(1).ReadAsArray()
plt.imshow(data)
plt.show()

As you can see, all you have to do is read the raster data into a NumPy array as you’ve done many times before, and then pass that array to the imshow function, and you have yourself a plot. You might not like the default color ramp, but you can probably find a built-in one that you like. If not, you can create your own, although you won’t learn how to do that here. To use a colormap, pass its name to imshow as the cmap parameter, like this (figure 13.10B):

plt.imshow(data, cmap='gray')

Tip

As of this writing, you can see a list of matplotlib colormaps at http://wiki.scipy.org/Cookbook/Matplotlib/Show_colormaps.

If you want to plot a large image, you shouldn’t read the entire band in and try to plot it. You’re much better off using one of the pyramid layers because they take up much less memory and will plot considerably faster. You need to choose the appropriate overview level so that you have the resolution that you need without degrading performance. Here’s a function that retrieves overview data from an image, although it doesn’t check to make sure that the user requests a valid overview level.

Listing 13.5. Function to retrieve overview data

def get_overview_data(fn, band_index=1, level=-1):
    """Returns an array containing data from an overview.

    fn         - path to raster file
    band_index - band number to get overview for
    level      - overview level, where 0 is the highest resolution;
                 the coarsest can be retrieved with -1
    """
    ds = gdal.Open(fn)
    band = ds.GetRasterBand(band_index)

    # GDAL overview indexes start at 0, so 0 is a valid level.
    if level >= 0:
        ov_band = band.GetOverview(level)
    else:
        # Negative levels count back from the coarsest overview.
        num_ov = band.GetOverviewCount()
        ov_band = band.GetOverview(num_ov + level)
    return ov_band.ReadAsArray()

The function requires that the user provide the path to a raster file, and optionally, a band number and overview level. If the optional parameters aren’t provided, it will return the coarsest overview for the first band. Try using this function to plot the lowest resolution overview for a Landsat band:

As you can see from figure 13.11A, this results in an extremely dark image and in this case, at least, it’s difficult if not impossible to differentiate much at all. It might even seem worse if you hadn’t masked out the pixels that were equal to 0. Without that step, you’d see a rectangle with all of the outside pixels that weren’t part of the satellite imagery drawn as black.
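The code for this step is not reproduced in this text. The masking it mentions can be sketched with NumPy masked arrays. The small array below is made up and stands in for the overview data you would get from get_overview_data.

```python
import numpy as np

# Hypothetical 4x4 array standing in for overview data; zeros mark
# pixels that fall outside the area covered by the imagery.
data = np.array([[0, 0, 12, 15],
                 [0, 20, 25, 18],
                 [14, 22, 30, 0],
                 [16, 19, 0, 0]])

# Mask every pixel equal to 0 so that it is ignored from here on.
masked = np.ma.masked_equal(data, 0)

print(masked.count())   # 10 unmasked pixels
print(masked.min())     # 12, because the zeros no longer count
```

Passing the masked array to plt.imshow, for example with cmap='gray', leaves the masked pixels undrawn instead of painting them black.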

Figure 13.11. Two plots of the same Landsat band. Plot A uses default settings, but plot B uses stretched data for much better contrast.

Because of the lack of contrast in figure 13.11A, this is a perfect time to stretch the data to make it look better. A standard deviation stretch, which is a common method, keeps pixel values that are within one or more standard deviations (usually two) from the mean, and sets everything outside that range to the minimum or maximum included values, as shown in figure 13.12. The values are then stretched between 0 and 1 for drawing, because that’s what matplotlib wants.

Figure 13.12. An illustration of how the data extremes are clipped, and then all data values are stretched between 0 and 1

To implement this, figure out the minimum and maximum cutoffs that are the desired number of standard deviations from the mean and then pass them as the vmin and vmax parameters to imshow, respectively. The data will automatically be stretched for you, but you need to provide these clip values, like this:

mean = np.mean(data)
std_range = np.std(data) * 2
plt.imshow(data, cmap='gray', vmin=mean-std_range, vmax=mean+std_range)

Figure 13.11B is stretched in this way, and it’s obviously a better visualization of the data than the nonstretched version.

You can also plot three bands as red, green, and blue, with an optional fourth alpha band. In this case you need to stack the bands into a three-dimensional array and pass that to imshow. Unlike with single bands, using masked arrays to filter out the zeros around the edges doesn’t work in this case, so you’re stuck with the black edges for the moment. The following code snippet uses three bands to create a figure like 13.13A:

os.chdir(r'D:\osgeopy-data\Landsat\Washington')
red_fn = 'p047r027_7t20000730_z10_nn30.tif'
green_fn = 'p047r027_7t20000730_z10_nn20.tif'
blue_fn = 'p047r027_7t20000730_z10_nn10.tif'
red_data = get_overview_data(red_fn)
green_data = get_overview_data(green_fn)
blue_data = get_overview_data(blue_fn)
data = np.dstack((red_data, green_data, blue_data))
plt.imshow(data)

Again, that image is too dark to be useful. Unfortunately, stretching the data is a bit more complicated if you’re plotting multiple bands because the automatic scaling with vmin and vmax only works for single bands. You’ll need to normalize the data yourself. The following function performs a standard deviation stretch on the data contained in a NumPy array and then scales the results between 0 and 1.

Listing 13.6. Function to stretch and scale data

def stretch_data(data, num_stddev):
    """Returns the data with a standard deviation stretch applied.

    data       - array containing data to stretch
    num_stddev - number of standard deviations to use
    """
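    mean = np.mean(data)
    # Half-width of the stretch window, in data units.
    std_range = np.std(data) * num_stddev
    # Keep the bounds inside the range of values actually present.
    new_min = max(mean - std_range, np.min(data))
    new_max = min(mean + std_range, np.max(data))
    clipped_data = np.clip(data, new_min, new_max)
    # Shift to 0 before dividing so the output spans 0 to 1.
    return (clipped_data - new_min) / (new_max - new_min)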

In addition to finding the appropriate distance from the mean, based on the desired number of standard deviations, this function makes sure that the bounds aren’t less than the minimum or greater than the maximum data values. For example, if you have 8-bit data that ranges from 0 to 255, the mean value is 43, and the standard deviation is 24, then the lower bound would be -5 if you subtracted two standard deviations from the mean. The minimum possible value is 0, however, and you don’t want to normalize your data using impossible values, which is why the function checks that the bounds don’t fall outside the range of values in the data. After the bounds are determined, they’re passed to np.clip, which replaces all values that are less than new_min with new_min and all values that are greater than new_max with new_max, as illustrated back in figure 13.12. Then the resulting data are scaled from 0 to 1. Now you can use this function to scale each of the three bands appropriately.
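To make that arithmetic concrete, here’s a small sketch that works through the 8-bit example by hand in plain Python. The four pixel values are made up for illustration, and the scaling step subtracts the lower bound before dividing so that the output starts at 0 even when the lower bound isn’t 0.

```python
# Statistics from the example in the text: 8-bit data, mean 43, std 24.
mean, std, num_stddev = 43.0, 24.0, 2
data_min, data_max = 0, 255

# Two standard deviations below the mean would be -5, which is impossible
# for 8-bit data, so the lower bound is held at the data minimum.
new_min = max(mean - num_stddev * std, data_min)
new_max = min(mean + num_stddev * std, data_max)

# Hypothetical pixel values, chosen only to illustrate the clipping.
pixels = [0, 43, 91, 200]
clipped = [min(max(p, new_min), new_max) for p in pixels]
scaled = [(c - new_min) / (new_max - new_min) for c in clipped]

print(new_min, new_max)
print(scaled)
```

The value 200 is clipped to the upper bound of 91, so it ends up at 1.0 along with every other bright pixel, while the mean lands a little below the middle of the output range.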

Because you’re scaling these data yourself, you can take advantage of the alpha channel to get rid of the black around the edges. For this particular image, you can assume that if all three bands contain 0, then the pixel is an outside edge. The alpha band should also contain 0 for these pixels, meaning they’re fully transparent. Other pixels should have a 1 in the alpha band so that they’ll be drawn at full opacity. Add this alpha band to your three-dimensional stack, as shown in the following snippet, and when you plot it the results will be similar to figure 13.13B.

Figure 13.13. Two plots of the same three-band Landsat image. Plot A uses default settings, but plot B uses stretched data for considerably better contrast.

red_data = stretch_data(get_overview_data(red_fn), 2)
green_data = stretch_data(get_overview_data(green_fn), 2)
blue_data = stretch_data(get_overview_data(blue_fn), 2)
alpha = np.where(red_data + green_data + blue_data > 0, 1, 0)
data = np.dstack((red_data, green_data, blue_data, alpha))
plt.imshow(data)
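If you want to see what the alpha band and the stacked array look like, this minimal sketch runs the same two NumPy calls on tiny arrays. The 2 x 2 band values are hypothetical stand-ins for the stretched Landsat bands.

```python
import numpy as np

# Hypothetical 2x2 bands already scaled to the 0-1 range. The top-left
# pixel is 0 in every band, as it would be outside the image footprint.
red = np.array([[0.0, 0.5], [0.2, 1.0]])
green = np.array([[0.0, 0.0], [0.4, 1.0]])
blue = np.array([[0.0, 0.1], [0.6, 1.0]])

# The bands are non-negative, so the sum is 0 only where all three are 0.
alpha = np.where(red + green + blue > 0, 1, 0)

# Stack along a new third axis: rows x columns x (R, G, B, A).
rgba = np.dstack((red, green, blue, alpha))

print(alpha)
print(rgba.shape)
```

Only the top-left pixel gets an alpha of 0. The pixel next to it has a green value of 0 but nonzero red and blue, so it stays opaque.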

13.1.3. Plotting 3D data

You can even plot three-dimensional data, such as a digital elevation model. To do this, you need the array containing elevation data, and two other arrays of the same size containing x and y coordinates for each pixel. These latter two arrays can be created by passing arrays containing the possible x and y values to np.meshgrid, which results in data like that shown in figure 13.14. Each pixel in the x array contains a value indicating which column it’s in, and each pixel in the y array indicates the row. If your pixels are square and you don’t need georeferencing information in your plot, you can use np.arange to get the input lists for meshgrid, so getting your two-dimensional x and y arrays is as easy as this:

x, y = np.meshgrid(np.arange(band.XSize), np.arange(band.YSize))

Figure 13.14. An illustration of meshgrid output. Part A shows the x,y coordinate pair for each cell in the array. The output is two arrays, one of which contains x coordinates (part B) and the other contains y coordinates (part C).
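A quick way to convince yourself of what meshgrid returns is to try it on a tiny grid. This sketch uses three columns and two rows in place of band.XSize and band.YSize.

```python
import numpy as np

# Three columns and two rows, so both outputs have shape (2, 3).
x, y = np.meshgrid(np.arange(3), np.arange(2))

print(x)   # every row repeats the x values 0, 1, 2
print(y)   # every column repeats the y values 0, 1
```

Both arrays have one entry per pixel, which is why they can be paired up with an elevation array of the same shape.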

In other cases, you can use the geotransform to compute the required information so that the x and y arrays contain real-world coordinates instead of pixel coordinates like those in figure 13.14. The following listing shows the steps to do this using a DEM of Mount St. Helens, and then it plots the data in 3D to get figure 13.15A.

Figure 13.15. 3D plots of Mount St. Helens. Plot A uses default settings, while the elevation and azimuth have been changed for plot B, as well as the axis removed.

Listing 13.7. Using meshgrid to get map coordinates

The first part of this listing reads overview data into memory and uses the geotransform to calculate the bounding coordinates for the DEM. These coordinates are then used in conjunction with meshgrid to create the x and y arrays needed for the plot.
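The coordinate math described above can be sketched without opening a file. This is an illustration rather than the listing itself: the geotransform values, raster size, and overview shape are all hypothetical, a north-up raster is assumed, and in a real script the tuple would come from the dataset’s GetGeoTransform method.

```python
import numpy as np

# Hypothetical values standing in for ds.GetGeoTransform() and the
# full-resolution raster size; a north-up raster is assumed.
geotransform = (500000.0, 30.0, 0.0, 5120000.0, 0.0, -30.0)
full_cols, full_rows = 400, 300

# Hypothetical overview array, 1/100 of the full size in each direction.
data = np.zeros((3, 4))
rows, cols = data.shape

# Bounding coordinates of the whole raster.
min_x = geotransform[0]
max_y = geotransform[3]
max_x = min_x + full_cols * geotransform[1]
min_y = max_y + full_rows * geotransform[5]

# Cell size of the overview, then the upper-left coordinate of each
# overview column and row.
x_size = (max_x - min_x) / cols
y_size = (max_y - min_y) / rows
x_coords = min_x + np.arange(cols) * x_size
y_coords = max_y - np.arange(rows) * y_size

x, y = np.meshgrid(x_coords, y_coords)
print(x[0])
print(y[:, 0])
```

Multiplying np.arange by the cell size, instead of passing a floating-point step to np.arange, guarantees that the coordinate arrays have exactly as many entries as the overview has columns and rows.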

To create the plot, you first create a matplotlib figure object and then grab its axes object. You tell the axes to use 3D, and then you call its plot_surface method in order to make the plot. This function requires the x and y arrays and the array containing elevations. You use the colormap named gist_earth instead of the default, and you use lw=0 to set the line width to 0. If you don’t change the line width, then each cell will have an outline around it, which doesn’t look good in this case. By the way, a figure and axes were created automatically for your earlier plots, but you didn’t need to worry about them. Here you do, because you need a handle to the axes to specify 3D and plot the surface.
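Here’s a minimal sketch of those plotting steps that you can run without the DEM. It assumes a reasonably recent matplotlib, uses a synthetic cone in place of the elevation data, and selects the Agg backend so that it also runs on a machine without a display. In an interactive session you would drop the backend line and call plt.show() at the end.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; omit this line to see a window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic cone standing in for the DEM overview array.
x, y = np.meshgrid(np.arange(50), np.arange(40))
z = 100 - np.hypot(x - 25, y - 20)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')   # tell the axes to use 3D

# lw=0 suppresses the outline that would otherwise be drawn around cells.
ax.plot_surface(x, y, z, cmap='gist_earth', lw=0)
```

The ax variable is the handle you need later if you want to change the viewing angle.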

What if you want to change the vantage point that you’re viewing the 3D image from? Well, you can set an elevation between 0 and 90, where 0 is ground level and 90 is looking straight down, and you can also rotate the plot from 0 to 360 degrees. The image in figure 13.15B was obtained by setting the elevation to 55, rotating the figure 60 degrees, and turning the axis off. To do this, add these two lines before calling plt.show():

ax.view_init(elev=55, azim=60)
plt.axis('off')

You can make this even more fun by creating an animation. This is simpler than the Albatross animation from earlier because all you have to do is change the rotation factor for each iteration. Try adding this to your code before calling plt.show:

import matplotlib.animation as animation

def animate(i):
    ax.view_init(elev=65, azim=i)

anim = animation.FuncAnimation(
    fig, animate, frames=range(0, 360, 10), interval=100)

The animate function changes the vantage point that the plot is being viewed from. The call to FuncAnimation sets things up so that the animate function is called 36 times, once for each value in frames. This will cause the plot to rotate 10 degrees each time. Although the interval parameter specifies that there will be 100 milliseconds in between each frame, it will be slower if your computer can’t draw it that fast. A saved version of this is in the Nepal data folder.

13.2. Mapnik

The plots you’ve been making so far work well for visualizing data, but there’s a good chance that at some point you’ll need to make something that looks a little nicer, or more like a real map. One good way to do this using Python is with Mapnik, a popular cartographic library. In fact, you might have seen maps created with Mapnik without knowing it. Mapnik was designed for making tiled maps for web applications, and as far as I know it’s not easy to put cartographic symbols such as North arrows on Mapnik images. You can do it with other graphics modules such as Cairo, but that’s beyond the scope of this introduction. This section will walk you through the basics of drawing vector and raster data using this module, but you should visit mapnik.org if you want to learn more.

Before we start drawing anything, though, let’s take a quick look at the minimum requirements for a Mapnik map, as seen in figure 13.16. A map has one or more layers, as well as one or more styles. The styles are what specify how the data are to be drawn. Each style needs at least one rule, and each rule needs at least one symbol. Rules can also have filters so that they only apply to a subset of the data. Each layer needs a data source and at least one style. Layer styles aren’t new style objects; they reference one of the styles that belongs to the map. You’ll see how this all works in the next few examples.

Figure 13.16. A basic organization chart of a Mapnik map. Each map has at least one layer and one style. Each layer needs to reference at least one of the styles.
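The requirements in figure 13.16 can be written down as a short checklist. The following sketch is not Mapnik code; it models a map as plain dictionaries and lists, and the check_map function and its dictionary keys are hypothetical names invented for this illustration.

```python
def check_map(styles, layers):
    """Return a list of problems in a plain-dict description of a map.

    styles - dict mapping style name to a list of rules; each rule is a
             dict with a 'symbols' list
    layers - list of dicts with 'name', 'datasource', and 'styles' keys
    """
    problems = []
    if not layers:
        problems.append('map has no layers')
    if not styles:
        problems.append('map has no styles')
    for style_name, rules in styles.items():
        if not rules:
            problems.append('style %s has no rules' % style_name)
        for rule in rules:
            if not rule.get('symbols'):
                problems.append('a rule in style %s has no symbols'
                                % style_name)
    for lyr in layers:
        if not lyr.get('datasource'):
            problems.append('layer %s has no data source' % lyr['name'])
        if not lyr.get('styles'):
            problems.append('layer %s has no styles' % lyr['name'])
        # Layers refer to styles by name, so the name must exist on the map.
        for style_name in lyr.get('styles', []):
            if style_name not in styles:
                problems.append('layer %s references unknown style %s'
                                % (lyr['name'], style_name))
    return problems


# Hypothetical description: one style with one rule, one layer using it.
styles = {'water': [{'symbols': ['polygon fill']}]}
layers = [{'name': 'Tiger', 'datasource': 'water.shp', 'styles': ['water']}]
print(check_map(styles, layers))

# Misspelling the style name on the layer is reported.
layers[0]['styles'] = ['watr']
print(check_map(styles, layers))
```

The last check is the one that matters most in practice: because a layer only stores the name of a style, a mismatch between the name used on the map and the name used on the layer means the layer has nothing to draw with.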

13.2.1. Drawing vector data

Do you remember the New Orleans data from an earlier chapter? If not, you’re about to be reminded, because you’ll use it in the next few examples. The following listing starts by drawing the TIGER water layer from the US Census Bureau.

Listing 13.8. Creating a simple Mapnik map

The first step is to create a Mapnik map object, but you call it m instead of map because map is the name of a built-in function in Python and you don’t want to hide it. You need to provide a size for the map when you create it, so this map will be 800 pixels wide and 600 tall. You can optionally provide a spatial reference in the form of a Proj.4 string or EPSG code; if you don’t provide this, then it will default to WGS84 lat/lon. Because most of the New Orleans data uses NAD83 lat/lon, that’s what you decide to use here. You also set a bounding box in the form of (min_x, min_y, max_x, max_y). If you don’t set the bounding box, you’ll end up with an empty map.

To add a layer to a map, you need to create a layer object and give it a data source. Several types of data sources exist for different data formats, such as shapefiles, GeoJSON, and PostGIS. Here you create a shapefile data source and add it to a layer that you name 'Tiger'.

Adding a layer to a map isn’t enough, however. If you want the layer to be drawn, you also need to provide information about how to symbolize it. You start this off by creating a Mapnik color object (water_color) from RGB values that specify a light blue, and then use that to create a polygon symbolizer for drawing water layers. Polygons drawn with this symbolizer will be filled with the blue color defined by the RGB values.

Once you have a symbolizer, you create a symbology style. A style needs at least one rule that defines how to draw something. This particular style is simple and only contains one rule, which in turn only contains your polygon symbolizer. Then you add the style to the map so that layers can use it. Notice that you provide a name for the style at the same time you add it to the map; this is important later.

You want the Tiger layer to use the style you create, so then you add the style to the layer as well, making sure to use the same name for the style that you used when adding it to the map. The style must be added to both the layer and the map or it won’t work. In addition, the style must be added to the layer before the layer is added to the map, which is what you do next.

Finally, after everything is added in the appropriate places, you save the map to a file. If all goes well, you’ll have an image like figure 13.17.

Figure 13.17. A simple plot of hydrographic data using a single layer and style rule

As pretty as that figure is, you want more than water bodies, so now try adding marshland, too. These data come from a national hydrography dataset that includes open water, glaciers, marshes, dry lakes, canals, and other features. In fact, including the canals and lakes from this dataset will make your map look a little better, so you’ll include them as well. This next listing shows how to add this new layer with more-complicated styling to the map. Add this code before the line that saves the map to an image file.

Listing 13.9. Using multiple rules in a style

The methods for creating this layer and adding it and its style to the map are the same as before, but this time the style is more complicated. For starters, you add two rules to this style instead of one. Let’s look at the first of these, called water_rule. You use a filter to apply this rule to only those features where the "Feature" attribute column is equal to either 'Canal' or 'Lake'. Filter expressions in Mapnik are similar to the OGR filter expressions that you’ve already used, but attribute names must be surrounded by brackets. You use the same water polygon symbolizer for this rule that you used for the TIGER data.

Before creating the second rule, you construct new symbolizers that use a green color. This time you define the color using hex notation to prove that you can, but you could use RGB again if you want. The color is then used to create another polygon fill symbol and also a line symbol that’s 2 pixels wide. This line symbolizer will be used to outline the polygons with the same color that they’re filled with. You use the outline here because the datasets have slight gaps between shapes that are obvious without an outline filling them up.

Now that you have your symbolizers, you create the marsh rule for this layer. First, you use a filter to make this rule apply only to features where the "Feature" attribute column is equal to the string 'Swamp or Marsh'. Then you add the green fill and outline symbols that you created previously.

After creating the rules, you create a new style and add both rules to it. Then you add the style to the map and the layer, and add the layer to the map. After rendering this map to a file, you end up with a graphic like figure 13.18.

Figure 13.18. Another layer added, this time using two rules to specify that marshes and open water in the same dataset are drawn differently

If you compare figures 13.17 and 13.18, you might wonder where all of the little water bodies disappeared to. The layers are drawn in the same order that you add them to the map, so the marshes were drawn on top of those little water bodies. For this reason, you need to think about which of your layers should not be covered up and plan accordingly. To get a graphic like figure 13.19 instead, move the code that appends the layers to the map down to the end of the script and then reverse the order that the layers are added, like this:

m.layers.append(atlas_lyr)
m.layers.append(tiger_lyr)

Figure 13.19. The same data as figure 13.18, but the order of the layers reversed
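The effect of layer order is easy to reproduce with a toy renderer. This sketch is plain Python rather than Mapnik: it paints layers onto a one-row canvas in list order, and the render function and the cell positions are made up for illustration.

```python
def render(layers, width=6):
    """Paint one-row 'layers' onto a canvas in list order."""
    canvas = ['.'] * width
    for symbol, cells in layers:
        for c in cells:
            canvas[c] = symbol   # later layers overwrite earlier ones
    return ''.join(canvas)


small_water = ('w', [2, 3])       # stand-in for the TIGER water layer
marsh = ('m', [1, 2, 3, 4])       # stand-in for the atlas marsh polygons

print(render([small_water, marsh]))   # .mmmm.
print(render([marsh, small_water]))   # .mwwm.
```

When the marsh is drawn last it hides the water completely, and when the water is drawn last it shows up on top of the marsh, which is the same change you see between figures 13.18 and 13.19.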

Your map is still not complete, however, because you want some roads and the New Orleans city boundary. The following listing shows the code to add these.

Listing 13.10. Adding the roads and city outline

Only a few new things were added in this example. The first is that you specify the spatial reference when creating the roads layer. This is necessary because this particular shapefile uses WGS84 instead of NAD83. You could use a Proj.4 string, as you did with the map spatial reference information, but you opt for an EPSG code instead. Notice that you use two rules for the roads style so that you can draw primary roads a little fatter than secondary and tertiary roads.

The second new concept is that you can create color objects using HTML named colors, as well. This is the technique you use to create the black line for the city outline. But you also want the city outline to be dashed instead of solid, so you edit the line’s stroke property to make it dashed. The first parameter to add_dash is the length of the dash in pixels, and the second is the length of the gap between the dashes.

The result of adding all of this code to your script is shown in figure 13.20.

Figure 13.20. Line styles added in order to draw roads and the city outline

13.2.2. Storing information as XML

If you use certain styles or layers often, you can store the relevant information in XML files that can be loaded from your script. You can also store entire maps this way, meaning that you can create a map using XML and then render it with Mapnik. If you’d like to see what one of these files looks like, add this line of code to the end of your script:

mapnik.save_map(m, 'nola_map.xml')

To render the map described in this XML file to an image, write a script that imports Mapnik and then loads the XML and saves the output like this:

m = mapnik.Map(400, 300)
m.zoom_to_box(mapnik.Box2d(-90.3, 29.7, -89.5, 30.3))
mapnik.load_map(m, r'd:\temp\nola.xml')
mapnik.render_to_file(m, r'd:\temp\nola.png')

That is pretty much the entire script. You do still have to create a map object with the desired size and bounding box, but the layers and styles are pulled from the XML file.

You aren’t stuck using only the information contained in the XML, however, so you can use this technique to store commonly used layers or styles. For example, if you use the hydrography dataset from the National Atlas often, you can store its information in an XML file and load it in your scripts. Pull the code pertaining to the atlas layer out of your earlier script and use it to create a new script that saves the necessary XML. The following listing shows what you need.

Listing 13.11. Create XML to describe the National Atlas hydrography layer

import mapnik

m = mapnik.Map(0, 0)

water_rule = mapnik.Rule()
water_rule.filter = mapnik.Expression(
    "[Feature]='Canal' or [Feature]='Lake'")
water_rule.symbols.append(
    mapnik.PolygonSymbolizer(mapnik.Color(165, 191, 221)))

marsh_rule = mapnik.Rule()
marsh_rule.filter = mapnik.Expression("[Feature]='Swamp or Marsh'")
marsh_color = mapnik.Color('#66AA66')
marsh_rule.symbols.append(mapnik.PolygonSymbolizer(marsh_color))
marsh_rule.symbols.append(mapnik.LineSymbolizer(marsh_color, 2))

atlas_style = mapnik.Style()
atlas_style.rules.append(water_rule)
atlas_style.rules.append(marsh_rule)
m.append_style('atlas', atlas_style)

lyr = mapnik.Layer('National Atlas Hydro',
                   "+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs")
lyr.datasource = mapnik.Shapefile(file=r'D:\osgeopy-data\US\wtrbdyp010')
lyr.styles.append('atlas')
m.layers.append(lyr)

mapnik.save_map(m, r'd:\temp\national_atlas_hydro.xml')

This script creates the styles used by the National Atlas layer, including the filters that are specific to that layer’s attribute table. It also creates the layer and appends the style to it. The SRS is added to the layer, too, because your scripts that load this file may not use the same SRS as this particular layer. The style and layer are both added to a dummy map object that’s used to save the information. The size of the map doesn’t matter because that will be determined by the script that loads the XML.

The resulting XML looks like the following listing.

Listing 13.12. XML describing the National Atlas hydrography layer

<?xml version="1.0" encoding="utf-8"?>
<Map srs="+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs">
    <Style name="atlas">
        <Rule>
            <Filter>
                (([Feature]=&apos;Canal&apos;) or
                 ([Feature]=&apos;Lake&apos;))
            </Filter>
            <PolygonSymbolizer fill="rgb(165,191,221)"/>
        </Rule>
        <Rule>
            <Filter>([Feature]=&apos;Swamp or Marsh&apos;)</Filter>
            <PolygonSymbolizer fill="rgb(102,170,102)"/>
            <LineSymbolizer stroke="rgb(102,170,102)" stroke-width="2"/>
        </Rule>
    </Style>
    <Layer name="National Atlas Hydro"
           srs="+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs">
        <StyleName>atlas</StyleName>
        <Datasource>
            <Parameter name="file">D:\osgeopy-data\US\wtrbdyp010</Parameter>
            <Parameter name="type">shape</Parameter>
        </Datasource>
    </Layer>
</Map>

As you can see, the XML is straightforward, so you might even want to define your layers this way from the beginning instead of writing code. Either way, once you have this file, you can delete all of the code from listing 13.9 that creates the atlas layer and style (that’s more than 20 lines) and then replace this

m.layers.append(atlas_lyr)

with this:

mapnik.load_map(m, r'd:\temp\national_atlas_hydro.xml')

Obviously, this technique will simplify your life if you use the same layers in multiple maps and is worth looking into.
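Because these files are plain XML, you can also inspect them with Python’s standard library before handing them to Mapnik. The following sketch uses a trimmed copy of the XML from listing 13.12 (the XML declaration, filters, and data source are left out) and checks that every style a layer refers to is defined in the file. The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from listing 13.12.
xml_text = """
<Map srs="+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs">
    <Style name="atlas">
        <Rule>
            <PolygonSymbolizer fill="rgb(165,191,221)"/>
        </Rule>
    </Style>
    <Layer name="National Atlas Hydro"
           srs="+proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs">
        <StyleName>atlas</StyleName>
    </Layer>
</Map>
"""

root = ET.fromstring(xml_text.strip())

# Styles defined on the map, and the style names each layer refers to.
style_names = [style.get('name') for style in root.findall('Style')]
layer_styles = {}
for layer in root.findall('Layer'):
    layer_styles[layer.get('name')] = [
        s.text for s in layer.findall('StyleName')]

# Any name listed here is used by a layer but not defined in the file.
missing = [name for names in layer_styles.values()
           for name in names if name not in style_names]

print(style_names)
print(layer_styles)
print(missing)
```

An empty list of missing names means the file is self-contained, which is what you want for a layer definition that will be loaded into many different maps.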

13.2.3. Drawing raster data

Now that you know the basics of drawing vector data with Mapnik, it’s time to create a simple graphic using raster data. The following listing creates an image that displays a topo map for Mount St. Helens.

Listing 13.13. Drawing a raster

Much of this example should look familiar, because it’s similar to the vector example. The main differences are that you use a GDAL data source instead of a shapefile and you use a simple raster symbolizer with no options. Unlike the shapefile examples, though, you do have to specify an SRS for the raster data source even if it matches the map’s SRS. Other than that, the process of creating rules, styles, and layers is still the same. The output graphic looks like figure 13.21.

Figure 13.21. A raster plot of a topo map

This image could use a little help, though. One common technique for making something like this more aesthetically pleasing is to overlay it on a hillshade dataset to give it depth. A hillshade is created by assuming a height and angle for a light source, and determining where the shadows would fall based on a digital elevation model (figure 13.22). The next listing shows how to put a hillshade derived from the Mount St. Helens DEM underneath this topo map to get a figure like 13.23.

Figure 13.22. A digital elevation model of Mount St. Helens on the left, and a hillshade derived from the DEM on the right
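If you’re curious how the shadows are determined, here’s a rough NumPy sketch of one way to compute a hillshade: it takes the dot product of each cell’s surface normal with a unit vector that points toward the light. This isn’t necessarily the method used to create the hillshade in figure 13.22. The hillshade function, its parameter names, and the default light position are assumptions made for this example, and it assumes that row 0 is the northern edge of the array and that the cells are square.

```python
import numpy as np


def hillshade(dem, cell_size, azimuth=315.0, altitude=45.0):
    """Return illumination values between 0 and 1 for an elevation array.

    dem       - 2D array of elevations, with row 0 at the northern edge
    cell_size - cell width and height, in the same units as the elevations
    azimuth   - direction of the light, in degrees clockwise from north
    altitude  - height of the light above the horizon, in degrees
    """
    az = np.radians(azimuth)
    alt = np.radians(altitude)

    # Elevation change per map unit down the rows (south) and across
    # the columns (east).
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol       # eastward slope
    dz_dy = -dz_drow      # northward slope

    # Unit vector pointing from the ground toward the light source.
    light_x = np.sin(az) * np.cos(alt)
    light_y = np.cos(az) * np.cos(alt)
    light_z = np.sin(alt)

    # The surface normal is (-dz/dx, -dz/dy, 1) before normalization.
    norm = np.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1)
    shade = (-dz_dx * light_x - dz_dy * light_y + light_z) / norm

    # Cells that face away from the light are in full shadow.
    return np.clip(shade, 0, 1)


# A hypothetical ramp that climbs 30 units per 30-unit cell toward the east.
ramp = np.tile(np.arange(4) * 30.0, (3, 1))
print(hillshade(ramp, 30, azimuth=270))   # lit from the west
print(hillshade(ramp, 30, azimuth=90))    # lit from the east
```

The ramp’s surface faces west, so it’s fully lit when the light comes from the west and dark when the light comes from the east.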

Listing 13.14. Using a hillshade

In this example, the hillshade layer is added exactly the same way as the topo layer was added previously, but this time you make one change to the topo layer’s symbolizer. Because you want the topo layer to be semitransparent to let the hillshade layer show through, you change the opacity property to a value of 0.6. A value of 1.0 (the default) makes the layer fully opaque, so the hillshade layer might as well not even be there. A value of 0 is fully transparent, so you’d only see the hillshade. You can play with this value to see what level of transparency you like best, but figure 13.23 shows what effect a value of 0.6 has.

Figure 13.23. A topo raster drawn partly transparent so that an underlying hillshade layer provides shadows
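One way to think about the opacity value is as the weight in a weighted average of the two layers. The sketch below assumes a simple linear blend, which may not match Mapnik’s compositing exactly, and the pixel values and the blend function are made up for illustration.

```python
import numpy as np


def blend(top, bottom, opacity):
    """Linear blend of two arrays; an opacity of 1 shows only the top."""
    return opacity * top + (1 - opacity) * bottom


topo = np.array([200.0, 100.0])      # hypothetical topo-map pixel values
shadows = np.array([50.0, 250.0])    # hypothetical hillshade pixel values

print(blend(topo, shadows, 1.0))   # same as the topo layer
print(blend(topo, shadows, 0.0))   # same as the hillshade layer
print(blend(topo, shadows, 0.6))   # 60 percent topo, 40 percent hillshade
```

With an opacity of 0.6, the first pixel works out to about 0.6 x 200 + 0.4 x 50 = 140, so the dark hillshade value pulls the bright topo value down, which is how the shadows end up visible through the map.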

13.3. Summary

The matplotlib module is a general-purpose plotting module for Python and works well for quickly visualizing data.
You can use the matplotlib interactive mode to see immediately what effect something has.
Use the Mapnik module if you want prettier maps and images than what you can easily get with matplotlib.
You can store Mapnik styles and layers in XML files to make them easily reusable.
