Chapter 9. Advanced Visualization of Spatial Data

Visualization of spatial data is vital both during intermediate analysis steps (to examine preliminary results and make sure we are on the right track) and as the final product (to present our results to colleagues or in a publication). In this chapter, we are going to visualize various datasets we created in the previous chapters, bringing closure to the previously presented case studies. While doing this, you will learn how to produce an elaborate and customized graphical output in R.

Most of this chapter is going to concentrate on the popular graphical package ggplot2. We will begin by presenting the logic behind the special syntax this package follows. Afterwards, we will review, through examples, the way spatial and nonspatial plots can be produced and customized using this package. The ggmap package, which automates downloading static maps from the Web and can be used to complement plots produced with ggplot2, will then be presented. Finally, we will experiment a little bit with three-dimensional (3D) visualization using the lattice package.

In this chapter, we'll cover the following topics:

  • Using ggplot2 to produce publication-quality plots
  • Using ggmap to add static maps as the background
  • Using lattice to produce 3D plots

Note

In this chapter, we are going to use objects created in the previous chapters.

Plotting with ggplot2 and ggmap

In this section, you are going to learn how to use ggplot2 and ggmap to visualize spatial data. The section is structured as follows. First, we will review the ggplot2 framework using a simple example of a time series plot since (as you will see right away) the syntax is quite different from that of other plotting methods we used until now. After that, we will practice a little bit more with producing ordinary, nonspatial plots. Next, we will see how the ggplot2 plots can be saved for subsequent use, both within the R environment and in external files. Finally, the last two sections will deal with the most important material from this book's perspective. In these two sections, we will see how spatial data can be incorporated into ggplot2 visualizations to produce maps, and how we can conveniently download reference background images for such maps using ggmap.

Before going into the details of ggplot2, it is important to state that to use this package, our input data needs to be contained in a data.frame object and it needs to be tidy (see Chapter 7, Combining Vector and Raster Datasets, for more information). Therefore, in practice, there are two steps when plotting with ggplot2:

  • Bringing the data to the appropriate form
  • Creating the plot

In some of the following examples, when the data is already in the right shape, we will skip the first step and move on to creating the plot right away. Otherwise, we will first have to reshape our data (using methods already familiar to us from the previous chapters) and only then utilize ggplot2 to create the plot itself.

An overview of ggplot2

The best way to understand the underlying logic of ggplot2 is through examples. Our first example is going to reproduce the time series plot we created in Chapter 2, Working with Vectors and Time Series, when demonstrating the difference between the three graphics systems in R. As you remember, we used the following data.frame object (although not knowing that it was a data.frame object at the time) that represents a time series of temperature measurements. The time column of this data.frame object contains dates, while the tmax column contains the daily temperature maxima:

> head(dat)
        time tmax
1 2006-01-01 13.3
2 2006-01-02 14.4
3 2006-01-03 15.6
4 2006-01-04 14.4
5 2006-01-05 10.6
6 2006-01-06 12.8

We used the following expression to produce the simplest possible line plot of this time series with ggplot2:

> library(ggplot2)
> ggplot(dat, aes(x = time, y = tmax)) + 
+ geom_line()

Although the following screenshot already appeared, in Chapter 2, Working with Vectors and Time Series, it is provided again here since we are going to discuss the example for some time now:

An overview of ggplot2

We will now briefly go over the main characteristics of the ggplot2 syntax, referring to the previous example. Later in this chapter, we will produce additional plots using other datasets from the previous chapters. This way, by the end of this chapter, we'll have reviewed the most important concepts and methods of operation for visualization of spatial data with ggplot2.

Note

The ggplot2 package is extremely flexible. Due the abundance of functions and usage options, the variety of plots it is capable of producing is mainly limited by the users' knowledge and expertise. However, this abundance can be overwhelming for beginners. In addition, the ggplot2 syntax is hard to grasp at first, both conceptually and practically. Nevertheless, those who overcome the initial difficulties are greatly rewarded.

The purpose of this chapter is to present, through examples, some of the most important points to note when using ggplot2 to display spatial data. Obviously, we cannot cover all plot types and their optional modifications (even a standalone book cannot accomplish such a task). What we can do is provide the initial knowledge sufficient for orientation. The subsequent usage of ggplot2 will inevitably require trial and error, as well as looking for help online. In addition, the following sources of information on ggplot2 are highly recommended:

  • The ggplot2 package is fortunately accompanied by a highly comprehensive collection of help pages (http://docs.ggplot2.org/), where all functions and arguments are reviewed—in most cases with code and graphical output examples.
  • Winston Chang's book R Graphics Cookbook, O'Reilly Media, which was published in 2012, is notable for providing an extensive coverage in the form of precise recipes for visualization using ggplot2 (although there are relatively few examples involving spatial data).
  • Readers interested in more information on the theoretical background of ggplot2 can refer to the absorbing book by the package author Wickham, H. ggplot2: Elegant Graphics for Data Analysis, Springer, 2009. Some of the code sections in that book are slightly outdated (the package has evolved since the book was published), but this does not affects the book's utility.

Each expression used to create a plot with ggplot2 is made up of several components as we shall see shortly. It starts with the ggplot function call (such as ggplot(dat,aes(x=time,y=tmax))), followed by additional layers' definitions and settings.

At the core of each plot created with ggplot2 are the layers, with each layer necessarily associated with a given geometry and the data used to draw the layer. A plot with no layers cannot be produced since there will be nothing to show:

> ggplot(dat, aes(x = time, y = tmax))
Error: No layers in plot

Layers are added to a plot using the + operator, and the data used to create the layer, as well as the aesthetic mapping (the link between the data and layers' aesthetic appearance), are specified as arguments either in the ggplot function (setting them as global arguments for this particular plot) or in the respective layer function (setting them as local arguments for the particular layer only).

The most straightforward way to create a layer is to use the layer function, where we can (rather verbosely) specify all of the layer characteristics. The preceding plot, for instance, can also be produced with the following expression, which demonstrates that geom_line is a layer with a predefined "line" geometry and "identity" statistical transformation (this means that there is no statistical transformation; the values are taken as is), among other default definitions that remain hidden (such as colour="black" for line color):

> ggplot(data = dat, aes(x = time, y = tmax)) + 
+ layer(geom = "line", stat = "identity")

However, in practice, layers are most often specified using predefined layer functions rather than the layer function. These functions' names start with geom (such as geom_line) or stat (such as stat_contour). There is no conceptual difference between these two types of functions; the difference is just in their defaults—the latter emphasizes statistical transformations while the former does not; therefore, each type of function may be easier to use for a given purpose. However, it should be remembered that the settings of any given layer can be overridden, so in many cases the geom and stat layers are redundant in terms of the desired result. For example, exactly the same layer of contours can be produced with either geom_contour or stat_contour.

In general, each layer is composed of the following five components:

  • The geometry the layer follows (this is already encompassed in the layer function name, for example, geom_line)
  • The data.frame object where the data come from (for example, dat)
  • The definition of the link between the data and layer appearance, that is, aesthetic mapping inside the aes function (for example, aes(x=time,y=tmax)) or the definition of appearance unrelated to the data, that is, aesthetic setting outside the aes function
  • The statistical transformation that manipulates the data prior to aesthetic mapping (we will not use this here)
  • A position adjustment to deal with overlapping graphical objects (we will not use this here)

A function to create a layer (such as geom_line) already encompasses defaults for the geom (geometry) and stat (statistical transformation) parameters; it is rarely necessary to override these in practice. For our purposes in this chapter, we will also not have to deal with position adjustments; these are most useful in nonspatial plots such as histograms and boxplots. Therefore, there are just two parameters we will usually modify in the ggplot2 layers—the data and aesthetic mappings (and, optionally, aesthetic settings). For example, the previous plot of tmax as a function of time has only one layer and the geometry type of that layer is "line". The function used to create the layer is named geom_line, accordingly. The data.frame object from where the time and tmax values come is dat, with time mapped to x (the x axis aesthetic) and tmax mapped to y (the y axis aesthetic).

The following table lists the specific layer functions we will use in this chapter in order to produce several common types of layers. The table also shows the set of required and optional parameters of each function. The required parameters (or aesthetics) of a given layer control its geometry and so, a layer cannot be drawn without them. For example, the geom_line function requires a set of x and y coordinates since no line can be drawn without these. As we shall see, in practice the required parameters are always mapped to variables in our data (in the form of an assignment within an aes function call). On the other hand, the optional parameters can either be mapped to variables in our data, set at constant values or left unspecified (in which case, they are set to their default constant values, such as colour="black" for geom_line, giving the line its default black color).

Function name

Required parameters

Optional parameters

geom_line

x, y

alpha, colour, linetype, size

geom_histogram

x

alpha, colour, fill, linetype, size, weight

geom_point

x, y

alpha, colour, fill, shape, size

geom_text

label, x, y

alpha, angle, colour, family, fontface, hjust, lineheight, size, vjust

geom_path

x, y

alpha, colour, linetype, size

geom_contour

x, y

alpha, colour, linetype, size, weight

geom_density2d

x, y

alpha, colour, linetype, size

geom_polygon

x, y

alpha, colour, fill, linetype, size

geom_raster

x, y

alpha, fill

There are, at the time of writing, 37 geom and 21 stat functions (visit http://docs.ggplot2.org/ for the complete list). In this chapter, we will limit ourselves to just these nine geom functions, which are highly relevant with respect to spatial data.

Most of the names of the geometries are self-explanatory. For example, geom_line is used to draw lines, geom_histogram is used to draw histograms, and geom_raster is used to draw rasters. The geom_contour and geom_density2d functions are used to create contours based on raster values or point density, respectively, as we shall see later in this chapter. The difference between geom_line and geom_path requires clarification. While both functions are used to create a line layer, the series of (x,y) points is connected according to their order along the x axis in geom_line. On the other hand, in geom_path, the points are connected according to their original order in the source data.frame object. Therefore, the first is useful when plotting time series or mathematical functions (such as the temperature time series), while the second is useful to plot spatial line layers (such as a GPS track).

In addition to layer functions and their settings, which we briefly reviewed, three other types of components are used to control plot appearances in ggplot2:

  • scales: Used to control the conversion, or mapping, between the data values and aesthetics we see on the screen (these functions' names start with scale, for example, scale_x_date)
  • faceting: Used to produce multiple plots of different data subsets, side by side (these start with facet, for example, facet_grid)
  • themes: Used to modify the general appearance of the plot (these start with theme, for example, theme_bw)

How are all of these components specified in practice? The first component, as we have already seen, is always a ggplot function call that initializes a ggplot object used to store all the necessary information to produce a plot. It usually takes the ggplot(data,aes(...)) form, where data is the default data.frame object to be taken into account when plotting, and aes(...) is the default aesthetic mapping to be used. These are passed to all of the other layers, unless the respective layer definition specifies its own data and/or aesthetic mapping, in which case they override those within ggplot(). Next, the layers, scales, faceting definitions, and themes are added with each two components separated by a + symbol. In our last example, the ggplot(dat,aes(x=time,y=tmax)) part specified that the default dataset is dat, and the default aesthetic mapping is to plot time on the x axis and tmax on the y axis. The geom_line() part then added a line layer; no arguments were specified, therefore the layer used the default dataset and the default aesthetic mapping. Scales, faceting, and theme settings were also left unspecified.

It is important to understand that in order to produce a minimal plot, we need to specify only a dataset, a single layer, and mappings for the layers' required parameters (such as x and y in geom_line). Other layer parameters (such as colour or size), scales, faceting, and themes are optional. For example, the following expression adds a colour setting and two scale definitions, but it produces exactly the same plot as shown in the preceding screenshot, since all of the components we added were already specified by default:

> ggplot(dat, aes(x = time, y = tmax)) + 
+ geom_line(colour = "black") +
+ scale_x_date() +
+ scale_y_continuous()

The two scale settings used in the preceding expression (scale_x_date() and scale_y_continuous()) had no effect since a continuous variable (such as tmax) is by default plotted on a continuous scale, while a Date variable (such as time) is by default plotted on a dates scale. In addition, we have an aesthetic setting (colour="black") that had no effect either. As opposed to an aesthetic mapping (which is always encompassed in the aes function), the aesthetic setting links the layer to a constant aesthetic value, irrespective of the data in a given column. In the preceding expression, we set the line color to black in the geom_line layer, with colour="black". Again, since "black" is the default value for the line color in ggplot2, this setting does not affect the plot appearance.

The plot we produced still needs a little polishing if we would like to include it in a publication. In general, certain plot adjustments are not related to the data (for example, changing the background color), while others are indirectly related to the data (for example, changing the color scale of a raster). First of all, we would like to have proper axis titles (for example, "Time" instead of "time" for the x axis), preferably without changing the data column names themselves. In many cases a journal may require a cleaner appearance, so we may also wish to remove the gray background and grid lines. The latter two properties can be modified using the scales and themes specifications, respectively.

Through the theme functions, we can control the general appearance of the plots' nondata elements, such as plot title, axis labels style, tick marks length, background color, and so on. This can be done in two ways, which are not mutually exclusive but additive:

  • The so-called complete themes can be specified to control all of the elements at once, in a predefined way. The predefined themes in ggplot2 currently include theme_grey, theme_bw, theme_linedraw, theme_light, theme_minimal, and theme_classic. You are welcome to replace theme_bw with each of these in the following code section, to see what each theme looks like.
  • Modifying individual elements can be done using the theme function, specifying the element names and the required arguments. For example, adding theme(panel.grid=element_blank()) eliminates grid lines. There are currently 50 theme elements (such as panel.grid), which can be modified through the theme function (see ?theme for the complete list). Three important things to note about themes are:
    • The value of each theme element can be set using the appropriate function (element_blank, element_line, element_rect, or element_text). For example, to set axis.title, we will use element_text, but to modify panel_grid, we will use element_line (unless we want to eliminate it altogether, in which case we will use element_blank). Consult ?theme to see which element function is appropriate in specific cases.
    • Within the element function, we can supply a list of characteristics we want to modify for that particular element. For example, element_line(colour="red",linetype="dotted") for panel.grid will make the grid lines red and dotted. As always, all unspecified properties remain at their default values.
    • Some element properties inherit from others. For example, with panel.grid=element_blank(), there is no point modifying the panel.grid.minor and panel.grid.major (minor and major grid lines, respectively) characteristics since the properties of both are overridden by panel.grid.

Functions to modify the scales start with scale, followed by the aesthetic property and the scale type. For example, scale_x_date sets the x axis position aesthetic property to the date type. Arguments within a scale function control additional properties of the scale, other than scale type which is determined by the function's name. For example, the name parameter determines the scale name that appears along the respective axis or legend. Each aesthetic we would like to set a scale for requires an individual scale function (for example, scale_x_date() and scale_y_continuous()).

As an example, in the following code we modify several theme and scale properties of the time series plot:

> ggplot(dat, aes(x = time, y = tmax)) + 
+ geom_line() +
+ scale_x_date(name = "Time") +
+ scale_y_continuous(
+ name = 
+ expression(paste("Maximum temperature (", degree, "C)"))) +
+ theme_bw() +
+ theme(panel.grid = element_blank())

Note

As you may have noticed, the y axis label is specified in a special way in the preceding code with expression(paste("Maximum temperature (", degree, "C)")). The reason for doing this is to introduce the degree (º) symbol in the y axis title (see the following screenshot). The expression function returns an object of class expression, which can be used to print mathematical symbols and annotations in R. It is combined with paste in order to include both regular characters and mathematical symbols. In this case, the text consists of the "Maximum temperature (" and "C)" parts, while the mathematical part is degree, which stands for the º symbol. The complete list of possible annotations and their syntax is available on the ?plotmath help page.

The following screenshot shows what the new version of our plot looks like:

An overview of ggplot2

We can see that the plot is now black and white (thanks to theme_bw) with no grid lines (thanks to theme(panel.grid=element_blank())) and it has appropriate axis labels (thanks to the scale specifications).

Note

At times, a touch of interactive fine-tuning is required to introduce nonstandard elements or formatting into the graphical output from R. For instance, we may wish to put an arrow mark (such as An overview of ggplot2) with annotation (such as this was an especially cold day) on the preceding plot. Although it is possible to accomplish almost everything through the ggplot2 syntax (including the latter example), in some situations it may be advantageous to make final adjustments by hand using a graphical editor such as Inkscape (freely available from http://www.inkscape.org/) or Adobe Illustrator. For this purpose, it is best to export the plot with ggsave to a vector graphics file (such as PDF or SVG), and then work on it in a graphical editor.

Plotting nonspatial data

Since our main focus is on plotting spatial data, in this section, we will go over just two examples of ordinary (nonspatial) plots. Most of the subsequent examples in this chapter will deal with producing spatial plots, or in other words, maps.

We will begin with a line plot, but a more elaborate one than in the previous example. As promised in Chapter 7, Combining Vector and Raster Datasets, we are going to compare, visually, the NDVI trajectory in two forests (Lahav and Kramim) according to two data sources (the Landsat and MODIS satellites). To do this, we will display the four relevant time series in a single plot. We will mark the data for each forest with a different color in order to distinguish them. In addition, we will use different geom functions for the data from each of the two satellites—the data from MODIS (where we have 280 data points for each forest) is best displayed with geom_line, while the data from Landsat (where we have only three data points for each forest) will be displayed with geom_point.

In terms of the ggplot2 syntax, the data from our input table forests_ndvi will be divided into two parts:

  • forests_ndvi[forests_ndvi$sat == "MODIS", ]
  • forests_ndvi[forests_ndvi$sat == "Landsat", ]

The first subset of forests_ndvi will be passed as the data argument to geom_line, while the second will be passed as the data argument to geom_point. As for the aesthetic mapping (the assignment expressions within the aes function), x is going to be mapped to date and y to ndvi, in both layers (NDVI is plotted as the function of time). In addition, the colour (in geom_line) and fill (in geom_point) are mapped to forest since we want to display the time series for each forest with a different color. One instance of the aesthetic setting is used in the point layer (shape=21), defining the points shape as 21 (which corresponds to a filled circle; see ?points). In addition to these two layers, six ggplot components are supplied to specify the various scale settings (just their names, in this case) and theme settings (no grid lines and "top" for the legend position). Note that the scales for fill and colour are discrete (scale_fill_discrete and scale_colour_discrete), which is appropriate for categorical variables such as the forest name.

Tip

The assigned colors are generated automatically. There is also a way to explicitly specify the color each group will take, using a manual scale. We will not go into that here; interested readers can follow the examples on ?scale_colour_manual to see how individual colors of choice can be specified.

The entire expression to produce the plot thus takes the following form:

> ggplot() +
+ geom_line(
+ data = forests_ndvi[forests_ndvi$sat == "MODIS", ],
+ aes(x = date, y = ndvi, colour = forest)) +
+ geom_point(
+ data = forests_ndvi[forests_ndvi$sat == "Landsat", ],
+ aes(x = date, y = ndvi, fill = forest), shape = 21) +
+ scale_x_date("Time") +
+ scale_y_continuous("NDVI") +
+ scale_fill_discrete("Forest") +
+ scale_colour_discrete("Forest") +
+ theme_bw() +
+ theme(panel.grid = element_blank(),
+ legend.position = "top")

The time series plot, as shown in the following screenshot, is produced as a result:

Plotting nonspatial data

Notably, the same legend has been generated for the points fill aesthetic and the lines colour aesthetic (and so, the legend symbols are composed of both line and point geometries). The fact that the same color scale is generated by ggplot2 for both aesthetics makes things easier.

As for the NDVI pattern the plot portrays, we can clearly see that the Kramim forest has consistently lower NDVI than Lahav and that the data from both satellites is in agreement on this. The seasonal NDVI pattern, which we have already witnessed on several occasions in previous chapters, is also apparent in the MODIS data.

In the second nonspatial example, we will plot the distribution of topographic slopes in built versus natural areas in Haifa, based on the buildings_mask and natural_mask rasters we created in Chapter 7, Combining Vector and Raster Datasets. This time, the data come in the form of rasters. Since ggplot2 requires data.frame objects, the first thing to do is bring the data into one. We will transfer the (non-NA) values of each raster to a table with two columns: one containing the raster values and the other identifying the cover type ("Buildings" or "Natural"):

> build = data.frame(cover = "Buildings", 
+ slope = buildings_mask[!is.na(buildings_mask)])
> nat = data.frame(cover = "Natural", 
+ slope = natural_mask[!is.na(natural_mask)])

Then, we will bind both tables into a single one with rbind:

> slopes = rbind(nat, build)
> head(slopes)
    cover     slope
1 Natural 0.3740864
2 Natural 0.3918563
3 Natural 0.4300925
4 Natural 0.4843213
5 Natural 0.5266151
6 Natural 0.3173897

The data is ready to be plotted. This time, what we are interested in is visually comparing distributions. The most obvious way of doing this is to plot histograms of the two variables side by side. In ggplot2, a histogram layer can be created with geom_histogram, specifying only a single aesthetic—x. In order to produce two histograms in a single plot, we will also use faceting.

Faceting, as previously mentioned, is the generation of numerous plots of the same type for different subsets of the data, within a single graphical output. For example, in the present context we would like to create two histograms side by side: one for the "Buildings" cover type and another for the "Natural" cover type. The reason for using faceting—in addition to saving the trouble of running the same code several times—is having a similar appearance and common axes in all subplots and thereby, making the comparison easier. There are two functions to produce facets in ggplot2; the difference between them is in the way the subplots are geometrically arranged:

  • facet_wrap: Used to create a continuous ribbon of panels, while (optionally) specifying the number of rows or columns (with nrow and ncol). This is most useful to create numerous subplots of the same type.
  • facet_grid: Used to create a two-dimensional grid of panels with the rows and columns defining different levels of a given variable. This is most useful when subplots are defined by combinations of two variables.

The most important parameter of both facet_wrap and facet_grid is the formula defining the grouping factor(s) to create the facets. With facet_wrap, we can specify only a single factor (for example, facet_wrap(~group)) since the facets form a one-dimensional ribbon. With facet_grid, we can specify two factors: one for the rows and another one for the columns (for example, facet_grid(group_row~group_column)). If we wish to create facets with facet_grid according to a single factor, we can replace one of the variables with a dot (.). For example, facet_grid(group_row~.) will result in a vertical ribbon of facets (since there is no factor for the columns), while facet_grid(.~group_column) will result in a horizontal ribbon of facets (since there are no factors for the rows).

To clarify things, it is best to show an example. The following code produces two histogram facets, according to the grouping variable cover, using facet_grid. Since we have only one grouping factor, we mark the column grouping as . to obtain a vertical ribbon (later in this chapter, in the Spain temperature maps example, we will use facet_grid with two grouping factors):

> ggplot(slopes, aes(x = slope)) +
+ geom_histogram() +
+ facet_grid(cover ~ .) +
+ scale_x_continuous("Slope (radians)") +
+ scale_y_continuous("Count") +
+ theme_bw()

The resulting plot is shown on the left in the following screenshot. Since in this case we have a single grouping variable, the same kind of plot can also be produced with facet_wrap. Replacing facet_grid(cover~.) with facet_wrap(~cover,ncol=1) produces an identical plot, except for a slightly different facet labeling scheme; the latter plot is shown on the right in the following screenshot. Note that in both cases, the facets share a common x axis.

Plotting nonspatial data

Looking at these histograms, we can see that most buildings are located on relatively flat terrain while most natural areas occupy steeper slopes, which makes sense.

Note

Quantitatively comparing the properties of slopes' distributions is straightforward using tapply and the slopes table. For example, tapply(slopes$slope,slopes$cover,mean) will show that the mean slopes for the "Natural" and "Buildings" cover types are 0.23 and 0.15, respectively. Substituting mean with other functions (such as min, max, and sd) will yield other properties.

Saving the ggplot2 plots

In addition to the usual method of saving a graphical output in a file (see Chapter 2, Working with Vectors and Time Series), there is a specialized function called ggsave for saving ggplot2 plots. For example, the first image in this chapter was incorporated into this book from a PNG file, obtained with ggsave as follows:

> ggplot(dat, aes(x = time, y = tmax)) + 
+ geom_line()
> ggsave("C:\Data\4367OS_09_01.png", width = 5.5, height = 3.5)

The ggsave function, by default, saves the last ggplot object that has been plotted in the graphical window. Therefore, the only mandatory parameter is the file path. The file extension provided in the path determines the file format (a PNG image in this case; see ?ggsave for a list of possible formats). The figure dimensions are taken as those of the currently active graphical window, unless explicitly specified through the width and height parameters. For instance, in this case, the width is equal to 5.5 inches (inches are the default unit, cm and mm are also possible) since this is a commonly used text width in the letter page format.

Tip

When saving the ggplot2 output to PDF, it is recommended to specify useDingbats=FALSE. This avoids potential problems associated with plot rendering, due to the default conversion of circles to text objects, when we prefer to leave the former as geometric objects.

A ggplot object can also be assigned and kept in memory:

> tmax_line = ggplot(dat, aes(x = time, y = tmax)) + 
+ geom_line()

It is already apparent, from the last few examples, that the print method applied to such an object induces drawing it in a graphical window. Assignment of a complete ggplot2 plot, or individual layers (which we will see later), may serve at least two useful purposes:

  • We can save a plot with ggsave providing the plot object as an argument so that we do not have to wait for the plot to be drawn in the graphical window for no good reason. This way, we also do not need to keep in mind which plot was produced last. For example, after the tmax_line object has been created, we can save the plot it describes as follows:
    > ggsave(plot = tmax_line, 
    + filename = "C:\Data\4367OS_09_01.png", 
    + width = 5.5, 
    + height = 3.5) 
  • We can define useful combinations of layers and/or settings, which we will repeatedly use to save us the trouble of typing the whole collection of components when producing a series of plots that have some components in common. Complete themes, such as theme_bw, also serve this purpose. For example, the following expression produces the same plot we saw in the first image in this chapter:
    > tmax1 = ggplot(dat, aes(x = time, y = tmax)) 
    > tmax2 = geom_line()
    > tmax1 + tmax2

Plotting spatial data

So far in this chapter, we introduced the ggplot2 package and briefly reviewed its syntax using a few simple, nonspatial examples. In the next two sections, we will move on to slightly more complicated examples, now involving spatial data. What you will see right away is that producing a map with ggplot2 is conceptually no different from producing any other kind of plot (because spatial data, in turn, is not conceptually different from any other type of data).

A map is, in fact, simply a two-dimensional plot with points, lines, polygons and/or rasters, with plot space corresponding to geographical space through a given CRS. Since plot space is tied to geographical space, a map has a specific aspect ratio between the x and y axes (1:1). In other words, we cannot stretch a map to be wider or narrower since this would distort the correct proportion between distances in the x and y directions (unlike the tmax time series plot we saw earlier in this chapter, for example, which can be stretched any way we like and still remain meaningful). As you may have noticed, the plot method, applied on spatial vector layers or rasters in previous chapters, also produces plots with a constant 1:1 aspect ratio. The difference is that in ggplot2, the input data comes as a data.frame object so the function has no way of knowing that the data is spatial. Therefore, we need to manually specify that the aspect ratio should be constant using the coord_equal function.

To summarize these considerations, a map produced with ggplot2 is just like any other plot produced with this package, except for the following two characteristics:

  • The plot represents geographical space through a given CRS and includes spatial data in the form of points, lines, polygons, and/or rasters
  • The x and y axes scale ratio is fixed at 1:1 (using coord_equal)

Note

It should be noted that ggplot2 (along with ggmap, which will be introduced in the next section) is not the best choice for all spatial data visualization tasks. Maps that require, for example, substantial annotation (such as labels, scale bars, and north arrows) or special display-optimization algorithms (such as reduced overlap between street name labels and other map elements) are easier to handle in traditional GIS software. The advantage of ggplot2, with respect to map making, becomes decisive in fairly analogous situations to those when R itself is advantageous:

  • Automation: When we want to automatically produce a large number of high-quality maps, for example, using facets or loops
  • Reproducibility: When we want to make sure exactly the same map is replicated with different datasets, at different times, or among different users
  • All-in-one environment: When we want to show certain elements computed in R on a map (such as spatial interpolation predictions) or integrate other capabilities that R has (such as downloading real-time data from the Web) without needing to transfer the data between different software

Now that the framework to use ggplot2 with spatial data is defined in general terms, let's proceed to the practical application. So far, we witnessed that spatial vector layers and rasters are best represented in R using special classes such as SpatialPolygonsDataFrame or RasterLayer. How can we convert these data structures to data.frame objects to be passed to ggplot? Point and raster layers are readily convertible to a data.frame object using the as.data.frame function (for example, we converted the spainT raster to a data.frame object in the previous chapter). As already mentioned, however, lines and polygons are more complex than points and rasters due to the fact that each line or polygon geometry is composed of a variable number of points. The order in which the points are connected is also an integral part of the data, responsible for the correct geometry drawing. Therefore, a data.frame object representing a line or polygon layer must have x and y coordinate columns (as do data.frame objects for points and rasters), but also a grouping column (to denote which set of points forms a given line or polygon) and an ordering column (to specify the order by which the points are connected when drawing the given line or polygon). Conveniently, a function called fortify is already defined in ggplot2 to convert line and polygon layers into data.frame objects while preserving these properties.

For example, at the end of Chapter 5, Working with Points, Lines, and Polygons, we created a SpatialPolygonsDataFrame object named county that contains the borders of US counties along with average population densities for each county in the attribute table (the density column). In order to bring the data in county to a data.frame form, readily available to be used by ggplot, we need to apply the fortify function. The "regions" parameter of fortify is used to specify which attribute table column corresponds to individual features. Since the FIPS column identifies unique counties in the attribute table, we can use it as the region:

> county_f = fortify(county, region = "FIPS")
> head(county_f)
     long      lat order  hole piece   group    id
1 1225972 -1274991     1 FALSE     1 01001.1 01001
2 1234371 -1274114     2 FALSE     1 01001.1 01001
3 1244907 -1272280     3 FALSE     1 01001.1 01001
4 1244132 -1267496     4 FALSE     1 01001.1 01001
5 1265116 -1263940     5 FALSE     1 01001.1 01001
6 1265318 -1263907     6 FALSE     1 01001.1 01001

This is a typical output of fortify. The table contains processed spatial information from the county layer plus the id column corresponding to the region variable (in this case, FIPS). The important columns, for our purposes, are as follows:

  • long and lat: The spatial coordinates. Note that these are the default x and y coordinate column names, respectively, created by fortify, even when the layer is not in a geographical CRS.
  • group: Individual geometries identifier.
  • id: Individual features identifier.

Since all other attribute data columns are removed by fortify, we need to append the density column manually by joining the attribute table of county back to county_f. However, before that, we need to change the name of the id column to "FIPS", to match the FIPS column in the attribute table:

> colnames(county_f)[which(colnames(county_f) == "id")] = "FIPS"
> county_f = join(county_f, county@data, "FIPS")
> head(county_f)
     long      lat order  hole piece   group  FIPS  NAME_1  NAME_2
1 1225972 -1274991     1 FALSE     1 01001.1 01001 Alabama Autauga
2 1234371 -1274114     2 FALSE     1 01001.1 01001 Alabama Autauga
3 1244907 -1272280     3 FALSE     1 01001.1 01001 Alabama Autauga
4 1244132 -1267496     4 FALSE     1 01001.1 01001 Alabama Autauga
5 1265116 -1263940     5 FALSE     1 01001.1 01001 Alabama Autauga
6 1265318 -1263907     6 FALSE     1 01001.1 01001 Alabama Autauga
  TYPE_2     area census2010pop  density
1 County 1562.805         54571 34.91863
2 County 1562.805         54571 34.91863
3 County 1562.805         54571 34.91863
4 County 1562.805         54571 34.91863
5 County 1562.805         54571 34.91863
6 County 1562.805         54571 34.91863

Using this table, we are ready to create a map showing population densities per county. However, it would be nice to add state borders as well to aid in map apprehension. To obtain states borders layer, we will download it from the GADM database (see Chapter 5, Working with Points, Lines, and Polygons). We will then use fortify on it as well:

> states = getData("GADM", country = "USA", level = 1)
> states = states[!(states$NAME_1 %in% c("Alaska", "Hawaii")), ]
> states = spTransform(states, CRS(proj4string(county)))
> states_f = fortify(states, region = "NAME_1")
> head(states_f)
     long      lat order  hole piece     group      id
1 1076104 -1034268     1 FALSE     1 Alabama.1 Alabama
2 1085410 -1033146     2 FALSE     1 Alabama.1 Alabama
3 1093749 -1031892     3 FALSE     1 Alabama.1 Alabama
4 1107308 -1030032     4 FALSE     1 Alabama.1 Alabama
5 1108666 -1029851     5 FALSE     1 Alabama.1 Alabama
6 1112841 -1029288     6 FALSE     1 Alabama.1 Alabama

To plot a spatial layer created with fortify (either lines, as we shall see later, or polygons), we need to map the x and y aesthetics to the long and lat columns, respectively. In addition, we need to map the group column to the group aesthetic so that each geometry will be drawn separately. For example, the California feature is composed of several geometries (the mainland plus several islands in the Pacific Ocean), and ggplot needs to know which points form separate polygons according to the group column ("California.1", "California.2", and so on). If we want to give certain aesthetics to each state, we can do that using the id column (where all California polygons share the "California" label).

Note

The group aesthetic can also be implemented in nonspatial layers and in layers other than geom_polygon. It is used to identify individual geometries in order to draw them separately, when no other aesthetic (such as colour or fill) does the job. For example, in the following screenshot, we do not want each geometry (such as each island of California) to have a unique appearance, but we still want these geometries to be drawn separately rather than with a line going through all of them one by one. Try removing the group=group part from the following expression and you will see what a mess the result is. You can also try group=id to see that it messes up only those states that are composed of more than one geometry, where grouping by the group column is indeed essential.

Another helpful feature we will experiment with in the present example is to save a collection of the ggplot2 components in order to integrate them in several plots and not have to type the expressions each time. We will create an object named sp_minimal, defining our own custom theme (based on theme_bw, with the axis name, labels, and tick marks removed):

> sp_minimal =
+ theme_bw() +
+ theme(axis.text = element_blank(),
+ axis.title = element_blank(),
+ axis.ticks = element_blank())

Before we plot county densities, we will start with a simpler example—drawing just the states_f layer—to see how spatial polygon layers are drawn in ggplot2:

> ggplot() + 
+ geom_polygon(data = states_f, 
+ aes(x = long, y = lat, group = group), 
+ colour = "black", fill = NA) +
+ coord_equal() +
+ sp_minimal

A map of the states is produced as follows:

Plotting spatial data

As you can see in the previous syntax, the polygons are drawn using geom_polygon according to the long, lat, and group data columns (mapped to the x, y, and group aesthetics, respectively), with the colour and fill polygons set to constant values ("black" for colour and NA for fill, respectively; this means empty polygons with black borders). The other two plot elements are coord_equal (which makes sure the 1:1 aspect ratio is maintained) and the sp_minimal theme we just defined.

Now, we will make things slightly more complicated by plotting both the states_f and county_f layers together. Let's first view the code and output. Afterwards, we will discuss the different components involved. Here's the code that produces a county population density map:

> ggplot() + 
+ geom_polygon(data = county_f, 
+ colour = NA, 
+ aes(x = long, y = lat, group = group, fill = density)) +
+ geom_polygon(data = states_f, 
+ colour = "white", size = 0.25, fill = NA,
+ aes(x = long, y = lat, group = group)) +
+ scale_fill_gradientn(
+ name = expression(paste("Density (",km^-2,")")), 
+ colours = rev(rainbow(7)), 
+ trans = "log10", 
+ labels = as.character,
+ breaks = 10^(-1:5)) +
+ coord_equal() +
+ sp_minimal

The following screenshot shows how the resulting plot looks:

Plotting spatial data

Reading the preceding code reveals, first of all, that the plot is made by combining the following five components:

  • geom_polygon (with county_f)
  • geom_polygon (with states_f)
  • scale_fill_gradientn
  • coord_equal
  • sp_minimal

The two geom_polygon layers, representing "states" and county layers, are drawn differently according to our needs in this visualization. The first thing to note about these layers is their order of appearance: the county_f layer is added before states_f. This is not a coincidence; the layers are drawn on the screen in the same order in which they are provided in the code. Since state and county borders obviously overlap, and we wish to draw the states borders on top of the counties borders (otherwise the states borders will not be visible), it is important to specify the layers in the right order.

The states_f layer arguments are identical to those in the previous example; except that, we made the polygon borders white (with colour="white") and thin (with size=0.25). As for the county_f layer, the color of the borders (rather than fill) was set to NA, resulting in borderless polygons, and fill was mapped to density. The latter is indeed the defining feature of the map we made: mapping the county_f polygons fill aesthetic to density is what gives different population densities different colors, and that was the whole purpose of making this plot.

We mapped fill to density, but which color gets assigned to which density level? As already mentioned, the scale function determines the way values in the data are matched with the aesthetic appearance. So unless we are happy with the default scale (which is frequently not the case), we have to set the scale characteristics ourselves. For example, we already used scale_colour_discrete in the forests NDVI time series example (applicable for discrete variables). In the present example, the scale_fill_gradientn function, as the name suggests, was used to set the fill scale type to gradientn. The latter is just one out of several possible fill scale types (applicable to the colour aesthetic as well; see http://docs.ggplot2.org/ for a complete list). For continuous variables, ggplot2 offers three types of color scales, which can be specified using the following functions:

  • scale_fill_gradient: A two-color gradient, with the gradient end colors specified with high and low (see the Haifa buildings density example later in this chapter)
  • scale_fill_gradient2: A three-color gradient, with the low, med, and high colors specified (see the London buildings distance to river example later in this chapter)
  • scale_fill_gradientn: A custom n-color gradient, specified with a colours vector

Note

There are several ways to create a vector of colors in R to be supplied to the colours parameter of scale_colour_gradientn as well as to other graphical functions. The simplest way, which we used in the present example, is to employ one of the predefined color palette functions that can take a numeric value as an argument and produce a series of colors (returned as RGB codes) at equal intervals. These functions can be used for many purposes:

  • rainbow
  • heat.colors
  • terrain.colors
  • topo.colors
  • cm.colors

In the examples in this chapter, we employ rainbow and terrain.colors. You are welcomed to experiment with the other three palettes to see what they look like. The color codes these functions return are rarely useful in their own right. Instead, they are passed as color palettes to graphical functions such as scale_colour_gradientn.

> rainbow(3)
[1] "#FF0000FF" "#00FF00FF" "#0000FFFF"

In cases when you have a different palette in mind which is not covered by these functions, you can use the RColorBrewer package (which provides many more palettes) or the colorRampPalette function available in base R (which can be used to create custom-made color palettes; see the last example in this chapter).

A short explanation of the arguments of the scale_fill_gradientn function we used is in order:

  • name=expression(paste("Density (",km^-2"",")")): The name argument specifies the scale name and the way it will appear on the plot legend. We once again used the expression function, this time to add -2 in superscript (see the preceding screenshot).
  • colours=rainbow(7): Specifies the vector of colors to create a color palette with. Here, specifying seven colors gives a good result, with the whole variability the rainbow palette provides represented. It may require trial and error to find out how many colors are sufficient in each case.
  • trans="log10": The trans parameter can be passed to any continuous scale function in ggplot2. It is used to specify a mathematical transformation for the data. In this particular example, we use a logarithmic scale ("log10") to highlight the differences in the population density between different counties, which is necessary since the distribution of county densities is highly nonhomogeneous, with many low-density counties and few high-density counties. You can try deleting the trans, labels, and breaks arguments, and running the previous code once more to see why the default linear scale is inappropriate in this case. There are many useful built-in types of transformations, as well as methods to define custom transformations but this is beyond the scope of this book.
  • labels=as.character and breaks=10^(-1:5): The breaks argument specifies the breakpoints where tick marks and labels are generated, while the labels argument specifies the labels themselves or (as in this case) a function to format the labels with. In this case, by default, R prints the 10^(-1:5) labels vector in mathematical notation (1e-01, 1e+00, 1e+01, and so on). This is evaded by converting the numeric values to characters (using as.character) to receive more comprehensive labels (0.1, 1, 10, and so on).

The last two components in the code section that produces the preceding plot—coord_equal and sp_minimal—were already used in the previous plot, so we will not repeat their meaning here.

Finally, referring to the plot itself, we can see the concentration of densely populated counties in the eastern half of the USA and along the coast of the Pacific Ocean. Further inland in the Western USA, population density is generally low.

Moving on to the second type of spatial data structure, a raster, we are going to experiment with different variations of the Haifa DEM map in the following few examples. As the data source, we will take the dem and hill rasters (see Chapter 6, Modifying Rasters and Analyzing Raster Time Series). Data from these two rasters will be transferred into a single data.frame object with four columns: cell coordinates (x and y), elevation value (elev), and hillshade value (hill). Note that in this case, we use the data.frame function to do this, supplying the coordinates and the vector of values from each raster separately (relying on the fact that the two rasters are geometrically identical, since hill was derived from dem). An equally good approach would have been to stack the two rasters and then convert the result to a data.frame object with as.data.frame(...,xy=TRUE):

> dem_hill = data.frame(coordinates(dem), 
+ elev = dem[],
+ hill = hill[])
> dem_hill = dem_hill[complete.cases(dem_hill), ]
> head(dem_hill)
            x       y elev      hill
2421 698971.3 3648863   19 0.3533017
2422 699061.3 3648863   20 0.3408402
2423 699151.3 3648863   19 0.3402018
2424 699241.3 3648863   19 0.3451085
2425 699331.3 3648863   20 0.3439193
2426 699421.3 3648863   21 0.3368811

Now that we have the dem_hill table, we can plot the data using a geom_raster layer. The geom_raster function requires the x, y, and fill aesthetics to draw a raster, as the following code and images demonstrate. In addition, since we are going to make several versions of the Haifa DEM plot, it would be best to save this basic plot as a ggplot object (hereby named haifa_relief) so that we can later update it by incorporating additional layers:

> haifa_relief = ggplot(dem_hill, aes(x = x, y = y)) + 
+ geom_raster(aes(fill = elev)) +
+ scale_fill_gradientn("Elevation (m)", 
+ colours = terrain.colors(10)) +
+ coord_equal() + 
+ theme_bw() +
+ theme(axis.title = element_blank(),
+ axis.text.y = element_text(angle = 90, hjust = 0.5))
> haifa_relief

The syntax being used is analogous to the previous example, except that we use geom_raster instead of geom_polygon; x and y are mapped to spatial coordinates, while fill is mapped to raster values. Note that geom_raster takes aes(fill=elev) as an argument, while the missing parts (data, x, and y) are passed, by default, from the ggplot set of arguments. As we shall see right away, this mode of operation is very convenient when we would like to add another layer (in this case, the hillshade) with the same arguments so that we do not have to type them once again. The theme components this time specify that the y axis labels should be rotated by 90º (which is often used in maps). The resulting plot is shown on the left panel in the following screenshot. Note that the color scale used (terrain.colors) is particularly useful to display the topography.

Now that we have our basic relief image, we can experiment with various additional components that we may wish to show on top of it. For instance, the hillshade calculation gave us a layer of theoretical shading degree for the given topography and sun position (see Chapter 6, Modifying Rasters and Analyzing Raster Time Series). We can add the shading values as an all-black raster with various levels of transparency (making the shaded areas less transparent and thus darker). This will create an illusion of a three-dimensional image, which is also known as a shaded relief in cartography. To create shading, we need to add another raster layer with black pixels whose transparency aesthetic is mapped to the hill values. Transparency, in ggplot2 as well as in other graphical functions in R, is determined by a parameter named alpha, ranging from 0 to 1, with 0 being completely transparent and 1 being completely opaque. Limiting the maximum of the alpha scale to 0.5, using the range parameter makes sure we do not get completely black pixels in heavily shaded areas:

> haifa_relief_shade = haifa_relief + 
+ geom_raster(aes(alpha = hill), fill = "black") +
+ scale_alpha_continuous("Hillshade", range = c(0.5, 0))
> haifa_relief_shade

The resulting plot is shown on the right panel in the following screenshot. Viewing the two images side by side demonstrates the effect that shading has.

Plotting spatial data

Talking about topography, another feature we may wish to add to this plot is contour lines. We already calculated contour lines from rasters both indirectly (displaying them using levelplot, in Chapter 4, Working with Rasters, for instance) and directly (creating a contour lines layer with rasterToContour, in Chapter 7, Combining Vector and Raster Datasets). Here, we are looking at yet another indirect way to calculate contour lines based on a raster using ggplot2. To add a contour lines layer, we can use the geom_contour function, mapping its z aesthetic to elevation so that contour lines are calculated according to it:

> haifa_relief_shade + 
+ geom_contour(aes(z = elev), colour = "black", alpha = 0.3)

The following screenshot shows the graphical output that is produced, showing semi-transparent contour lines on top of the shaded relief of Haifa:

Plotting spatial data

In the previous example, we generated a contour plot based on the values of an existing raster (the Haifa DEM in this case). Another common type of plot showing contours is one where contours denote the average density of events in a point pattern. Creating density contours is a common way to reduce the amount of information and make point pattern visualization more comprehensible. For example, when we have so many points that they overlap each other, it may be difficult to discern denser and sparser locations. Drawing contours is one possible solution to this problem.

A commonly used procedure to create density contours out of a point pattern is two-dimensional kernel density estimation, available through the base R function named kde2d. The geom_density2d function is a ggplot2 adaptation of the latter function, creating contours out of a density surface calculated with kde2d.

As an example of geom_density2d, we will draw contours showing the average density of buildings in Haifa.

Note

It should be noted that, in practice, density estimation is only meaningful for mapped point patterns, that is, point patterns where all events in the studied area have been detected. When detection is incomplete, density estimates will obviously be biased to an unknown degree in each location. In our case, for example, if the OpenStreetMap layer of Haifa buildings is incomplete, then a density estimate based on it will not be very useful as it will encompass underestimation to an unknown and variable degree.

Since density estimation requires a point pattern, we need to convert the buildings layer to a point layer by finding building centroids:

> haifa_buildings_ctr = gCentroid(haifa_buildings, byid = TRUE)
> haifa_buildings_ctr = as.data.frame(haifa_buildings_ctr)
> head(haifa_buildings_ctr)
         x       y
1 684013.4 3629679
2 688023.0 3629752
3 687645.0 3627410
4 685913.9 3631217
5 683144.4 3633649
6 683515.3 3628980

Taking the haifa_relief_shade plot, we will now add both the buildings point pattern (using geom_point) and the density contours based on that pattern (using geom_density2d). Using the limits parameter of scale_x_continuous and scale_y_continuous, we will also limit our scope to a window centered at haifa_buildings_ctr. In addition, the colour aesthetic of the contours is mapped to ..level.., which corresponds to the contour breaks. The level variable is surrounded by .., which is a special way to signal to ggplot2 that we are referring to a variable generated by the layer itself, rather than a variable present in data.frame:

> haifa_relief_shade + 
+ geom_contour(aes(z = elev), colour = "black", alpha = 0.3) +
+ geom_point(data = haifa_buildings_ctr, size = 0.5) +
+ geom_density2d(data = haifa_buildings_ctr, 
+ aes(colour = ..level..)) +
+ scale_colour_gradient("Density", low = "blue", high = "red") +
+ scale_x_continuous(
+ limits = c(min(haifa_buildings_ctr$x)-2000, 
+ max(haifa_buildings_ctr$x)+2000)) +
+ scale_y_continuous(
+ limits = c(min(haifa_buildings_ctr$y)-2000, 
+ max(haifa_buildings_ctr$y)+2000))

The following screenshot shows the graphical output:

Plotting spatial data

In the preceding screenshot, we see three different legends as there are three different scales: elevation, hillshade, and buildings density. According to this map (again, assuming that all buildings have been mapped), we can see that the highest density of individual buildings per unit area is found on the eastern slopes of Mount Carmel.

Our final example in this section will be to plot the time series of interpolated standardized temperature maps, which we created in the previous chapter. We already took the trouble of reshaping the data into a tidy table; therefore, we do not need to take any preliminary processing steps. Using the spainT table and the following code section, we recreate a plot similar to the rasterVis version from the previous chapter with ggplot2:

> ggplot(spainT, aes(x = x, y = y, fill = value)) +
+ geom_raster() +
+ scale_fill_gradient2(
+ expression(paste("Value (", degree, "C)")), 
+ low = "blue", high = "red", limits = c(-4,4)) +
+ geom_contour(
+ aes(z = value), 
+ colour = "black", size = 0.1, breaks = c(-4:-1,1:4)) +
+ coord_equal() +
+ facet_grid(variable ~ year) +
+ sp_minimal

The following screenshot shows the graphical output:

Plotting spatial data

Note that we have once again utilized our sp_minimal custom-made theme to eliminate the axis annotation. The geom_contour function has been used to generate contour lines, similar to the Haifa DEM elevation contours, only that now we specified the exact vector of breaks, specifying where contour lines will be generated. Using facet_grid(variable~year), we arranged the facets in rows according to variable and in columns according to year.

Adding static maps from the Web

Nowadays, cartographers and spatial analysts are lucky to have a variety of online resources for map creation. We already used freely available satellite imagery, DEMs, and administrative border datasets, for example. However, to recover and process such datasets is not always a quick and straightforward task. We may need to figure out the spatial location in question, search for available resources, import them into R, and so on. On the other hand, several applications, such as Google Maps and OpenStreetMap, provide readily available maps of the entire earth with a variety of features that can be unreasonable to collect ourselves each time we want to create a background for a quick visualization (satellite and aerial imagery, roads, borders, town names, and so on).

The ggmap package is an extension to ggplot2, providing a simple interface to download static maps from such online resources and easily incorporate them as the background when creating maps within the ggplot2 framework. We already used the ggmap package for geocoding (see Chapter 5, Working with Points, Lines, and Polygons), and mentioned the introductory paper on this package by Kahle and Wickham (2013).

Note

By static maps, we refer to maps that cannot be manipulated by scrolling, zooming in and zooming out, and so on, as opposed to dynamic maps that respond to such user feedback. The whole ggplot2 framework, in that sense, may be referred to as static plotting. It should be noted that dynamic plots and maps (and even entire web applications; see the shiny package) can also be produced in R. As for dynamic maps, for instance, we have already seen a short example of using the Google Earth software as a platform for dynamic spatial data display from R using the plotKML package (see Chapter 4, Working with Rasters). Another example is the plotGoogleMaps package, which can be used to display spatial data from R on top of interactive Google Maps in a web browser.

The static maps incorporation with ggmap is quite simple in practice. Using the get_map function of the ggmap package, we first need to download the required background image. Then, we can use that image as a background layer with the ggmap function of the same package. Any additional layers and components can be added to the plot initiated with ggmap the same way as to a plot initiated with ggplot.

Our first example featuring ggmap will be to display the City of London buildings and their distances to the River Thames, as promised in Chapter 5, Working with Points, Lines, and Polygons. Since layers obtained with ggmap are always defined in a geographical CRS, we must first reproject the two relevant layers that we will plot to such a CRS as well. The layers we are going to plot, in this case, are as follows:

  • buildings: City of London buildings
  • city: City of London boundary polygon

The following expressions reproject these layers and save them as new objects buildings_geo and city_geo, respectively:

> buildings_geo = spTransform(buildings, 
+ CRS("+proj=longlat +datum=WGS84"))
> city_geo = spTransform(city, 
+ CRS("+proj=longlat +datum=WGS84"))

To specify the location for which we would like to download a map with get_map, we can use the coordinates of the buildings layer centroid, obtained in the following manner:

> buildings_ctr = coordinates(gCentroid(buildings_geo))
> buildings_ctr
            x        y
1 -0.09222641 51.51499

Next, we will fortify the buildings_geo layer, to make the data available in the form of a data.frame object. The region in this case is the osm_id attribute table column, which holds unique building identifiers:

> buildings_f = fortify(buildings_geo, region = "osm_id")
> head(buildings_f)
         long      lat order  hole piece       group        id
1 -0.09962729 51.51428     1 FALSE     1 100684524.1 100684524
2 -0.09952849 51.51429     2 FALSE     1 100684524.1 100684524
3 -0.09942449 51.51429     3 FALSE     1 100684524.1 100684524
4 -0.09941299 51.51424     4 FALSE     1 100684524.1 100684524
5 -0.09951839 51.51423     5 FALSE     1 100684524.1 100684524
6 -0.09961579 51.51422     6 FALSE     1 100684524.1 100684524

Similar to what we saw in the county example, fortify removes the attribute table; therefore, we need to attach it back manually:

> colnames(buildings_f)[which(colnames(buildings_f) == "id")] = 
+ "osm_id"
> buildings_f = join(buildings_f, buildings@data, "osm_id")
> head(buildings_f)
         long      lat order  hole piece       group    osm_id
1 -0.09962729 51.51428     1 FALSE     1 100684524.1 100684524
2 -0.09952849 51.51429     2 FALSE     1 100684524.1 100684524
3 -0.09942449 51.51429     3 FALSE     1 100684524.1 100684524
4 -0.09941299 51.51424     4 FALSE     1 100684524.1 100684524
5 -0.09951839 51.51423     5 FALSE     1 100684524.1 100684524
6 -0.09961579 51.51422     6 FALSE     1 100684524.1 100684524
        name type dist_river
1 Temple Bar <NA>   378.7606
2 Temple Bar <NA>   378.7606
3 Temple Bar <NA>   378.7606
4 Temple Bar <NA>   378.7606
5 Temple Bar <NA>   378.7606
6 Temple Bar <NA>   378.7606

As the output shows, each building is now, once again, associated with a dist_river value.

Finally, we will fortify the city_geo polygon as well. Since there is no attribute data of interest along with it, we do not have to join anything to the resulting data.frame object in this case:

> city_f = fortify(city_geo, region = "CTYUA13NM")
> head(city_f)
         long      lat order  hole piece            group
1 -0.09671385 51.52319     1 FALSE     1 City of London.1
2 -0.09669776 51.52316     2 FALSE     1 City of London.1
3 -0.09668468 51.52317     3 FALSE     1 City of London.1
4 -0.09662369 51.52318     4 FALSE     1 City of London.1
5 -0.09646984 51.52282     5 FALSE     1 City of London.1
6 -0.09601742 51.52295     6 FALSE     1 City of London.1
              id
1 City of London
2 City of London
3 City of London
4 City of London
5 City of London
6 City of London

Examining the group column in this particular case will reveal that it has the same value ("City of London.1") in all rows since the city layer is composed of a single polygon.

The layers are ready. What is left to be done is download the background and produce the map. To download the background, we will employ the get_map function. In order to efficiently work with this function, we need to provide arguments for just these three parameters:

  • location: The center of the map to be downloaded, which can be a longitude/latitude pair.
  • maptype: The type of map to be downloaded. This is source-specific, but for Google Maps (the default data source), "terrain", "satellite", "roadmap", and "hybrid" are applicable. See the following screenshot for an example of a "hybrid" map; for a "satellite" example, see the screenshot after the following screenshot.

    Note

    The default data source, and the one we use here, is Google Maps. Note that the Google Static Maps API is used by get_map to download the images in such cases, and that by using this function, you are agreeing to the Google Maps API Terms of Service (https://developers.google.com/maps/terms). Other data sources can be selected by modifying the source parameter of get_map, with possible sources being Google Maps ("google"), OpenStreetMap ("osm"), Stamen Maps ("stamen"), or CloudMade maps ("cloudmade"). See ?get_map for details.

  • zoom: The zoom level; an integer between 3 and 21 (the default value is 10). The best zoom value for a given map is determined by trying higher or lower magnifications and examining the result.

In our case, to get a static map of the City of London composed of a photographic map with roads and other features marked on top (a "hybrid" map), we can use the following expression:

> city_map = get_map(location = buildings_ctr, 
+ maptype = "hybrid", 
+ zoom = 14)

Now, plotting this map on its own (to inspect the zoom argument's effect, for instance) can be done by simply typing ggmap(city_map), that returns a ggplot object by default, plotted in the graphical window—as we have already seen earlier. However, to add our supplementary layers based on the buildings_f and city_f objects, we need to incorporate them using two geom_polygon function calls. A few minor settings are also introduced in this particular example, which are as follows:

  • To use muted colors, for a prettier color scale, the muted function of the scales package is being used
  • Forcing the legend heading "Distance to river (m)" to be split over two lines is achieved using the new line symbol
  • The labs function is used to easily modify axis titles, instead of modifying the scale itself with the scale functions

The code to produce the City of London buildings map appears as follows:

> library(scales)
> ggmap(city_map) + 
+ geom_polygon(data = buildings_f,
+ aes(x = long, y = lat, group = group, 
+ fill = dist_river), 
+ size = 0.1, colour = "black") +
+ geom_polygon(data = city_f, 
+ aes(x = long, y = lat, group = group), 
+ colour = "yellow", fill = NA) +
+ scale_fill_gradient2("Distance
to river (m)", 
+ low = muted("blue"), high = muted("red"), 
+ midpoint = 500) +
+ labs(x = "Longitude", y = "Latitude")

The resulting map looks like the following screenshot:

Adding static maps from the Web

The map shows an up-to-date "hybrid" map of London downloaded from Google Maps, with the City of London boundary (in yellow) and buildings (in blue to red colors, according to their distance to the River Thames) on top of it.

To practice using ggmap some more, we will create another map with the following components encompassing all three vector layer types—points, lines, and polygons—along with a static map downloaded from the Web:

  • A "satellite" static map background
  • towns: A point layer of two towns' locations (see Chapter 7, Combining Vector and Raster Datasets)
  • track: A line layer of a GPS track (see Chapter 5, Working with Points, Lines, and Polygons)
  • forests: A polygonal layer of two planted forests (see Chapter 7, Combining Vector and Raster Datasets)

As in the previous example, we first need to bring all of these layers to a geographic CRS:

> towns_geo = spTransform(towns, 
+ CRS("+proj=longlat +datum=WGS84"))
> track_geo = spTransform(track, 
+ CRS("+proj=longlat +datum=WGS84"))
> forests_geo = spTransform(forests, 
+ CRS("+proj=longlat +datum=WGS84"))

Next, we need to use fortify on the line and polygonal layers (track_geo and forests_geo) in order to bring them to a data.frame form. Point layers, as previously mentioned, have a much simpler structure than lines and polygons, so no fortify method exists for them. Instead, we can always manually construct a data.frame object to represent a point layer. In the present case, for example, towns_geo is a SpatialPoints object and towns_names is a character vector. Using the coordinates functions and data.frame, we can combine them into a data.frame object:

> towns_f = data.frame(coordinates(towns_geo), name = towns_names)
> towns_f
       lon      lat          name
1 34.87131 31.37936 Lahav Kibbutz
2 34.81695 31.37383       Lehavim
> track_f = fortify(track_geo)
> head(track_f)
      long      lat order piece group id
1 34.85472 31.36520     1     1   0.1  0
2 34.85464 31.36540     2     1   0.1  0
3 34.85458 31.36559     3     1   0.1  0
4 34.85454 31.36519     4     1   0.1  0
5 34.85443 31.36639     5     1   0.1  0
6 34.85445 31.36733     6     1   0.1  0
> forests_f = fortify(forests_geo, region = "name")
> head(forests_f)
      long      lat order  hole piece    group     id
1 34.87591 31.33830     1 FALSE     1 Kramim.1 Kramim
2 34.87559 31.33831     2 FALSE     1 Kramim.1 Kramim
3 34.87560 31.33858     3 FALSE     1 Kramim.1 Kramim
4 34.87528 31.33858     4 FALSE     1 Kramim.1 Kramim
5 34.87529 31.33885     5 FALSE     1 Kramim.1 Kramim
6 34.87529 31.33912     6 FALSE     1 Kramim.1 Kramim

The three data.frame objects (towns_f, track_f, and forests_f) are now in memory. The missing component is just the background static map, which can be downloaded with get_map as follows:

> forests_ctr = coordinates(gCentroid(forests_geo))
> forests_map = get_map(location = forests_ctr, 
+ maptype = "satellite", 
+ zoom = 12)

The following code is used to combine all of these components into a single plot. Note that track_f is introduced through a geom_path layer (which is the appropriate geometry for spatial lines) and forests_f is introduced using geom_polygon. The point layer towns_f is added as a geom_text layer, rather than geom_point, in order to display the towns' locations as name labels instead of points. The label aesthetic of geom_text controls the text that each point is associated with, which in our case is the name column that holds town names.

> ggmap(forests_map) + 
+ geom_polygon(data = forests_f, 
+ aes(x = long, y = lat, group = group, colour = id), 
+ fill = NA) +
+ geom_path(data = track_f, 
+ aes(x = long, y = lat), 
+ colour = "yellow") +
+ geom_text(data = towns_f, 
+ aes(x = lon, y = lat, label = name), 
+ colour = "white", size = 2.5, fontface = "bold") +
+ scale_colour_discrete("Forest") +
+ labs(x = "Longitude", y = "Latitude")

The resulting map is shown in the following screenshot:

Adding static maps from the Web

This map shows the towns' locations (as white text labels), the GPS track (as a yellow line), and the forest borders (as polygons, with border colors according to the forest identity).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.94.190