Chapter 7. Combining Vector and Raster Datasets

Generating new insight by overlaying several layers of spatial information is one of the main concepts of spatial data analysis, as we have seen in the previous chapters. So far, however, we have only used operations involving either rasters alone or vector layers alone, but not a combination of both. Although the two types of spatial layers have their characteristic uses (such as rasters for DEMs and vector layers for administrative borders), combining them in a single analysis is often desired. As we shall see, this is less straightforward than working with a single layer type, and it involves specific procedures and decisions.

In this chapter, we are going to explore the interplay between vector and raster layers, and the way it is implemented in the raster package. The way rasters and vector layers can be interchanged and queried, one according to the other, will be demonstrated through examples.

Note

In this chapter, we are going to use objects that were created in the previous chapters and the packages we used to do that (plyr, raster, and rgeos). Make sure these are loaded before running the code sections in this chapter. For convenience, this chapter's code file on the book's website repeats the relevant code sections from the previous chapters.

In this chapter, we'll cover the following topics:

  • Creating rasters from vector layers and vice versa
  • Masking rasters with vector layers
  • Extracting raster values according to vector layers

Creating rasters from vector layers

One of the main reasons to convert a vector layer to a raster is that we are interested in employing raster analysis tools or procedures on data that is currently held in vector form (the same reasoning applies in the opposite direction). For example, when preparing a multiband raster with various environmental characteristics of a given area, such as slope or NDVI, we may wish to add layers that are commonly given in the vector format, such as built area polygons or road lines. To do this, we first need to convert these vector layers to rasters, and then supplement the multiband raster with the additional layers.

The process of converting a vector layer to a raster is called rasterizing in the raster package terminology, and it is performed with the rasterize function. In this section, we will see an example of how to rasterize a point vector layer, while keeping in mind that the procedures to rasterize lines and polygons are analogous. You will also learn the related operation of raster masking using a vector layer, which conceptually is a special case of rasterizing followed by an overlay.

Rasterizing vector layers

Creating a raster out of a vector layer is quite simple in concept. Given a vector layer and a raster grid, the new raster cells get filled with values in places where the raster overlaps with the vector layer. The rest of the raster cells (those that are not in contact with the vector layer) are left with NA. The cells that overlap with a given feature are assigned the value associated with that feature. These values can simply be consecutive integers (the feature indices), or they can come from any vector with one element per feature (such as an attribute table column). The procedure will be made clearer with the following example.

In our first example, we will use a simple point layer with the locations of two towns: Lahav Kibbutz and Lehavim. We will first create a layer named towns, using geocoding as follows (see Chapter 5, Working with Points, Lines, and Polygons, for more information):

> library(ggmap)
> towns_names = c("Lahav Kibbutz", "Lehavim")
> towns = geocode(towns_names)
> coordinates(towns) = ~ lon + lat
> proj4string(towns) = CRS("+proj=longlat +datum=WGS84")
> towns = spTransform(towns, CRS(proj4string(l_00)))

Note that in the last expression, the layer is transformed to the CRS of the Landsat image l_00, which we assume is in memory (see Chapter 4, Working with Rasters, for more information). Let's visualize the towns layer using the l_00 image as the background:

> plotRGB(l_00, r = 3, g = 2, b = 1, stretch = "lin")
> plot(towns, col = "red", pch = 16, add = TRUE)
> text(coordinates(towns), towns_names, pos = 3, col = "white")

The resulting graphical output is shown in the following screenshot:

[Screenshot: the towns layer plotted over the Landsat image l_00]

Note that the dark patch of green vegetation neighboring Lahav Kibbutz to the west is Lahav forest. Another forest, Kramim, can be seen to the southeast of the Kibbutz. We will return to these two forests in later examples.

Now let's see how the vector layer towns can be converted to a raster. Since we have two points, our result is going to be a raster with two cells having a non-NA value, no matter which grid we use, as long as both points are within its extent and the cells are small enough that the two points do not fall within a single cell. The rasterize function, which converts a vector layer to a raster, requires two main arguments:

  • The vector layer to rasterize (x)
  • The raster defining the grid (y)

Note that the role of y is only to provide a raster grid definition; its values do not participate in the operation in any way (similar to the role of the to parameter in raster reprojection; see the previous chapter). In this example, we will use the MODIS raster r (see Chapter 4, Working with Rasters) to transfer towns onto its 500 meter grid, as follows:

> towns_r = rasterize(towns, r)

The result, towns_r, is a RasterLayer object with two non-NA values, 1 and 2, since the raster values are defined as the feature indices (numbers from 1 to n, where n is the total number of features) by default:

> towns_r[!is.na(towns_r)]
[1] 1 2

In our case, the cell with the value of 1 corresponds to the first feature in towns (Lahav Kibbutz) and the cell with the value of 2 corresponds to the second one (Lehavim).
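The same behavior can be reproduced on a toy grid that does not depend on the objects from the previous chapters. The following is only a minimal sketch: the 4 x 4 grid, its constant value of 100, and the two point coordinates are made up for illustration. It shows that the default raster values are the feature indices, and that the values of the grid raster itself do not take part in the result:

```r
library(raster)  # also loads sp

# Hypothetical 4 x 4 grid with 1,000-meter cells; its values are arbitrary
grid = raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4000, ymn = 0, ymx = 4000)
values(grid) = rep(100, ncell(grid))

# Two made-up points falling in different cells
pts = SpatialPoints(cbind(x = c(500, 3500), y = c(3500, 500)))

# Only the grid definition of 'grid' is used, not its values
pts_r = rasterize(pts, grid)

pts_r[!is.na(pts_r)]       # values of the non-NA cells (the feature indices)
sum(is.na(values(pts_r)))  # number of cells that no point falls in
```

Here the first point falls in the top-left cell and the second in the bottom-right cell, so all other cells remain NA, and the value 100 does not appear anywhere in pts_r.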

To display towns_r, we will first crop it according to the extent of towns plus a 3-kilometer buffer:

> towns_r = crop(towns_r, extent(towns) + 3000)

Let's plot the resulting raster, and the original vector layer on top of it, including the relevant labels. We will use the col parameter of plot to specify a two-color scale with "lightblue" (this color will be used for 1) and "brown" (this color will be used for 2):

> plot(towns_r, col = c("lightblue", "brown"))
> plot(towns, add = TRUE)
> text(coordinates(towns), towns_names, pos = 3)

The resulting graphical output is shown in the following screenshot:

[Screenshot: the towns_r raster with the towns layer plotted on top]

The white background we see corresponds to the NA-filled area in towns_r. These are the raster cells where no point in towns falls. The two colored pixels are the two cells that have been assigned values. The light blue pixel is the one assigned the value of 1 (corresponding to Lahav Kibbutz), while the brown pixel is the one assigned the value of 2 (Lehavim).

Note

Two other useful parameters of rasterize are field and fun.

Using field, we can override the default assignment of raster values and provide a single number, a vector, or the name of an attribute table column determining the values (see Chapter 8, Spatial Interpolation of Point Data, for an example of the latter). For example, using the rasterize(towns, r, field = c(3, 4)) expression will yield a raster with the value of 3 for Lahav Kibbutz and 4 for Lehavim.

The fun parameter determines the method to assign the raster values, and is only relevant when some raster cells overlap with more than one feature. It can be provided either with a function or one of the predefined character values: "first", "last", "count", "sum", "min", or "max" (the default value is "last"). For example, the rasterize(towns, r, fun = "count") expression yields a raster stating how many towns are in each of the 500-meter cells (in our case, this is not very instructive: the raster will have two 1 values because there is only one town in each of the two cells).
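Since the towns layer has only one point per cell, the effect of fun is easier to see on a toy example. The following sketch assumes a hypothetical 2 x 2 grid and three made-up points, two of which share the top-left cell. The field values (10, 20, and 30) are arbitrary:

```r
library(raster)  # also loads sp

# Hypothetical 2 x 2 grid with 1,000-meter cells
grid = raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2000, ymn = 0, ymx = 2000)

# Three made-up points: the first two fall in the top-left cell,
# the third falls in the bottom-right cell
pts = SpatialPoints(cbind(x = c(200, 800, 1500), y = c(1800, 1200, 500)))

# field: one value per feature instead of the default feature index
r_field = rasterize(pts, grid, field = c(10, 20, 30))

# fun: how to summarize cells that contain more than one feature
r_count = rasterize(pts, grid, fun = "count")                    # points per cell
r_max = rasterize(pts, grid, field = c(10, 20, 30), fun = max)   # largest value per cell

values(r_count)
values(r_max)
```

The top-left cell of r_count holds 2 and the bottom-right cell holds 1, while in r_max the top-left cell holds the larger of the two competing field values.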

Masking values in a raster

As mentioned in the previous chapter, a raster is always rectangular. However, in raster subsetting, we are often interested in going beyond the selection of rectangular extents. Non-rectangular rasters can be created by assigning NA to all cells except those we are interested in. This operation is called masking, again in the raster package terminology.

Masking is most often performed using a polygonal layer defining an area of interest. Conceptually, masking can therefore be viewed as a two-step operation. The first step consists of a vector-to-raster conversion, where the area of interest is rasterized according to the raster we would like to mask. The second step consists of an overlay to construct the masked raster, with NA in those cells where the area-of-interest raster has NA, or the original value otherwise. In practice, the operation may be performed in a single step, using the mask function.
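The two conceptual steps can be written out explicitly and compared with the single-step result. This is only an illustrative sketch on made-up data: the 4 x 4 raster x and the square area of interest aoi are hypothetical, and the overlay function shown is one possible way to express the second step:

```r
library(raster)  # also loads sp

# Hypothetical 4 x 4 raster with 1,000-meter cells and values 1 to 16
x = raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4000, ymn = 0, ymx = 4000)
values(x) = 1:16

# Made-up square area of interest covering the four central cells
aoi = as(extent(1000, 3000, 1000, 3000), "SpatialPolygons")

# Step 1: rasterize the area of interest onto the grid of x
aoi_r = rasterize(aoi, x)

# Step 2: overlay, keeping the value of x only where aoi_r is not NA
two_step = overlay(x, aoi_r, fun = function(a, b) ifelse(is.na(b), NA, a))

# The same operation in a single step
one_step = mask(x, aoi)

values(two_step)
values(one_step)
```

Both rasters keep the original values in the four central cells (cells 6, 7, 10, and 11) and have NA everywhere else.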

In the following example, we will mask the Haifa slope raster from the previous chapter to create two new rasters: first masking all areas other than those coinciding with buildings (the haifa_buildings polygonal layer), and then all areas other than natural areas (the haifa_natural polygonal layer). These two vector layers should be in the same CRS as slope; see Chapter 5, Working with Points, Lines, and Polygons, and Chapter 6, Modifying Rasters and Analyzing Raster Time Series, to learn how they were created.

Since we would like to focus on the Haifa area, we will first create an Extent object encompassing haifa_buildings and a 2-kilometer buffer. Later, we will use this object (named haifa_ext) to clip our results and display them more conveniently:

> haifa_ext = extent(haifa_buildings) + 2000

Before proceeding, let's review the layers involved—slope, haifa_buildings, and haifa_natural—by plotting them as follows (zooming in to haifa_ext):

> plot(slope, ext = haifa_ext)
> plot(haifa_buildings, add = TRUE)
> plot(haifa_natural, col = "lightgreen", add = TRUE)

The resulting graphical output is shown in the following screenshot:

[Screenshot: the slope raster with the haifa_buildings and haifa_natural layers plotted on top]

The preceding graphical output is familiar from Chapter 5, Working with Points, Lines, and Polygons (see the last screenshot in that chapter); the only difference is that now the slope raster appears in the background. The area appears to be characterized by variable topography. Are the natural and built areas characterized by different topographic slopes? This question motivates our next task—subsetting the slope pixels covered by natural areas and buildings, separately, to compare their value distributions.

The mask function that we will use to do this task expects two main arguments:

  • The raster to be masked (x)
  • The object determining which values to mask (mask)

The mask argument can either be an overlapping raster (in which case the values in x corresponding to NA in mask are assigned NA) or a vector layer (in which case the values in x not coinciding with any feature in mask are assigned NA). Therefore, the following expression yields a new raster based on slope where all pixels not covered by haifa_natural are masked (that is, assigned NA):

> natural_mask = mask(slope, haifa_natural)

Note

The previous expression is analogous to the following expression:

> natural_mask = mask(slope, 
+ rasterize(haifa_natural, slope))
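The other form of the mask argument, an overlapping raster, can be tried out on a toy example. In the following sketch, the 2 x 2 rasters x and m and their values are made up for illustration. Note that only the NA cells of m cause masking; a non-NA value such as 0 does not:

```r
library(raster)

# Hypothetical 2 x 2 raster to be masked
x = raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2000, ymn = 0, ymx = 2000)
values(x) = c(5, 6, 7, 8)

# Made-up mask raster on the same grid, with NA in two of the cells
m = x
values(m) = c(1, NA, NA, 0)

# Cells of x are set to NA wherever m is NA
x_masked = mask(x, m)

values(x_masked)
```

The first and last cells of x_masked keep their original values (5 and 8), while the two middle cells become NA.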

For convenience, we will crop the result, natural_mask, using haifa_ext:

> natural_mask = crop(natural_mask, haifa_ext)

We will repeat the exact same procedure with haifa_buildings to get the buildings_mask raster as well:

> buildings_mask = mask(slope, haifa_buildings)
> buildings_mask = crop(buildings_mask, haifa_ext)

Now let's plot both natural_mask and buildings_mask, side-by-side, to observe how masking has been carried out:

> plot(stack(natural_mask, buildings_mask))

The resulting graphical output is shown in the following screenshot:

[Screenshot: natural_mask (left panel) and buildings_mask (right panel), both masked with polygons]

After observing the two results, we can see that while natural_mask (the left panel) mostly consists of continuous patches of non-NA areas, buildings_mask (the right panel) is composed of very small non-NA patches of only a few pixels each. The reason is that masking with a polygonal layer retains the values of only those cells whose center falls within a polygon. This behavior is appropriate for haifa_natural, which is mainly composed of large polygons, each encompassing many cells. For haifa_buildings, however, the polygons are small relative to the raster cells, so a pixel is retained only when its center happens to fall within one of the building polygons. This clearly underestimates the built area. A simple solution is to mask using building centroids instead, in which case every pixel in which a centroid of haifa_buildings falls is retained. For this purpose, we will create a point layer of building centroids named buildings_ctr:

> buildings_ctr = gCentroid(haifa_buildings, byid = TRUE)

Now, we will repeat the masking procedure using this layer:

> buildings_mask = mask(slope, buildings_ctr)
> buildings_mask = crop(buildings_mask, haifa_ext)

Let's plot the result once again to see the difference:

> plot(stack(natural_mask, buildings_mask))

The graphical output is shown in the following screenshot:

[Screenshot: natural_mask (left panel) and buildings_mask (right panel), the latter masked with building centroids]

This time, many more pixels remained unmasked in buildings_mask since all pixels coinciding with a centroid of at least one building were retained (as in the previous example of towns rasterization).
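The difference between masking with small polygons and masking with their centroids can be isolated on a toy example. The sketch below assumes a hypothetical 2 x 2 raster and a single made-up "building" that sits in a corner of the top-left cell, away from the cell center. To keep the sketch dependent on the raster package only, the centroid is taken with coordinates from sp, as a stand-in for the gCentroid function used in the text:

```r
library(raster)  # also loads sp

# Hypothetical 2 x 2 raster with 1,000-meter cells
x = raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2000, ymn = 0, ymx = 2000)
values(x) = 1:4

# Made-up 200 x 200 meter "building" inside the top-left cell,
# not covering that cell's center at (500, 1500)
bld = as(extent(100, 300, 1700, 1900), "SpatialPolygons")

# Masking with the polygon: no cell center falls within it
poly_mask = mask(x, bld)

# Masking with the polygon centroid: the cell containing the point is kept
ctr = SpatialPoints(coordinates(bld))
ctr_mask = mask(x, ctr)

sum(!is.na(values(poly_mask)))  # number of cells retained by the polygon
sum(!is.na(values(ctr_mask)))   # number of cells retained by the centroid
```

With the polygon, no cell is retained at all, whereas the centroid retains the single cell that contains the building.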

We will proceed with this example in Chapter 9, Advanced Visualization of Spatial Data, displaying the value distribution of both rasters with histograms.
