Geometrical calculations on vector layers

In previous sections, we covered the querying of the immediately available properties of vector layers (for example, the CRS definition or attribute table), and the modification of vector layers involving only the attribute table component (for example, removing the attribute table or subsetting the layer according to it). In the next two sections, you will learn to examine and modify the geometrical component of vector layers. In this section, operations involving a single vector layer, such as reprojection and area calculation, will be covered. In the next section, we will deal with operations involving pairs of vector layers.

Reprojecting vector layers

Reprojection is the conversion of all the coordinates of a spatial object from one CRS to another. Note how this differs from specifying a CRS (which we previously did with airports), where only the CRS definition associated with the layer is modified and the coordinates are left unaltered. A vector layer is reprojected with the spTransform function from the rgdal package. The function accepts two arguments: the layer to be reprojected and the target CRS.

For example, the following expression transforms the county layer (currently defined in a geographical CRS) to the US National Atlas Equal Area projection:

> newProj = CRS("+proj=laea +lat_0=45 +lon_0=-100
+ +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs")
> county = spTransform(county, newProj)

Note that the preceding operation consisted of two steps. The first expression created an object named newProj of the class CRS by applying the CRS function to a PROJ.4 character string corresponding to the US National Atlas Equal Area projection. The second expression reprojected the county layer to the newProj CRS using the spTransform function and assigned the result back to county.
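To make concrete what converting coordinates means, the following is a minimal sketch of the forward Lambert azimuthal equal-area formula on a sphere, using the parameters from the PROJ.4 string above (lat_0=45, lon_0=-100, and a sphere radius of 6370997 meters). The function name laea_forward is hypothetical and written only for illustration; in practice, spTransform delegates this work to the PROJ library, and its results may differ slightly from this hand-written version.

```r
# Spherical Lambert azimuthal equal-area forward projection (illustrative only).
# Defaults mirror the PROJ.4 string: +lat_0=45 +lon_0=-100 +a=6370997 +b=6370997
laea_forward = function(lon, lat, lon_0 = -100, lat_0 = 45, R = 6370997) {
  to_rad = pi / 180
  lam  = (lon - lon_0) * to_rad   # longitude relative to the projection center
  phi  = lat * to_rad             # latitude of the point
  phi1 = lat_0 * to_rad           # latitude of the projection center
  k = sqrt(2 / (1 + sin(phi1) * sin(phi) + cos(phi1) * cos(phi) * cos(lam)))
  x = R * k * cos(phi) * sin(lam)
  y = R * k * (cos(phi1) * sin(phi) - sin(phi1) * cos(phi) * cos(lam))
  cbind(x = x, y = y)
}

# The projection center maps to the origin (x_0 = 0, y_0 = 0)
center = laea_forward(-100, 45)
center

# A longitude/latitude pair in degrees becomes a pair of distances in meters
pt = laea_forward(-106.6168, 35.04918)
pt
```

The point of the sketch is that every coordinate pair is recomputed, which is why reprojection differs from merely relabeling a layer with a new CRS definition.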

Note

The PROJ.4 strings can be obtained from other objects (see the previous chapter) or from databases such as http://www.spatialreference.org/. The PROJ.4 string used earlier, for example, was copied from http://spatialreference.org/ref/epsg/us-national-atlas-equal-area/proj4/.

We can evaluate the effect of reprojection by visualizing the new county layer with plot(county). The following screenshot shows the graphical output:

[Figure: the county layer plotted in the US National Atlas Equal Area projection]

Reprojection is often used in spatial data analysis since any operation involving multiple layers (such as an overlay or map production) requires all layers to be in the same CRS. For example, in order for us to display the GPS track object (see the Lines section in this chapter) on top of the Landsat satellite image landsat_11_09_2003.tif (see the previous chapter), we first need to bring them into the same CRS. For that, we can either reproject the raster to the CRS of the vector layer (WGS84, in this case) or reproject the vector layer to the CRS of the raster (UTM Zone 36N, in this case). Unless we have a special reason to prefer the CRS of the vector layer(s), it is usually better to reproject the vector layers into the CRS of the raster and leave the raster unmodified; the reason is that raster reprojection also involves resampling and thus, potential modification of its values (see the next chapter).

To plot the GPS track on top of the Landsat image, we will first read the latter into a RasterBrick object named l_03:

> library(raster)
> l_03 = brick("C:\\Data\\landsat_11_09_2003.tif")

Then, we will reproject track, supplying the CRS parameters of l_03 to spTransform, in a single step this time:

> track = spTransform(track, CRS(proj4string(l_03)))

With the l_03 and track objects in the same CRS, we can now plot them one on top of the other using two function calls. In the second function call, we need to specify add=TRUE so that the second layer will be plotted on top of the first, in the same graphical window (two or more layers can be plotted this way).

Regarding the satellite image, rather than plotting the values of an individual band, we will use the plotRGB function to produce a true-color image from the red, green, and blue bands (which correspond to bands 3, 2, and 1 in Landsat, respectively). This is done by assigning the appropriate bands to the r (red), g (green), and b (blue) parameters:

> plotRGB(l_03, r = 3, g = 2, b = 1, stretch = "lin",
+ ext = extent(track) + 10000)
> plot(track, add = TRUE, col = "yellow")

The following screenshot shows the graphical output with the GPS route (in yellow) on top of the true-color Landsat image, which is generated as a result of the last two expressions:

[Figure: the GPS track (in yellow) on top of the true-color Landsat image]

The additional plotRGB parameters we used, stretch and ext, specify the type of stretch and the required extent, respectively. Stretching is essentially a transformation from a raster value (which in this case is between 0 and 1) to an RGB color model value (between 0, which is the darkest, and 255, which is the brightest). The simplest option is lin, which specifies a linear stretch. Supplying an Extent object to the ext parameter allows us to zoom in and plot only a portion of the raster. In this case, we use the extent of the track layer, plus a 10 kilometer buffer on all sides, with the expression extent(track)+10000 (note that all distance-related calculations are in CRS units; in this case, meters). When adding the second layer (track), we use the col parameter to specify the required line color; in this case, "yellow".
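The idea of a linear stretch can be sketched in a few lines of base R. The function stretch_lin below is a hypothetical helper, not the code used by plotRGB; it rescales values linearly so that the minimum maps to 0 and the maximum to 255. The actual implementation in the raster package may treat extreme values differently (for example, by clipping them), so this is only a simplified illustration with made-up input values.

```r
# Simplified min-max linear stretch to the 0-255 range (illustrative only)
stretch_lin = function(x) {
  rng = range(x, na.rm = TRUE)                     # smallest and largest value
  round(255 * (x - rng[1]) / (rng[2] - rng[1]))    # rescale, then round
}

# Hypothetical raster values between 0 and 1
vals = c(0.02, 0.10, 0.25, 0.40)
stretched = stretch_lin(vals)
stretched
```

The smallest input becomes the darkest level and the largest becomes the brightest, with the values in between spaced proportionally.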

Note

R has excellent capabilities to use colors and color gradients in the graphical output, which are mostly beyond the scope of this book. In short, there are three main methods to specify colors in R:

  • Color name: For example, "yellow"
  • Position on a color palette: For example, rainbow(12)[3], which gives the third color in a 12-color rainbow palette, which is a kind of yellow
  • Position within the RGB color model: For example, rgb(1,1,0), which returns the hex code "#FFFF00" that corresponds to pure yellow

For the purposes of this book, the first method, involving the predefined color names, will mostly be sufficient (a list of available colors can be obtained using the expression colors()). We are, in fact, also using color palettes although indirectly through graphical functions such as levelplot (see the previous chapter). We will see an example of how to directly use a color palette in Chapter 9, Advanced Visualization of Spatial Data.
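The three ways of specifying a color can be compared side by side in base R. This is a small sketch using only functions from the grDevices package that ships with R; the variable names are chosen for illustration.

```r
# 1. Color name
by_name = "yellow"

# 2. Position on a palette: the third color of a 12-color rainbow palette
by_palette = rainbow(12)[3]

# 3. RGB color model: full red, full green, no blue
by_rgb = rgb(1, 1, 0)

# col2rgb() converts each specification to red/green/blue components (0-255)
col2rgb(c(by_name, by_palette, by_rgb))

# Number of predefined color names available through colors()
length(colors())
```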

Working with the geometrical properties of vector layers

Spatial objects have a wide range of properties related to their geometry; some are instantly available as part of the data structure itself (for example, the coordinates of points in a point layer); others are derivable via geometrical calculations (for example, the area sizes of polygons).

The coordinates of a point layer can be obtained using the coordinates function:

> coordinates(airports)
           lon      lat
[1,] -106.6168 35.04918
[2,] -106.7947 35.15559
[3,] -106.0731 35.62866

The result is a matrix object with the number of rows corresponding to the number of points the layer consists of.

To derive more complex properties, the rgeos package, which stands for R interface to Geometry Engine Open Source (GEOS), offers a range of functions for geometrical operations involving vector layers. The available geometrical operations can conceptually be divided into three groups according to the output they produce:

  • Numeric values: Obtained from functions that summarize geometrical properties (for example, calculating area sizes)
  • Logical values: Obtained from functions that evaluate whether a certain geometrical property (for example, whether the given geometry is valid), or the relation between objects (for example, whether feature A intersects with feature B), holds true
  • Spatial layers: Obtained from functions that create a new layer based on an input layer (for example, finding polygon centroids) or a pair of layers (for example, finding the intersecting area of feature A with feature B)

Several examples of functions for each type of these operations will be provided in this chapter, while some of the additional functions will only be mentioned for reference.

Note

For a complete list of functions that the rgeos package offers, refer to the help pages of the package available at http://cran.r-project.org/web/packages/rgeos/rgeos.pdf.

As an example of a function that returns numeric values, the gArea function can be used to calculate the area size of polygons. For example, we can calculate the area covered by the county polygons as follows:

> library(rgeos)
> gArea(county) / 1000^2
[1] 7784859

The area is given in the units of the projection, in this case m²; dividing the result by 1000² transformed the area figure to km² units. According to Wikipedia, the land area of the contiguous U.S. is 7,663,942 km², which is close enough to our result (given that the CRS and level of detail affect the calculation).
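How close the two figures are can be checked with simple arithmetic. The sketch below only reuses the two numbers quoted in the text (the variable names are illustrative); the difference works out to roughly 1.6 percent of the reference value.

```r
area_layer_km2 = 7784859   # gArea(county) / 1000^2, as printed above
area_ref_km2   = 7663942   # land area figure quoted in the text

# Absolute difference in square kilometers
area_diff = area_layer_km2 - area_ref_km2
area_diff

# Relative difference as a percentage of the reference value
pct_diff = round(100 * area_diff / area_ref_km2, 1)
pct_diff
```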

If we want to calculate the area of each feature (each county, in this case), rather than the area of the layer as a whole, we need to specify byid=TRUE. The byid parameter determines whether we wish to perform the calculation by ID, that is, for each feature separately. This parameter is present, with the same meaning, in many of the functions in the rgeos package, as we shall see in the subsequent examples. The following expression returns a numeric vector with the area of each feature in the county layer in km². The vector is immediately assigned to a new column in the county layer, named area:

> county$area = gArea(county, byid = TRUE) / 1000^2

Now the attribute table of county contains an extra column with the area size for each county. We can confirm this by printing the first few rows of the attribute table:

> head(county@data)
       NAME_1     NAME_2 TYPE_2  FIPS      area
0 Connecticut Litchfield County 09005  2451.876
1 Connecticut   Hartford County 09003  1941.110
2 Connecticut    Tolland County 09013  1077.789
3 Connecticut    Windham County 09015  1350.476
4  California   Siskiyou County 06093 16416.572
5  California  Del Norte County 06015  2626.707
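The difference between a per-feature and a whole-layer area can be illustrated without any spatial packages. The following sketch uses the shoelace formula on two hypothetical, non-overlapping polygons given as coordinate matrices in meters; ring_area and the features list are made-up names, and this is not how rgeos computes areas internally (it calls the GEOS library).

```r
# Shoelace formula for the area of a simple (non-self-intersecting) ring,
# given as a two-column matrix of x/y coordinates
ring_area = function(xy) {
  x = xy[, 1]
  y = xy[, 2]
  j = c(2:nrow(xy), 1)                 # index of the next vertex (wraps around)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Two hypothetical features: a 2 km x 1 km rectangle and a right triangle
features = list(
  rect = cbind(c(0, 2000, 2000, 0), c(0, 0, 1000, 1000)),
  tri  = cbind(c(0, 3000, 0), c(0, 0, 4000))
)

# One value per feature, in km^2 (analogous to byid = TRUE)
area_by_feature = sapply(features, ring_area) / 1000^2
area_by_feature

# A single value for all features together (analogous to the default)
area_total = sum(area_by_feature)
area_total
```

Because the result of the per-feature calculation has one element per feature, in feature order, it can be assigned directly to a new attribute table column, as was done with county$area.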

As an example of an operation where a new spatial layer is created, we will dissolve the county polygons into state polygons. For simplicity, we will perform the dissolving on a subset of county, including only two states: Nevada and Utah. At first, we will create the subset and assign it to a new object named county_nv_ut:

> county_nv_ut = county[county$NAME_1 %in% c("Nevada", "Utah"), ]

Now, we will dissolve the county_nv_ut polygons using the gUnaryUnion function. The two arguments passed to this function are the layer to be dissolved and id, a vector defining the features that should be aggregated (all features with identical values will be dissolved into one). If the id argument is omitted, all polygons are dissolved into one, as we shall see in subsequent examples. Here is the code for dissolving our current layer:

> states = gUnaryUnion(county_nv_ut, id = county_nv_ut$NAME_1)

In the present case, the state name column (NAME_1) was passed as the ID and thus, all the counties that form a single state were dissolved into state polygons. Since rgeos deals with the geometrical component of vector layers, the returned object of gUnaryUnion (and, as we shall see, of other functions in this package) has no attribute table. In this case, for example, while the input county_nv_ut was a SpatialPolygonsDataFrame object, the output states is a SpatialPolygons object. Then, how will we be able to tell which polygon corresponds to which state? The answer is that the information is recorded in the ID codes of the resulting layer and can be obtained using the row.names function. Using this function, we can find out that the first feature in states corresponds to "Nevada" and the second to "Utah":

> row.names(states)
[1] "Nevada" "Utah"

To get a better understanding of what we just did, it would be helpful to visualize the dissolved states' polygons on top of the original county_nv_ut polygons. We can produce a simple plot using two plot function calls (specifying add=TRUE the second time):

> plot(county_nv_ut, border = "lightgrey", lty = "dotted")
> plot(states, add = TRUE)

Note that the border parameter of plot is used to indicate the polygon border color (rather than col, which, in the case of polygons, refers to fill color). An additional argument is lty (which stands for the line type), which specifies that we want the county borders to be dotted.

Note

There are six line types available in R. See the full list at the entry concerning the lty parameter on the ?par help page.

The resulting output is not presented since we are not done just yet. An additional layer in our plot is going to consist of labels for county names. Text labels can be added to an existing plot using the text function. With text, we need to supply a set of coordinates defining where the labels will be plotted (for example, using a matrix object with two columns, for x and y), and the text to be written at each coordinate (for example, using a character vector).
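The mechanics of text can be tried on a toy plot before applying it to spatial layers. In this minimal base R sketch, the coordinate matrix xy and the labels vector are hypothetical values chosen only to show the two kinds of input.

```r
# Hypothetical label positions: a two-column matrix (x, y)
xy = cbind(x = c(1, 2, 3), y = c(2, 1, 3))

# One label per row of xy
labels = c("A", "B", "C")

# Draw the points first, then add the labels to the same plot
plot(xy, pch = 16, col = "lightgrey", xlim = c(0, 4), ylim = c(0, 4))
text(xy, labels = labels)   # by default, labels are centered on the coordinates
```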

The most straightforward option would be to place the labels at the centroids of each county. For this, we first have to find the centroid coordinates using gCentroid, yet another rgeos function that returns a new layer based on a single input layer:

> county_ctr = gCentroid(county_nv_ut, byid = TRUE)

The resulting SpatialPoints object was assigned to county_ctr. Since byid=TRUE was specified, the layer contains the centroids of the individual counties, rather than the centroid of the whole county_nv_ut layer.
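For intuition about what a per-feature centroid is, the following base R sketch computes the area-weighted centroid of simple polygons from their vertex coordinates. The function ring_centroid and the two toy polygons are hypothetical and only illustrate the geometry; gCentroid itself relies on the GEOS library and also handles cases (such as polygons with holes) that this sketch ignores.

```r
# Area-weighted centroid of a simple ring given as a two-column x/y matrix
ring_centroid = function(xy) {
  x = xy[, 1]
  y = xy[, 2]
  j = c(2:nrow(xy), 1)                # index of the next vertex (wraps around)
  cross = x * y[j] - x[j] * y
  A = sum(cross) / 2                  # signed area of the ring
  c(x = sum((x + x[j]) * cross) / (6 * A),
    y = sum((y + y[j]) * cross) / (6 * A))
}

# Two hypothetical features: a 2 x 2 square and a right triangle
features = list(
  square = cbind(c(0, 2, 2, 0), c(0, 0, 2, 2)),
  tri    = cbind(c(0, 3, 0), c(0, 0, 3))
)

# One centroid per feature (analogous to byid = TRUE): a matrix with x and y columns
ctr = t(sapply(features, ring_centroid))
ctr
```

The result has one row per feature and two columns, which is the same shape of input that the text function accepts for label positions.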

We can supply county_ctr along with the vector of labels (which we get from the NAME_2 column of county_nv_ut) to the text function. The additional parameter cex defines the label font size in relative units (here, 1.5 times the default size):

> text(county_ctr, county_nv_ut$NAME_2, cex = 1.5)

The final graphical output, produced by the two plot functions and one text function call, is shown in the following screenshot:

[Figure: dissolved state polygons on top of the county polygons, with county name labels]

The plot shows the dissolved state polygons (in black), the original county polygons (in dotted gray), and county names (as text labels).

Note

Many of the functions in the rgeos package cannot handle geometries that are invalid from the topological point of view. For example, when referring to polygons, a valid layer does not contain self-intersecting polygons (polygons whose boundary crosses itself). Examining whether a given layer is valid or not can be done using the gIsValid function, which returns TRUE for valid features (either for the layer as a whole, by default, or for each feature separately when specifying byid=TRUE). Searching for and resolving topological errors, however, is best done interactively. Thus, GIS software (such as QGIS) is more suitable for this task than R.
