Spatial relations between vector layers

In this section, you will learn how to perform operations involving pairs of vector layers. These types of operations are very common in spatial data analysis. We often want to know, for instance:

  • What are the distances of different resorts from the nearest coastline?
  • Which houses are within a radius x of the epicenter of an earthquake?
  • Which parts of the habitat of an endangered species are contained within protected nature reserves?

All of these operations require the overlay of features from two distinct layers, although, as we have seen earlier, the result can be:

  • A numeric value (the distance from a resort to the nearest coastline is 50 meters)
  • A logical value (the house is within a distance of 10 kilometers of the earthquake epicenter)
  • A spatial layer (a polygon defining the intersecting area between the natural distribution of the endangered species and nature reserves)

In this section, we will see examples of all three kinds of operations.

Querying relations between vector layers

Querying relations between two layers is required when we would like to do one of the following:

  • Assign data from the attribute table of one feature to another (for example, polygon A gets the attribute table entry of polygon B it intersects with)
  • Examine whether a specific relation exists between features (for example, we get TRUE if polygon A is completely within polygon B, or FALSE otherwise)

In GIS terminology, the first type of operation is sometimes referred to as a spatial join or a join by spatial location. The way it can be done in R will be demonstrated using the airports and county layers. The second type of operation (examining relations) will be demonstrated using another example, involving layers of buildings and natural areas in London.

The over function provides consistent functionality to join the attribute table data from one spatial object to another based on their intersection. All nine possible types of relations are permitted (point/line/polygon with point/line/polygon), either via the sp package (point-point, point-polygon, and polygon-point) or rgeos (all other combinations). The over function accepts two spatial layers (parameters x and y). The function call over(x,y) then returns—for all features in the first layer (x)—the attribute table entries (or indices, if the layer has no attribute table) of the second layer (y) that intersect it. The related syntax x[y,], when both x and y are vector layers, serves as a shortcut to over when we are interested in subsetting the vector layer x according to intersection. The latter function call returns only those features in x that intersect with a feature in y.

As already mentioned, our first example of querying relations between layers will involve the airports and county layers. The first preliminary step, as always, is to bring both layers to a common CRS. In this case, we will reproject airports to the CRS of county:

> airports = spTransform(airports, CRS(proj4string(county)))

We will now plot airports on top of the county layer to visually examine their relation so that we know what to expect later. Since we already know that all three airports are located in New Mexico, we will work with a subset of county, called nm, containing only the counties of New Mexico:

> nm = county[county$NAME_1 == "New Mexico", ]
> plot(nm)
> plot(airports, col = "red", pch = 16, add = TRUE)

Note that we used yet another parameter of the plot function, pch, to choose a different point shape (16 corresponds to the filled circles, while the default argument is a plus symbol +).

Note

There are 26 point shapes available. The possible shapes and their code are listed in the ?points help page.

The following screenshot shows the graphical output that is produced:

Querying relations between vector layers

We can see that two of the airports fall within (thus, by definition, also intersect) a single county, while the third airports falls within a different county. Using over with airports and nm (in that order) will return the following output:

> over(airports, nm)
      NAME_1     NAME_2 TYPE_2  FIPS     area
1 New Mexico Bernalillo County 35001 3023.909
2 New Mexico Bernalillo County 35001 3023.909
3 New Mexico   Santa Fe County 35049 4944.339

Indeed, we see that the first two rows are identical since the first two airports are located in the same county (Bernalillo county). To examine specifically which airport falls within each county, we can bind the result of over with the attribute table of airports:

> cbind(airports@data, over(airports, nm))
                       name     NAME_1     NAME_2 TYPE_2  FIPS
1 Albuquerque International New Mexico Bernalillo County 35001
2           Double Eagle II New Mexico Bernalillo County 35001
3        Santa Fe Municipal New Mexico   Santa Fe County 35049
      area
1 3023.909
2 3023.909
3 4944.339

Now we can tell that the Albuquerque International and Double Eagle II airports are located within Bernalillo County, while the Santa Fe Municipal airport is located within Santa Fe County. To permanently incorporate the county information into the attribute table of airports, we can assign the combined table back to the attribute table slot of airports, with an expression such as airports@data=cbind(airports@data,over(airports,nm)).

Examining the opposite relation (when nm is x and airports is y), we can, for example, subset those counties that intersect with at least one airport using the expression nm[airports,]. As noted earlier, using [ with two vector layers is in fact a shortcut used to retain those features of x that intersect with y. In this case, an equivalent over expression would be nm[!is.na(over(nm,airports)$name),], but using nm[airports,] is obviously more convenient.

We can plot the nm[airports,] subset to demonstrate the behavior:

> plot(nm[airports, ])
> plot(airports, add = TRUE, col = "red", pch = 16, cex = 1.5)
> text(airports, airports$name, pos = 1)

The additional pos parameter of the text function controls the position of the text with respect to the point coordinates (the default behavior we previously witnessed is to place the text centered on the coordinates point itself, with pos=1 the text is placed below the point; see ?text for all options). The following screenshot shows the graphical output that is produced as a result of the three function calls:

Querying relations between vector layers

We can see that nm[airports,] indeed consists of only those nm features that intersect airports.

Note

If we try to match the airports attribute table entries to nm (with over(nm,airports)), only the first airport that intersects each county will be returned in cases where there are multiple matches. If we wish to preserve all matches, we need to specify returnList=TRUE in the over function call. However, the returned object will be a list object since holding sets of elements with variable lengths requires a list rather than a data.frame object. Many of the more advanced uses of R that require such flexibility involve list objects. However, the subject is beyond the scope of this book. More information on lists can be found in the official R's introduction document at http://cran.r-project.org/doc/manuals/r-release/R-intro.pdf and in most introductory books on R.

As an example of querying polygon-polygon relations, we will use another example involving three polygonal Shapefiles:

  • Administrative areas of England and Wales (CTYUA_DEC_2013_EW_BFE.shp)
  • Buildings in Greater London (london_buildings.shp)
  • Natural areas in Greater London (london_natural.shp)

Note

The datasets were downloaded from freely accessible online resources, either from the Office of National Statistics at https://geoportal.statistics.gov.uk/ (administrative areas) or OpenStreetMap (buildings and natural areas).

First, we will read the files into R and bring them to a common CRS. We will begin with the administrative areas layer, reading it from the disk and naming it boundary:

> boundary = readOGR("C:\Data", "CTYUA_DEC_2013_EW_BFE")

Using the proj4string function reveals the CRS of boundary—the British National Grid. Note that the resulting string is abbreviated in the following output since it does not fit in a single line:

> proj4string(boundary)
[1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 $

Next, we will read the two OpenStreetMap layers:

> buildings = readOGR("C:\Data", "london_buildings")
> natural = readOGR("C:\Data", "london_natural")

Both these layers are defined in a geographical CRS, as the following output demonstrates:

> proj4string(buildings)
[1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,$
> proj4string(natural)
[1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,$

We will reproject the buildings and natural layers to the CRS of boundary:

> buildings = spTransform(buildings, CRS(proj4string(boundary)))
> natural = spTransform(natural, CRS(proj4string(boundary)))

Now that the preliminary preparations are complete, we can continue with the exercise. Our goal will be to assign the distance to the River Thames to each of the buildings within the City of London. For this, we will go through the following intermediate steps:

  1. Create a subset of buildings with only those in the City of London.
  2. Create a subset of natural with only the riverbanks.
  3. Dissolve the riverbank features into a single one.
  4. Calculate the distance between each feature in buildings and the riverbank.

The administrative boundaries layer comes with an explanatory file (Product Specification for CTYUA_2013_EW_BFE.docx, which is also provided on the book's website), where we can find out that the names of the different administrative areas are defined in the CTYUA13NM column of the attribute table of this layer. Using this column, we will create a subset, named city, containing only the polygon defining the city of the London administrative area:

> city = boundary[boundary$CTYUA13NM == "City of London", ]

Using the city polygon, we will create a subset of only those buildings that are within the City of London. If we were interested in intersection (in other words, buildings that are completely or incompletely within the city polygon), we could use the [ or over methods as already shown earlier. To evaluate other types of relationships, the rgeos package provides a dozen additional functions: gContains, gContainsProperly, gCovers, gCoveredBy, gCrosses, gDisjoint, gEquals, gEqualsExact, gIntersects, gTouches, gOverlaps, and gWithin. The names of these functions already provide a clue for the relationship they are used to evaluate; the exact definitions are provided in the respective help page of each function.

Since in our example we are interested in creating a subset of the buildings features that are contained within city, we are going to use the function that evaluates containment—gContains. According to the help page of gContains, the expression gContains(city,buildings) will return TRUE for buildings features that are contained within city. Specifying byid=TRUE is necessary to evaluate the relation separately for each feature:

> in_city = gContains(city, buildings, byid = TRUE)

The result, in_city, is a two-dimensional matrix with rows corresponding to the buildings features (210,937 buildings in Greater London) and columns corresponding to the city features (there is only one, the City of London; if we had more than one feature in city, we would have had more columns):

> class(in_city)
[1] "matrix"
> head(in_city)
    119
0 FALSE
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE

The first (and only) column of this matrix is a logical vector specifying the containment status of each building within the city polygon. Using this vector as a rows index for buildings creates a subset of only those buildings within city:

> buildings = buildings[in_city[, 1], ]

The first step is now complete. Returning to the natural layer, we will now subset the polygons of type "riverbank" (which in the vicinity of the City of London corresponds to the River Thames):

> river = natural[natural$type == "riverbank", ]

As the third step, we will dissolve the separate river segments since we will be interested in the shortest distance to any section of the river, rather than the distances to its specific parts. As previously shown in the county example, dissolving can be achieved with gUnaryUnion. This time, the id parameter is left unspecified so that all geometries will be dissolved into a single one:

> river = gUnaryUnion(river)

Before continuing with the fourth step (calculating distances, which we'll see later in this chapter), we will visually review the processed buildings and river layers we have at this point, with respect to the administrative boundaries layer boundary:

> plot(buildings, col = "sandybrown")
> plot(river, col = "lightblue", add = TRUE)
> plot(boundary, border = "dimgrey", add = TRUE)

The resulting graphical output is shown in the following screenshot:

Querying relations between vector layers

In this preceding screenshot, we see the buildings of the City of London (in brown), the dissolved riverbanks polygon (in blue), and the administrative areas boundaries (in gray).

Creating new geometries

The rgeos package provides four functions to create new layers based on a pair of existing ones: gDifference, gIntersection, gSymdifference, and gUnion. The usage of these functions is very similar to that of functions to query relationships since their main parameters are also a pair of layers and the byid parameter. The difference is that they do not return logical values or matched attribute table entries (based on whether the examined relationship holds), but rather a new layer. The following diagram demonstrates how new geometries are generated in each case:

Creating new geometries

Note

Note that the gUnion function operates in a similar way to gUnaryUnion—just that the gUnioun function dissolves geometries from two different layers, while the gUnaryUnion function operates upon the geometries of a single layer.

Our next example will utilize two of these functions: gIntersection and gDifference. We will also use three new layers: buildings, natural areas, and administrative borders in Haifa. The buildings and natural areas layers originate from OpenStreetMap data, the same way as in the London example, while the administrative borders of Israel will be downloaded from a global administrative borders dataset directly through R. The buildings and natural areas layers will be named haifa_buildings and haifa_natural in order to not be confused with the analogous objects buildings and natural from the London example.

Our goal will be to create a polygon encompassing the natural areas in the vicinity of the buildings in Haifa, excluding those natural areas that are within 50 meters of the nearest building. We will follow four intermediate steps:

  1. Read the layers into R and bring them to a common CRS.
  2. Create a bounding polygon (convex hull) encompassing the buildings.
  3. Clip the natural areas according to the bounding polygon.
  4. Remove the natural areas that are 50 meters away from buildings.

As a first step, we will read the Haifa buildings and natural areas layers:

> haifa_buildings = readOGR("C:\Data", "haifa_buildings")
> haifa_natural = readOGR("C:\Data", "haifa_natural")

The third layer involved, the administrative boundaries, will be downloaded from the GADM database of Global Administrative Areas at http://www.gadm.org/, which is accessible using the getData function from the raster package:

> israel_adm = getData("GADM", country = "ISR", level = 1)

The hereby used arguments of getData are as follows:

  • name: The dataset name (for example, "GADM", which stands for the GADM dataset; using another dataset, "SRTM", will be demonstrated in the next chapter)
  • country (relevant for the name="GADM" option): The country ISO3 code (a list of country codes can be obtained with getData("ISO3"))
  • level (relevant for the name="GADM" option): The level of administrative subdivision (0: country, 1: first subdivision, and so on)

In fact, the county layer we used earlier comes from the GADM dataset as well (only that FIPS codes have been added to its attribute table). The reason GADM was not used in the London example is that it is less accurate than the Office of National Statistics layer.

We will not need the whole israel_adm layer, but only a subset consisting of the Haifa administrative area, which includes the city of Haifa:

> haifa_adm = israel_adm[israel_adm$NAME_1 == "Haifa", ]

Before proceeding with geometrical calculations, as usual, all three layers (haifa_adm, haifa_buildings, and haifa_natural) will be reprojected to the same CRS. In this case, we are going to use the UTM Zone 36N CRS. We can obtain its parameters from the Landsat image object l_03 we read into R in one of the previous examples:

> haifa_adm = 
+spTransform(haifa_adm, CRS(proj4string(l_03)))
> haifa_buildings =
+ spTransform(haifa_buildings, CRS(proj4string(l_03)))
> haifa_natural =
+ spTransform(haifa_natural, CRS(proj4string(l_03)))

Having completed the first step, we will take a moment to plot the three layers and see what they look like:

> plot(haifa_natural, col = "lightgreen")
> plot(haifa_buildings, add = TRUE)
> plot(haifa_adm, add = TRUE)

The resulting graphical output (in the following screenshot) shows the Haifa administrative area border (haifa_adm, in this case marking the Mediterranean sea coastline), the Haifa buildings (haifa_buildings), and the natural areas (haifa_natural, shown in green):

Creating new geometries

Proceeding with the second step, we will create a convex hull polygon, assigned to buildings_ch, in order to define our area of interest surrounding the buildings. A convex hull is the smallest convex polygon encompassing a certain set of features (see the gray polygon in the next screenshot). A convex hull can be created using the gConvexHull function:

> buildings_ch = gConvexHull(haifa_buildings)

The convex hull crosses the Mediterranean sea. We would like to, however, retain only those areas of buildings_ch that are within haifa_adm (in other words, on land). This can be achieved by using the gIntersection function on buildings_ch and haifa_adm:

> buildings_ch = gIntersection(buildings_ch, haifa_adm)

Now that the bounding polygon buildings_ch is set, we proceed to our third step. Turning to the haifa_natural layer, we will merge all of its polygons into one polygon (since we are not interested in discerning different types of natural areas) using gUnaryUnion, similarly to what we did in the London example:

> haifa_natural = gUnaryUnion(haifa_natural)

Then, we will use buildings_ch to retain only those natural areas that are within our area of interest, using another gIntersection function call:

> haifa_natural = gIntersection(haifa_natural, buildings_ch)

What remains to be done is our fourth step, which is removing the areas in haifa_natural that are within 50 meters of the nearest building. To do this, we will first create a 50 meter buffer polygon surrounding haifa_buildings, using the gBuffer function (specifying the buffer size with width):

> buildings_50m = gBuffer(haifa_buildings, width = 50)

Then, using the gDifference function, we will calculate the area in haifa_natural that is not within the 50 meters buffer of haifa_buildings:

> haifa_natural = gDifference(haifa_natural, buildings_50m)

We are done. To see the resulting layers, we will plot all four of them (buildings_ch, haifa_adm, haifa_natural, and haifa_buildings) together, with the following series of plot function calls:

> plot(buildings_ch, col = "lightgrey", border = "lightgrey")
> plot(haifa_adm, add = TRUE)
> plot(haifa_natural, col = "lightgreen", add = TRUE)
> plot(haifa_buildings, add = TRUE)

The resulting graphical output is shown in the following screenshot:

Creating new geometries

In the preceding screenshot, we can see the bounding polygon buildings_ch in gray, the administrative borders, as well as the buildings, in black, and natural areas (those within the bounding polygon and excluding areas within 50 meters of buildings) in green. We will continue working with the Haifa layers we have hereby created in several additional examples in subsequent chapters.

Calculating distances between geometries

Let's now return to the London example, to complete its fourth step (which is distance calculation). Distances between each feature from one layer to each feature in a second layer can be calculated with the gDistance function, setting byid to TRUE. The following expression calculates the distance from each feature in buildings to each feature in river:

> dist = gDistance(buildings, river, byid = TRUE)

As seen in the gContains example, the result is a matrix:

> class(dist)
[1] "matrix"

The matrix rows correspond to river features, whereas its columns correspond to buildings features. As the following output of dim demonstrates, we have 1,583 buildings features in the City of London and one river feature (remember that we dissolved the separate riverbank parts into one):

> dim(dist)
[1]    1 1583

Selecting the first (and only) row of dist will yield a numeric vector with the distances of each building to the nearest riverbank. With the following expression, we can assign the distances to a new column, named dist_river, in the attribute table of buildings:

> buildings$dist_river = dist[1, ]

Examining the attribute table will demonstrate that, indeed, we now have a distance-to-river entry for each of the buildings in the City of London:

> head(buildings@data)
     osm_id                    name             type dist_river
16  4076420               St Brides place_of_worship   313.1239
56  4364085 Sainsbury's Head Office            block   683.5640
137 4959489          30 St Mary Axe       attraction   653.4159
138 4959510         Bank of England           office   503.7244
139 4959544     St Paul's Cathedral        cathedral   287.7122
140 4959629        Liverpool Street    train_station  1009.8070

We are going to wait until Chapter 9, Advanced Visualization of Spatial Data, to graphically display this result while learning some additional visualization methods.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.226.240