Chapter 5. Working with Points, Lines, and Polygons

In this chapter, we will cover the basic usage of the second major type of spatial data—vector layers (points, lines, and polygons). In GIS terminology, these data are sometimes referred to as vectors, but we will use the term vector layers to distinguish them from vectors in R (see Chapter 2, Working with Vectors and Time Series). We will review the architecture of the vector layer classes defined in the sp package. Examples of the most common operations involving vector layers will then be presented using the sp, rgdal, and rgeos packages.

In this chapter, we'll cover the following topics:

  • Classes for spatial vector layers (points, lines, and polygons) in the sp package
  • Creating point layers by geocoding
  • Reading and writing vector layer files
  • Exploring the properties of vector layers
  • Accessing and modifying attribute tables
  • Reprojecting vector layers
  • Calculating derived geometrical properties (for example, polygon area)
  • Querying relations (for example, intersection) between a pair of layers
  • Creating new layers based on a single layer or a pair of layers
  • Joining data to an attribute table based on the location or common columns

Data structures for vector layers in R

Spatial vector layers have two components: the geometry and the attribute table. The geometry component holds the spatial coordinates and information regarding their arrangement in separate features, while the attribute table holds additional information regarding each feature. For example, in a point layer of capital cities, the record for London may be composed of a geometric component (a point coordinate, such as 51.5072°N, 0.1275°W) and a row in an attribute table holding additional data regarding each city (for example, population size, built area, and so on).

The geometry part in a vector layer is obligatory and there are three types of geometries: points, lines, and polygons. The attribute table is optional. Classes for the six spatial vector layers types, constituting all possible combinations of these two properties, have been defined in the sp package. They are summarized in the following table:

Geometry type

Attribute table

Class

Points

No

SpatialPoints

Yes

SpatialPointsDataFrame

Lines

No

SpatialLines

Yes

SpatialLinesDataFrame

Polygons

No

SpatialPolygons

Yes

SpatialPolygonsDataFrame

Together with the three raster classes from the raster package (presented in the previous chapter), these constitute a commonly used set of classes suitable for a wide range of spatial analysis applications in R.

Note

Additional classes for spatial vector layers (and rasters) do exist in R. They are often associated with packages intended for specialized data analysis tasks. Conveniently, however, there are usually established methods of converting objects to and from such classes, with respect to the highly popular classes in the sp and raster packages (which we've used throughout this book). For example, the ppp class (the spatstat package) is used to represent a point pattern (event locations and the window where the pattern has been observed) to employ statistical analysis tools for such patterns. A SpatialPoints object can be converted to a ppp object and vice versa using the maptools package.

In this section, we will see an example of how to create a vector layer for each of the three types of geometries—points, lines, and polygons. In the next section, methods to examine spatial vector layers will be presented, while in the last three sections of this chapter, we will see how some more advanced operations on vector layers are performed. Additional information regarding the sp package and its spatial vector classes can be found in the book Applied Spatial Data Analysis with R, Springer, whose second edition was published in 2013, by Roger Bivand, Edzer Pebesma, and Virgilio Gómez-Rubio.

Points

Points are the simplest type of spatial objects since the geometrical component of a point is just a single (x,y) coordinate. A set of (x,y) coordinates, along with CRS information, constitutes a spatial point layer, which can be represented in R with a SpatialPoints object. If we also have an attribute table where each row corresponds to a single point, the layer can be represented with a SpatialPointsDataFrame object.

In our first example, we are going to create a SpatialPointsDataFrame object by geocoding, the procedure to convert addresses to geographic coordinates. The intermediate steps we will go through are as follows:

  1. Creating a vector of addresses we want to geocode.
  2. Geocoding the addresses to get a data.frame object of geographic coordinates for each address.
  3. Adding location names as an additional column.
  4. Converting the data.frame object to a SpatialPointsDataFrame object.
  5. Adding CRS information.

In the example, we will geocode the addresses of three airports in New Mexico:

  • Albuquerque International Airport
  • Double Eagle II Airport
  • Santa Fe Municipal Airport

For the geocoding step (step 2), we are going to use the Google Maps API, accessible from R through the ggmap package. Additional functionality of the ggmap package for visualization of spatial data will be presented in Chapter 9, Advanced Visualization of Spatial Data. For an overview of package capabilities, refer to the introductory paper on ggmap (ggmap: Spatial Visualization with ggplot2) by Kahle, D. and Wickham, H. 2013.

Our first step is to create a character vector with the airports addresses. The addresses are ordered in accordance with the airport names listed earlier:

> addresses = c(
+ "2200 Sunport Blvd, Albuquerque, NM 87106, USA",
+ "7401 Paseo Del Volcan Northwest Albuquerque, NM 87121, USA",
+ "121 Aviation Dr, Santa Fe, NM 87507, USA")

Note

Note that the Google Maps API (which we will use for geocoding) can also search for location names (such as the White House) or partial addresses, but the result will be less determinate (two different places can be go by the same name, but not the same address). The exact addresses are used here to ensure that the reader will obtain the same results as in the book's text.

We are ready for the second step and will now load the ggmap package and geocode the addresses using the geocode function. This function accepts a vector of addresses (such as addresses that we just created) and returns a data.frame object of the matched geographic coordinates:

> library(ggmap)
> airports = geocode(addresses)
> airports
        lon      lat
1 -106.6168 35.04918
2 -106.7947 35.15559
3 -106.0731 35.62866

Note

The geocode function accesses the Google Maps API. Therefore, an Internet connection is required. Using the Google Maps API also implies agreement with the Google Maps API Terms of Service (https://developers.google.com/maps/terms) and is limited to 2,500 queries in a 24-hour time period (using the free service option). A promising alternative can be found in the geocodeHERE package providing access to the HERE Geocoding API (https://developer.here.com/geocoder) that has a higher threshold of 10,000 queries per day.

We now have a data.frame object named airports that holds the longitudes (lon) and latitudes (lat) of the three airports.

In our third step, we will add the airport names in an additional column called name in airports as follows:

> airports$name = c("Albuquerque International",
+ "Double Eagle II",
+ "Santa Fe Municipal")
> airports
        lon      lat                      name
1 -106.6168 35.04918 Albuquerque International
2 -106.7947 35.15559           Double Eagle II
3 -106.0731 35.62866        Santa Fe Municipal

The resulting data.frame object has everything we need to create a SpatialPointsDataFrame object, where the geometry components will be the airports' coordinates, and the attribute table will have a single column that holds the airport names. This is our fourth step. It is performed using the coordinates function of the sp package by specifying the columns that hold the coordinates with a formula object of the structure ~x_coord+y_coord. In our case, the x coordinate is the longitude (lon) and the y coordinate is the latitude (lat), therefore the formula takes the following form:

> library(sp)
> coordinates(airports) = ~ lon + lat

The last expression converts airports from a data.frame object to a SpatialPointsDataFrame object. We can confirm the conversion took place using the class function as follows:

> class(airports)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"

Using the print method on the object shows that it holds exactly the same information as before; just that the lon and lat columns are now identified as spatial coordinates, as shown in the following output:

> airports
            coordinates                      name
1 (-106.6168, 35.04918) Albuquerque International
2 (-106.7947, 35.15559)           Double Eagle II
3 (-106.0731, 35.62866)        Santa Fe Municipal

Our last step is to specify the CRS of airports, which is a geographic CRS:

> proj4string(airports) = CRS("+proj=longlat +datum=WGS84")

The airports vector layer is now complete.

Writing of Spatial*DataFrame objects (that is, the SpatialPointsDataFrame, SpatialLinesDataFrame, and SpatialPolygonsDataFrame objects) can be done with the writeOGR function from the rgdal package. In the following example, as well as in most other examples in this book, we will use the popular ESRI Shapefile format for the input and output of vector layers.

We can export our SpatialPointsDataFrame object airports to a Shapefile on the disk as follows:

> library(rgdal)
> writeOGR(airports, "C:\Data", "airports", "ESRI Shapefile")

The four arguments that were supplied to the first four parameters of the writeOGR function are as follows:

  • obj: The name of the object to be written (for example, airports)
  • dsn: The path of the directory where the file(s) will be written (for example, "C:\Data")
  • layer: The layer name (for example, "airports")
  • driver: The driver name (for example, "ESRI Shapefile")

Note

Since Shapefile datasets are composed of several separate files (at least three: *.shp, *.shx, and *.dbf), when reading and writing with the ESRI Shapefile driver, we specify the directory (dsn) and filename without the extension (layer). In the previous example, with dsn="C:\Data" and layer="airports", four files were written to the C:Data directory: airports.dbf, airports.prj, airports.shp, and airports.shx. The meaning of the dsn and layer parameters can be different with other drivers (see the next section and ?writeOGR).

To read the file we just wrote back into a SpatialPointsDataFrame object in R, we can use the readOGR function (also available in the rgdal package). When reading vector layers, we need to provide only the dsn and layer arguments (the path and layer name, respectively). In addition, we can specify that we do not want character values in the attribute table to be converted to factors by specifying stringsAsFactors=FALSE (see Chapter 3, Working with Tables):

> airports = readOGR("C:\Data",
+ "airports",
+ stringsAsFactors = FALSE)

Obviously, all the airports.* files that constitute the Shapefile need to be found in the same directory (in this case, in the C:Data directory) for the layer to be properly read.

Lines

Line and polygon layers are more complex than point layers in two principal aspects. First, individual lines or polygons are defined with a set of points, rather than a single one. Specifically, a line is a set of points connected to each other from the first point to the last one. A polygon is defined similarly; just that the last point is equal to the first. The number of points in each line or polygon is not fixed, but dependable on the shape complexity (a more complex shape will be defined with more points). Secondly, a single line or polygon feature (corresponding to a single entry in an attribute table) can be composed of more than one individual line or more than one individual polygon. For example, in a layer of countries' boundaries, the USA feature will be composed of more than one polygon since the contiguous United States, Alaska, and each of the Hawaii Islands are separate from one another.

Note

An additional complication, specific to polygonal layers, is the existence of holes. A single feature can be composed of an external boundary polygon and hole polygons. For example, in a layer of North America's land area, hole polygons may represent the Great Lakes. Hole polygons can once again have internal polygons (such as islands inside the Great Lakes) and so on.

Due to their complexity, it is much less common to manually define line and polygon layers from raw coordinates. Instead, they are usually either imported from an external source (for example, reading a Shapefile) or created from another object (for example, contour lines created based on a DEM raster).

As an example for SpatialLinesDataFrame, we will read a GPS Exchange Format (GPX) file with a track record from a GPS device. With GPX files, unlike with Shapefiles, the dsn argument is the file itself, while the layer argument points to the data component we would like to read (such as "tracks" or "track_points"; see ?readOGR). The particular file GPS_log.gpx contains the GPS log, from driving around the Lahav and Dvira forests with the GPS device in recording mode. Let's read the file as follows:

> track = readOGR("C:\Data\GPS_log.gpx","tracks")

Using the class function, we can demonstrate that track is a SpatialLinesDataFrame object:

> class(track)
[1] "SpatialLinesDataFrame"
attr(,"package")
[1] "sp"

Polygons

As an example of polygonal data, we are going to read another external file; this time it's a Shapefile named USA_2_GADM_fips. The file is composed of polygons corresponding to the second level administrative division (counties) of the USA, with several attributes such as county names and Federal Information Processing Standards (FIPS) codes. We will assign the resulting SpatialPolygonsDataFrame to an object named county to use in the subsequent examples in this chapter:

> county = readOGR("C:\Data", "USA_2_GADM_fips",
+ stringsAsFactors = FALSE)

To summarize, so far we have witnessed two ways to create spatial vector layers in R: from a set of (x,y) coordinates (relevant to points) and from a file (relevant to points, lines, and polygons). A third way—deriving vector layers from raster data—will be presented in Chapter 7, Combining Vector and Raster Datasets.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.171.212