Exploring vector layer properties and subsetting

This section examines the properties of spatial vector layers and shows how to subset layers based on their attribute tables. Some of the procedures are analogous to those presented for rasters in the previous chapter (for example, plotting and querying CRS information), while others are relevant mainly to vector layers (for example, calculating areas and creating subsets according to the attribute table). As will quickly become apparent, many operations on the attribute tables of vector layers closely mirror operations on data.frame objects.

Examining vector layer properties

The summary function produces a useful textual summary of the properties of a vector layer, including its class, bounding box coordinates, CRS, and attribute table column types. For example, using summary on airports produces the following textual output:

> summary(airports)
Object of class SpatialPointsDataFrame
Coordinates:
                 min        max
lon -106.79467 -106.07308
lat   35.04918   35.62866
Is projected: FALSE 
proj4string : [+proj=longlat +datum=WGS84]
Number of points: 3
Data attributes:
   Length     Class      Mode
        3 character character

All of the properties listed in this output can also be accessed, and in some cases modified, using functions. For example, similar to what we already saw for rasters in the previous chapter, the proj4string function returns the CRS definition of a vector layer in the PROJ.4 format. Using proj4string on airports returns the definition of the WGS84 CRS:

> proj4string(airports)
[1] "+proj=longlat +datum=WGS84"

As for the geometry, the length function returns the number of features in the layer. For example, airports contains three points (the three airports), as the following output shows:

> length(airports)
[1] 3

A spatial layer also always has row names that internally serve as ID variables to match the geometries with attribute table entries. The number of row names is thus equal to the number of features:

> row.names(airports)
[1] "1" "2" "3"

The dimensions function returns the number of spatial dimensions:

> dimensions(airports)
[1] 2
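
The accessor functions above can be tried without the airports layer. The following is a minimal sketch that assumes the sp package is installed and builds a small stand-in point layer; the coordinates, the names, and the object name pts are hypothetical and are not the book's data:

```r
library(sp)

# Hypothetical toy data: three points with made-up lon/lat values
coords = cbind(lon = c(-106.6, -106.8, -106.1),
               lat = c(35.0, 35.2, 35.6))
attrs = data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

# doCheckCRSArgs = FALSE stores the CRS string as given, without validation
pts = SpatialPointsDataFrame(
  coords, attrs,
  proj4string = CRS("+proj=longlat +datum=WGS84", doCheckCRSArgs = FALSE))

proj4string(pts)   # CRS definition string
length(pts)        # number of features
row.names(pts)     # feature IDs, taken from the attribute table row names
dimensions(pts)    # number of spatial dimensions
bbox(pts)          # bounding box as a 2 x 2 matrix
```

The exact string returned by proj4string may differ slightly between sp versions, so it is best treated as informational rather than compared literally.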

Note

In this book, we only deal with two-dimensional vector layers (geometries on a plane). Three-dimensional layers can also be useful to represent certain types of data, such as points with (x,y) coordinates and (z) elevation.

Accessing the attribute table of vector layers

The attribute table of a vector layer is, in fact, a data.frame object and some of the functions that work with data.frame objects have been defined to consistently work directly on vector layers as well. For example, the nrow, ncol, and dim functions applied to a vector layer refer to its attribute table to return its dimensions:

> nrow(county)
[1] 3145
> ncol(county)
[1] 4
> dim(county)
[1] 3145    4

We see that the attribute table of county has 3,145 rows (thus, the layer has 3,145 features) and four columns. The columns contain the following information:

  • NAME_1: The first-level name (for example, the state name)
  • NAME_2: The second-level name (for example, the county name)
  • TYPE_2: The feature type (for example, "County" or "Water body")
  • FIPS: The FIPS code

Individual columns of an attribute table, or subsets of these, can be accessed with the $ and [ operators. For example, the second-level names (held in the NAME_2 column) of the first 10 features in county can be obtained as follows:

> county$NAME_2[1:10]
 [1] "Litchfield" "Hartford"   "Tolland"    "Windham"   
 [5] "Siskiyou"   "Del Norte"  "Modoc"      "New London"
 [9] "Fairfield"  "Middlesex"  

As another example, we can check the types of features the county layer contains by listing the unique values in the TYPE_2 column:

> unique(county$TYPE_2)
 [1] "County"           "District"         "Borough"         
 [4] "Census Area"      "Municipality"     "City And Borough"
 [7] "City And County"  "Water body"       "Parish"          
[10] "Independent City"
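
To experiment with these accessors without the county layer, a small stand-in can be built by hand. This is only a sketch: the layer uses points rather than polygons for brevity, and the column names region and type, as well as all values, are hypothetical:

```r
library(sp)

# Hypothetical toy layer: four points with two attribute columns
coords = cbind(x = c(0, 1, 2, 3), y = c(0, 1, 0, 1))
attrs = data.frame(
  region = c("North", "North", "South", "South"),
  type   = c("County", "Water body", "County", "County"),
  stringsAsFactors = FALSE)
pts = SpatialPointsDataFrame(coords, attrs)

dim(pts)            # rows and columns of the attribute table
pts$region[1:2]     # first two values of one column
unique(pts$type)    # distinct values in a column
```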

The whole attribute table of a spatial vector layer can be accessed directly using the @ operator. The @ operator is used to extract a slot, by its name, from an object, using the notation object_name@slot_name.

Note

More specifically, the @ operator applies to objects of the so-called S4 classes, to which all the raster and vector layers we deal with belong, as opposed to S3 classes, whose components are accessed differently (using the $ operator). The distinction between S3 and S4 concerns the internal class structure and is beyond the scope of this book. For more information, refer to Advanced R, Wickham, H., CRC Press, 2014 (http://adv-r.had.co.nz/OO-essentials.html).

The attribute table slot of spatial vector classes defined in the sp package is called data. Therefore, adding @data after a vector layer name will yield its attribute table (if it has one).

For example, the following expression returns the attribute table of airports:

> airports@data
                       name
1 Albuquerque International
2           Double Eagle II
3        Santa Fe Municipal

As another example, we can print the first few rows in the attribute table of county using the head function applied to county@data:

> head(county@data)
       NAME_1     NAME_2 TYPE_2  FIPS
0 Connecticut Litchfield County 09005
1 Connecticut   Hartford County 09003
2 Connecticut    Tolland County 09013
3 Connecticut    Windham County 09015
4  California   Siskiyou County 06093
5  California  Del Norte County 06015

As we shall see later in this chapter, the attribute table of a vector layer can also be modified using assignment, similar to a separate data.frame object. New attribute table columns can be created and populated using the $ operator, or the whole attribute table can be modified (for example, certain columns can be deleted or joined) and reassigned to the data slot.
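
As a brief preview of both kinds of modification, here is a minimal sketch on a hypothetical three-point layer (the object pts and the columns name, code, and label are made up for illustration and are not part of the book's data):

```r
library(sp)

# Hypothetical toy layer with two attribute columns
coords = cbind(x = c(0, 1, 2), y = c(0, 1, 0))
attrs = data.frame(name = c("A", "B", "C"),
                   code = c("01", "02", "03"),
                   stringsAsFactors = FALSE)
pts = SpatialPointsDataFrame(coords, attrs)

# Create and populate a new column with the $ operator
pts$label = paste(pts$name, pts$code, sep = "-")

# Modify the whole table (here, dropping the code column) and reassign it
pts@data = pts@data[, c("name", "label")]

names(pts@data)
pts$label
```

Note that the reassigned table keeps one row per feature, in the original order, so the geometries and attributes stay matched.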

All other components of spatial vector (and raster, for that matter) objects are also contained in slots and thus, are accessible with the @ operator. Using the str function, we can obtain a tree describing the object's structure. Let's take a look at the following example:

> str(airports)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 3 obs. of  1 variable:
  .. ..$ name: chr [1:3] "Albuquerque International" "Double Eagl$
  ..@ coords.nrs : int [1:2] 1 2
  ..@ coords     : num [1:3, 1:2] -106.6 -106.8 -106.1 35 35.2 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "lon" "lat"
  ..@ bbox       : num [1:2, 1:2] -106.8 35 -106.1 35.6
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "lon" "lat"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84"

Note

Using such a tree, we can find our way to all the data components of an object. Why, then, is the use of specialized functions (such as proj4string), rather than direct access to the relevant slot (such as airports@proj4string@projargs), usually advocated in R? One reason is that working through functions makes our code more robust to changes in class definitions. If the internal architecture of a class changes in a future version of a package (so that, for instance, the slot x is now named y), the functions operating on that class will be updated accordingly and the user may not even notice, whereas code that accesses @x directly will stop working. Accessing the attribute table of a vector layer (with @data) is the only direct slot access used in this book. The exception is necessary since certain operations on an attribute table are not feasible otherwise.
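
The two access routes can be compared on a small example. The sketch below uses a hypothetical three-point layer (pts and its id column are made up) and checks that the bbox function and the bbox slot currently hold the same matrix; only the function call would be expected to keep working if the slot were ever renamed:

```r
library(sp)

# Hypothetical toy layer
coords = cbind(x = c(0, 1, 2), y = c(0, 1, 0))
pts = SpatialPointsDataFrame(coords, data.frame(id = 1:3))

slotNames(pts)                    # names of the slots shown by str

same = identical(bbox(pts),       # access through the accessor function
                 pts@bbox)        # direct access to the slot
same
```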

The attribute table of a vector layer can also be removed altogether, by converting a Spatial*DataFrame object into a Spatial* object. Such a conversion can be done with the as function, specifying the object name and the class we want to convert it to. For example, we can convert airports, a SpatialPointsDataFrame object, to a SpatialPoints object as follows:

> airports_sp = as(airports, "SpatialPoints")

Since a SpatialPoints object does not have a data slot, an error occurs when trying to access it:

> airports_sp@data
Error: no slot of name "data" for this object of class "SpatialPo$

We can also use the as function to perform the reverse conversion from a SpatialPoints object to a SpatialPointsDataFrame object. Naturally, the attribute table of the resulting object is going to be empty (since SpatialPoints objects do not have one):

> as(airports_sp, "SpatialPointsDataFrame")@data
data frame with 0 columns and 0 rows
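
If the goal is to attach an actual attribute table to a geometry-only object, one option is to call the SpatialPointsDataFrame constructor with the SpatialPoints object and a data.frame holding one row per point. The following is a sketch on hypothetical data (pts, new_attrs, and the id column are made up); match.ID = FALSE is passed so that rows are paired with points by position:

```r
library(sp)

# Hypothetical toy layer
coords = cbind(x = c(0, 1, 2), y = c(0, 1, 0))
pts = SpatialPointsDataFrame(
  coords, data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE))

# Drop the attribute table
pts_sp = as(pts, "SpatialPoints")
"data" %in% slotNames(pts_sp)     # is there a data slot?

# Attach a new table, pairing rows with points by position
new_attrs = data.frame(id = 1:3)
pts_df = SpatialPointsDataFrame(pts_sp, new_attrs, match.ID = FALSE)
names(pts_df@data)
```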

Subsetting vector layers

We can subset a vector layer according to its attribute table using the same notation as in subsetting data.frame objects. Selecting which features to retain can be done by supplying a numeric or logical vector within the [ operator.

For example, to get a subset of only those county features that belong to the contiguous U.S., we need to exclude those features corresponding to the states of Alaska and Hawaii. This can be done by creating a logical vector (applying a condition to the county$NAME_1 column holding state names) and supplying that vector as the rows index of county with the [ operator, as follows:

> county = county[
+ county$NAME_1 != "Alaska" &
+ county$NAME_1 != "Hawaii", ]

Tip

Keep in mind the following alternative that utilizes the %in% operator:

> county = county[
+ !(county$NAME_1 %in% 
+ c("Alaska", "Hawaii")), ]

Similarly, we can retain only the land area by excluding water body polygons:

> county = county[county$TYPE_2 != "Water body", ]
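
The same subsetting pattern can be tested on a small stand-in layer. In this sketch the layer consists of points rather than polygons, and the column names state and type, along with all values, are hypothetical; both conditions are combined into a single logical vector:

```r
library(sp)

# Hypothetical toy layer: four features with state and type columns
coords = cbind(x = c(0, 1, 2, 3), y = c(0, 1, 0, 1))
attrs = data.frame(
  state = c("Alaska", "Texas", "Hawaii", "Ohio"),
  type  = c("Borough", "County", "County", "Water body"),
  stringsAsFactors = FALSE)
pts = SpatialPointsDataFrame(coords, attrs)

# Logical vector: TRUE for features to retain
keep = !(pts$state %in% c("Alaska", "Hawaii")) & pts$type != "Water body"
pts_sub = pts[keep, ]
pts_sub$state

# A numeric vector of row positions works as well
pts_num = pts[c(2, 4), ]
pts_num$state
```

The result of each subset is again a vector layer, so geometries and attribute rows are filtered together.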

Let's examine the resulting layer using the plot function. The expression plot(county) produces the graphical output shown in the following screenshot:

[Screenshot: output of plot(county)]

As we can see, the plot function, by default, draws polygon borders using black lines. In subsequent examples, we will experiment a little bit with several parameters of this function to modify the appearance of the plot.
