Subsetting rasters

In many situations, we would like to access the values of a given raster either to perform calculations involving these values (for example, to calculate a frequency table) or to make an assignment (for example, to change a certain value in the raster; see the previous code section in Chapter 1, The R Environment). In this section, we are going to cover the different ways to do this.

As an example, we are going to use another multiband raster, modis.tif. First, we will assign it to a RasterBrick object named r and print its properties, as follows:

> r = brick("C:\\Data\\modis.tif")
> r
class       : RasterBrick
dimensions  : 100, 100, 10000, 280  (nrow, ncol, ncell, nlayers)
resolution  : 500, 500  (x, y)
extent      : 660000, 710000, 3445000, 3495000  (xmin, xmax, ymin$
coord. ref. : +proj=utm +zone=36 +datum=WGS84 +units=m +no_defs +$
data source : C:\Data\modis.tif
names       : modis.1, modis.2, modis.3, modis.4, modis.5, modis.$

The modis.tif file contains Normalized Difference Vegetation Index (NDVI) values from the MOD13A1 product of the Terra-MODIS satellite. As with Landsat, the original MOD13A1 data is available for free at http://earthexplorer.usgs.gov/. The modis.tif image covers an area of 2,500 km² (100 × 100 cells) at 500 meters spatial resolution. Unlike with Landsat, the bands do not refer to different wavelengths of the satellite sensor, but rather to different dates of image acquisition. In other words, we have a time series of NDVI images. There are 280 bands, corresponding to the period between February 18, 2000 and April 6, 2012 (23 images per year, each corresponding to an approximately 16-day time interval). Pixels (raster cells) with unreliable data (due to clouds, for example) were assigned NA as part of the preprocessing.
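As a quick consistency check of the band count, the following sketch rebuilds the acquisition calendar in base R. It assumes that MOD13A1 16-day composites start on day-of-year 1, 17, 33, ..., 353 (23 per year) and restart every January 1; the variable name modis_dates is hypothetical and is chosen so as not to overwrite the dates table used later in this chapter.

```r
# Assumption: composite start days are day-of-year 1, 17, ..., 353 in every year
years = 2000:2012
doy = seq(1, 353, by = 16)   # 23 composite start days per year
modis_dates = as.Date(paste0(rep(years, each = length(doy)), "-01-01")) +
  rep(doy - 1, times = length(years))
# Keep only the dates within the period covered by modis.tif
modis_dates = modis_dates[modis_dates >= as.Date("2000-02-18") &
                          modis_dates <= as.Date("2012-04-06")]
length(modis_dates)   # number of bands implied by this calendar
range(modis_dates)    # first and last acquisition dates
```

Under this assumption, the first year contributes 20 images, 2001-2011 contribute 23 each, and 2012 contributes 7, which adds up to the 280 bands of r.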

The NDVI is a commonly used remote sensing index, quantifying the abundance of green vegetation (it has a range of -1 to 1, with values closer to 1 corresponding to more abundant vegetation). The NDVI is calculated based on reflectance in the red and NIR bands. We are going to see exactly how it is done, using the Landsat image as an example, later on in this chapter.
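The standard NDVI formula is (NIR - Red) / (NIR + Red). A minimal sketch of it as an R function follows; the function name ndvi and the reflectance values are hypothetical, chosen only to show how the index behaves.

```r
# NDVI = (NIR - Red) / (NIR + Red); the inputs below are hypothetical reflectances
ndvi = function(red, nir) {
  (nir - red) / (nir + red)
}
red = c(0.05, 0.10, 0.30)   # hypothetical red reflectance of three pixels
nir = c(0.50, 0.30, 0.35)   # hypothetical near-infrared reflectance
ndvi(red, nir)              # strong NIR reflectance (pixel 1) gives the highest NDVI
```

Because the numerator can never exceed the denominator in absolute value for non-negative reflectances, the result always stays within the -1 to 1 range.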

To examine the geographical location of the modis.tif raster, we can use the plotKML package, which provides a suite of functions for exporting spatial data from R to the KML or KMZ formats and automatically displaying them in Google Earth. The simplest possible example, using nothing but defaults, is to call the plotKML function on one of the bands of raster r (for example, band 1) in order to open Google Earth and display it there. The expression plotKML(r[[1]]) will thus automatically open Google Earth, zoom in on the location of raster r[[1]], and display its values using a color scale (assuming the R packages plotKML and animation are loaded and the Google Earth software is installed). The following screenshot shows the result:

[Figure: band 1 of r, r[[1]], displayed in Google Earth]

Note

To display the result in Google Earth, the plotKML function has, in fact, written a KML file to the current working directory. The working directory is the default path that R uses to import and export files. It can be queried or modified using the getwd and setwd functions, respectively. For example:

> getwd()
[1] "C:/Users/Michael Dorman/Documents"
> setwd("C:\\Data")
> getwd()
[1] "C:/Data"

Relying on a working directory saves typing absolute file paths, but it also makes code less explicit, since the file a relative path points to depends on where the working directory happens to be set. We will therefore not use it in this book.
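The following base R sketch illustrates how a relative file name is resolved against the working directory. It uses a temporary folder (its name is hypothetical) so that no existing files are touched, and it restores the original working directory at the end. Note also that, on Windows, a backslash inside an R string must be escaped, so "C:\\Data" and "C:/Data" refer to the same folder.

```r
old_wd = getwd()                # remember the current working directory
tmp = tempfile("wd_demo_")      # path of a new, hypothetical scratch folder
dir.create(tmp)
setwd(tmp)
writeLines("hello", "example.txt")           # relative name: written inside tmp
file.exists(file.path(tmp, "example.txt"))   # the same file via an absolute path
setwd(old_wd)                   # restore the original working directory
```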

Interactive visualization over informative reference layers (for example, in Google Earth) is very helpful for the initial examination of the spatial data at hand. For example, we can now clearly see that the NDVI gradient within the raster, from relatively high values in the north-west to relatively low values in the south-east, is due to its position in the transition zone between the relatively humid Mediterranean climatic region (where vegetation is more abundant) and the arid Negev desert (where vegetation is scarce).

The interested reader can refer to the paper in the Journal of Statistical Software by Hengl, Roudier, Beaudette, and Pebesma (2014), or to the online tutorial (http://gsif.isric.org/doku.php?id=wiki:tutorial_plotkml) for further details and inspirational examples on the wide range of methods the plotKML package offers.

Accessing raster values as a vector

Returning to the subject of raster value access, the simplest way to access values is with the [ operator, exactly as we would with a vector. When the values of a raster are accessed with [, they are ordered from the top-left corner rightwards along the first row, then along the second row, and so on, until the bottom-right corner is reached. For example, to find out the first five values of the first layer in the r raster, we use the following expression:

> r[[1]][1:5]
[1] 0.4242 0.3995 0.4190 0.4272 0.4285

Note that indices referring to bands with [[ come first, and indices referring to cells with [ come second. Either can be omitted, and then all elements from the respective dimension will be returned rather than a subset (for example, as we have seen previously, r[[1]] returns the whole first band).
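The row-by-row cell order can be checked on a small example. The following sketch assumes the raster package is installed and uses a synthetic 3 × 4 layer (not modis.tif) whose values equal their cell numbers, so the returned values reveal the order directly.

```r
library(raster)
# Synthetic 3 x 4 layer whose cell values equal their cell numbers
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
values(x) = 1:ncell(x)
x[1:5]                    # all of row 1 (cells 1-4), then the first cell of row 2
cellFromRowCol(x, 2, 1)   # cell number of row 2, column 1
```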

When the [ operator is used but the cell index is omitted (as in []), we get a vector containing all of the raster values. We can use this vector, for example, to calculate the mean NDVI on the first date of acquisition (February 18, 2000):

> mean(r[[1]][], na.rm = TRUE)
[1] 0.2302056

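The result is about 0.23. The na.rm = TRUE argument tells mean to ignore the NA cells (the unreliable pixels masked during preprocessing); without it, the result would be NA.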
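The effect of na.rm can be seen on a tiny example. This sketch assumes the raster package is installed and uses a synthetic 2 × 2 layer with hypothetical NDVI values, one of which is masked as NA.

```r
library(raster)
# Synthetic 2 x 2 layer with one masked (NA) cell; the values are hypothetical
x = raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(x) = c(0.2, 0.4, NA, 0.6)
v = x[]                   # all cell values as a vector, like getValues(x)
mean(v)                   # NA, because one cell is missing
mean(v, na.rm = TRUE)     # mean of the three valid cells, about 0.4
```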

Accessing raster values with the matrix notation

Since a raster band is a two-dimensional object, it is frequently more useful to access its values using a two-dimensional notation. As with matrix objects, the first element of the two-dimensional index refers to rows and the second element refers to columns. For example, values 1-5 (in vector terms) of raster r[[1]] occupy row 1, columns 1-5. We can refer to these same values using a two-dimensional notation as follows:

> r[[1]][1, 1:5]
[1] 0.4242 0.3995 0.4190 0.4272 0.4285

Note that, even though we are using a two-dimensional notation to subset the raster, the values are still returned in the form of a one-dimensional numeric vector.
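To see that the two notations address the same cells, the following sketch (assuming the raster package is installed, with a synthetic layer whose values equal their cell numbers) compares a row/column subset with a subset by cell numbers obtained from cellFromRowCol.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
values(x) = 1:ncell(x)            # cell values equal cell numbers
x[2, 2:3]                         # row 2, columns 2-3, returned as a plain vector
x[cellFromRowCol(x, 2, 2:3)]      # the same two cells addressed by cell number
```

Both expressions return the values of cells 6 and 7, since row 2 starts at cell 5 in a four-column raster.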

Subsets involving more than one layer

In the last two sections, we accessed a subset of raster values confined to a single layer. What happens when we subset both the row/column and layer dimensions of a raster? In that case, we get a matrix object, rather than a vector, with columns referring to layers and rows referring to raster cells. For example, using the following expression, we are referring to the values occupying row 1, columns 1-5, and layers 1-3:

> r[[1:3]][1, 1:5]
     modis.1 modis.2 modis.3
[1,]  0.4242  0.4518  0.4211
[2,]  0.3995  0.3334  0.4123
[3,]  0.4190  0.3430  0.4314
[4,]  0.4272  0.3430  0.4761
[5,]  0.4285  0.5814  0.4761

As a result, we get a matrix with three columns (corresponding to layers 1-3) and five rows (corresponding to the requested five cells, ordered from the top-left corner rightwards). Indeed, the values in the first column of the matrix are identical to the values of the vector we got in the previous two examples.
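The same matrix structure can be reproduced on synthetic data. The following sketch assumes the raster package is installed; it builds a three-layer brick in which layer k holds (k - 1) × 100 plus the cell number, and the layer names t1-t3 are hypothetical stand-ins for acquisition dates.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
# Three synthetic layers: cell numbers, cell numbers + 100, cell numbers + 200
b = brick(stack(setValues(x, 1:12), setValues(x, 101:112), setValues(x, 201:212)))
names(b) = c("t1", "t2", "t3")    # hypothetical layer names
m = b[[1:2]][1, 1:3]              # row 1, columns 1-3, layers 1-2
m                                 # 3 x 2 matrix: rows = cells, columns = layers
```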

As another example, we can examine the course of NDVI over time at a single pixel—for example, in row 45, column 33—by omitting the band index this time (and thus referring to all bands at once):

> v = r[45, 33][1, ]

Note that with r[45,33], we get a matrix object with 280 columns (since we access all 280 bands of raster r) and a single row (since we access a single cell). Then, with the [1,] part, we select the first (and only) row in that matrix, containing the values of the (45,33) cell across all bands. As we witnessed earlier, a single matrix row is by default simplified to a vector. Finally, we assign the vector of NDVI values to v.

To plot the resulting NDVI time series, now held in v, we will use the date column in the dates table (see the previous chapter), which lists the dates of acquisition for each band in r. We will also specify the labels for the x and y axes, using parameters xlab and ylab of the plot function, respectively:

> plot(v ~ dates$date, type = "l", xlab = "Time", ylab = "NDVI")

The resulting graphical output is shown in the following screenshot:

[Figure: NDVI time series of the pixel at row 45, column 33 of r (x-axis: Time, y-axis: NDVI)]

We can clearly see the periodic behavior of NDVI at the annual scale: NDVI increases in winter (the wet season), when vegetation is more abundant, and declines in summer (the dry season), when vegetation desiccates. Lower-than-usual NDVI values were observed from 2009 to 2011, due to a drought the region experienced at the time.
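The two-step extraction of a single pixel's time series (a one-row matrix, then a vector) can be sketched on synthetic data. This assumes the raster package is installed; the three layers play the role of three acquisition dates, and px and px_series are hypothetical names chosen so as not to overwrite v.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
b = brick(stack(setValues(x, 1:12), setValues(x, 101:112), setValues(x, 201:212)))
px = b[2, 3]            # one cell (row 2, column 3) across all layers: a 1-row matrix
px_series = px[1, ]     # the single row, simplified to a vector
px_series               # values of cell 7 in layers 1-3
```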

In all the subset methods we have seen in the last three sections, the result was automatically simplified, either to a vector (when dealing with values from a single band) or to a matrix (when dealing with values from several bands). If we want to suppress this simplification, we can specify drop=FALSE, just as we did with subsets of a data.frame object (see the previous chapter) and a matrix object. In the following example, using drop=FALSE yields a RasterBrick object named u that contains the first two rows, the first two columns, and the first three layers of the original raster r:

> u = r[[1:3]][1:2, 1:2, drop = FALSE]

Plotting the object will demonstrate that u is indeed a 2 x 2 raster with three layers. The following expression plots the raster u:

> levelplot(u, layout = c(3,1), par.settings = RdBuTheme)

Using the parameter layout, we specified that the bands should be arranged in a single row and three columns within the plot area. The following screenshot shows what the plot will look like:

[Figure: levelplot of the three layers of u, arranged in a single row]
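The class and dimensions of a drop=FALSE subset can also be checked without plotting. The following sketch assumes the raster package is installed and uses a synthetic three-layer brick; u_demo is a hypothetical name chosen so as not to overwrite u.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
b = brick(stack(setValues(x, 1:12), setValues(x, 101:112), setValues(x, 201:212)))
u_demo = b[1:2, 1:2, drop = FALSE]   # still a Raster* object, cropped to 2 x 2 cells
class(u_demo)
dim(u_demo)                          # rows, columns, layers
u_demo[[1]][]                        # cells 1, 2, 5 and 6 of the original first layer
```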

Transforming a raster into a matrix or an array

At times, it can be useful to transform a raster into a simpler data structure, such as a matrix or an array. One reason to do so is to perform faster calculations (see Chapter 6, Modifying Rasters and Analyzing Raster Time Series). These transformations can be achieved using the as.matrix and as.array functions, respectively.

For example, a single layer of a raster can be transformed into a matrix as follows:

> as.matrix(u[[1]])
       [,1]   [,2]
[1,] 0.4242 0.3995
[2,] 0.4495 0.2925
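Since raster cells are numbered row by row while base R fills matrices column by column, rebuilding the same matrix from the cell vector requires byrow = TRUE. The following sketch shows this on a synthetic layer, assuming the raster package is installed.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
values(x) = 1:ncell(x)        # cell values equal cell numbers
m = as.matrix(x)              # 3 x 4 matrix; row 1 of m is the top row of the raster
m
# Equivalent reconstruction from the cell vector (note byrow = TRUE)
all(m == matrix(x[], nrow = nrow(x), byrow = TRUE))
```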

A multiband raster can be transformed into an array as follows:

> as.array(u[[1:2]])
, , 1

       [,1]   [,2]
[1,] 0.4242 0.3995
[2,] 0.4495 0.2925

, , 2

       [,1]   [,2]
[1,] 0.4518 0.3334
[2,] 0.4846 0.3223
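The array produced by as.array is indexed as [row, column, layer]. The following sketch, which assumes the raster package is installed and uses a synthetic two-layer brick, shows the dimensions and extracts the top row of the second layer.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
b = brick(stack(setValues(x, 1:12), setValues(x, 101:112)))
a = as.array(b)
dim(a)        # rows, columns, layers
a[1, , 2]     # top row of the second layer
```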

If we try to convert a multiband raster into a matrix with as.matrix, we will get a matrix with rows representing cells and columns representing layers, as we have seen earlier in the context of raster subsetting:

> as.matrix(u[[1:2]])
     layer.1 layer.2
[1,]  0.4242  0.4518
[2,]  0.3995  0.3334
[3,]  0.4495  0.4846
[4,]  0.2925  0.3223
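For a multiband raster, the matrix returned by as.matrix therefore has one row per cell, in the same row-by-row cell order, and one column per layer. This sketch verifies that on a synthetic two-layer brick, assuming the raster package is installed; vals is a hypothetical name.

```r
library(raster)
x = raster(nrows = 3, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 3)
b = brick(stack(setValues(x, 1:12), setValues(x, 101:112)))
vals = as.matrix(b)   # one row per cell (in cell order), one column per layer
dim(vals)
head(vals, 3)
```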