Aggregating spatio-temporal raster data

Spatio-temporal data, such as MODIS images, time series, or meteorological records from several stations (see Chapter 3, Working with Tables) pose a challenge for analysis and visualization due to their three-dimensional nature. One approach to simplify such data is to perform aggregation in spatial and/or temporal dimensions (another approach to simplify spatio-temporal data is, for example, cluster analysis).

In this section, we will experiment with two approaches to aggregate the data held in the multiband raster r in order to get additional perspectives on the spatio-temporal behavior of NDVI within the geographic area this raster covers.

Note

More specialized classes and methods (including aggregation) for various types of spatio-temporal data are defined in the spacetime package. An overview of this package can be found in the introductory paper "spacetime: Spatio-Temporal Data in R" by its creator Pebesma E. 2012.

The time dimension

In our first example, we will aggregate r along the temporal dimension (that is, the layers). In fact, the previous example where we mapped the minimum and maximum values for each pixel in r using overlay with the range function (see Chapter 4, Working with Rasters, for more information) is also an example of aggregation. However, instead of aggregating all layers, what if you want to learn the properties of different time series portions? In such cases, you have to perform several overlay operations, and then combine the results into a multiband raster. That is exactly what we are going to do next.

Let's say we are interested in examining the seasonal NDVI averages across the studied area. In such cases, we need to perform an overlay of all layers for each season using the mean function, and then combine the results into a four-band raster (where each layer corresponds to one of the four seasons). Conceptually, the simplest way of doing this is with a for loop that goes through the four seasons, selecting the appropriate layers of r each time, applying overlay on those layers, and keeping the result to form a final multiband raster in the end.

As a preliminary step to construct the loop, we will define a character vector with season names (seasons) and an empty RasterStack object (season_means). The seasons vector will define loop iterations, while season_means will be used to hold the results:

> seasons = c("winter", "spring", "summer", "fall")
> season_means = stack()

Now, we are ready to execute the for loop:

> for(i in seasons) {
+ season_means = stack(season_means,
+ overlay(r[[which(dates$season == i)]],
+ fun = function(x) mean(x, na.rm = TRUE)))
+ }

The loop code section is executed four times (since the number of elements in seasons is four), each time assigning the next season name into i. The code adds the mean of the relevant layers in r (selected using the dates$season vector that we previously created; refer to Chapter 3, Working with Tables) to season_means. Therefore, in the first loop iteration, the mean of the "winter" layers in r is added to the empty season_means; in the second iteration, the mean of "spring" is added to season_means (which already has one layer), and so on until all four seasons are covered.

Now that the result season_means is ready, we can add season names to the respective layers for convenience:

> names(season_means) = seasons

We will view the result using the levelplot function:

> levelplot(season_means, par.settings=RdBuTheme, contour=TRUE)

The following screenshot shows the graphical output:

The time dimension

The preceding screenshot shows average NDVI for each season. We can see that NDVI is generally high during winter and spring and low during summer and fall. The image also shows that certain areas (such as the north-west corner of the image) have much higher NDVI in the wet season than in the dry one, while other areas (such as the three patches in the center of the image) maintain relatively high NDVI values in the summer months as well. The former are, in fact, areas where agricultural crops and natural herbaceous vegetation proliferate due to the relatively more abundant rainfall in the wet season, while the latter are planted evergreen pine forests where vegetation activity is maintained even in the dry season.

Another way of aggregating raster layers, bypassing the necessity to construct a loop, is with the stackApply function. Similar to the way in which tapply can be used to apply a function on different portions of a vector (as we discussed in Chapter 3, Working with Tables), the function stackApply has been defined to apply a function on different portions of layers in a multiband raster. Analogous to tapply, the main three parameters of stackApply are as follows:

  • The input multiband raster (x)
  • A vector of indices defining the layers grouping (indices)
  • The function to be applied on each group (fun)

Unlike tapply, the indices vector must be composed of consecutive integers, starting with 1. An additional parameter of stackApply is na.rm, which controls whether NA values are removed from calculations.

For example, we can calculate the monthly NDVI means as follows:

> month_means = stackApply(r,
+ indices = dates$month,
+ fun = mean,
+ na.rm = TRUE)

As in the previous example, we will name the layers of the result month_means:

> names(month_means) = month.abb

Note

The character vector month.abb is a predefined one in R; it holds the three-letter abbreviated month names: "Jan", "Feb", "Mar", and so on.

We can visualize the result with levelplot:

> levelplot(month_means, par.settings = RdBuTheme, contour = TRUE)

The following screenshot shows the graphical output:

The time dimension

The preceding screenshot shows NDVI dynamics with greater detail on a monthly scale rather than a seasonal scale. We can see that NDVI starts to increase around October-November and declines around February-March. During April-September (the dry season), NDVI is just about constant.

Note

Using the monthly averages, we can derive other informative products. For example, with the following overlay function call, we can create a raster that shows the month in which the lowest NDVI value is observed at each location:

> min_month = overlay(month_means, fun = which.min)

To save space, we will not plot the result here.

As mentioned earlier, the indices parameter in stackApply accepts a vector of integers starting with 1, which the dates$months vector conveniently was. However, how should we deal with other grouping vectors—character vectors or numeric vectors—that are not consecutive or that do not start with 1? We must first encode these as consecutive integer vectors, for example, by converting them to a factor (with the factor function) and then extracting the factor level indices (with the as.numeric function). Let's take a look at the following example:

> dates$year[1:30]
 [1] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000
[13] 2000 2000 2000 2000 2000 2000 2000 2000 2001 2001 2001 2001
[25] 2001 2001 2001 2001 2001 2001
> as.numeric(factor(dates$year))[1:30]
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

Here, we reclassified the dates$year vector (which contains 13 unique years) to a vector of integers from 1 to 13 (only the first 30 values were printed to save space).

Spatial dimensions

In our second example, we will aggregate r along the y axis spatial dimension. Doing this will allow us to observe the average NDVI at a given position on the north-south axis over time. In other words, we will reduce the number of dimensions from three to two by averaging all values of a given y axis/date position into a single value. The following schematic diagram describes our action plan:

Spatial dimensions

As we can see in the preceding diagram, our goal is to create a new raster s, where each column holds the vector of row averages from r. In other words, each s row is going to represent an r row (or y axis positions in space), while each s column is going to represent an r layer (or positions in time). We will perform the operation in two steps: first, creating an empty raster s (with the required dimensions) and second, populating it with values. We will also see two different ways of accomplishing the task: using a for loop and using apply.

As we can see in the preceding diagram, the number of rows in s should be equal to the number of rows in r, while the number of columns in s should be equal to the number of layers in r. In addition to setting row and column numbers, using parameters xmn, xmx, ymn, and ymx, we will set the minimum and maximum coordinates on the x and y axes (to avoid the default, where the extent is -180 to 180 on the x axis and -90 to 90 on the y axis). The coordinate ranges on the x and y axes are going to be 0 to 280 and 0 to 100, respectively; moreover, since the number of columns and rows is accordingly 280 and 100, the raster resolution will be 1 (an arbitrary unit) on both axes:

> s = raster(nrows = nrow(r), ncols = nlayers(r),
+ xmn = 0, xmx =  nlayers(r), ymn = 0, ymx = nrow(r))

Now that raster s is ready, we need to populate it with values. Following our first for loop method, we will define a new function called raster_rowMeans that, when provided with a raster (x) and the layer index (layer), returns a vector of row means in that layer. This function actually consists of a single expression as it simply applies the base function rowMeans on the matrix of values from a given layer, obtained with as.matrix:

> raster_rowMeans = function(x, layer) {
+ rowMeans(as.matrix(x[[layer]]),
+ na.rm = TRUE)
+ }

Now, all is left to be done is to go over the layers of r, each time calculating the row means in the current layer (with the raster_rowMeans function) and assigning the resulting vector to the appropriate column of s (exactly as depicted in the preceding diagram):

> for(i in 1:nlayers(r)) {
+ s[ ,i] = raster_rowMeans(r, i)
+ }

Now s is filled with the appropriate values, and we can use the levelplot function to display it. The additional at parameter of the levelplot function sets the breaks between color levels and contours. We use it to slightly reduce the quantity of contours and make the image clearer:

> levelplot(s,
+ par.settings = RdBuTheme,
+ contour = TRUE,
+ margin = FALSE,
+ at = seq(0,0.6,0.05))

The graphical output is shown in the following screenshot:

Spatial dimensions

The preceding screenshot shows the average NDVI at each point in time (x axis) and north-south position in space (y axis). Two main patterns are evident: first, along the x axis (time) we can see the periodical behavior of NDVI between the interchanging wet and dry seasons. However, there are also inter-annual differences—in certain years, NDVI is higher or lower than average—due to variation in climatic conditions and therefore in vegetation activity. Second, along the y axis (the north-south position), we can see the NDVI gradient from the relatively wet conditions, and thus a higher NDVI to the north, compared to drier conditions; consequently, we can also see a lower and more stable NDVI to the south.

The second method of obtaining the same result will be defined with less code, although it might be conceptually more difficult to grasp. The first step of the task, defining the empty raster s, is identical:

> s = raster(nrows = nrow(r), ncols = nlayers(r),
+ xmn = 0, xmx =  nlayers(r), ymn = 0, ymx = nrow(r))

However, the second step is different. Instead of going over the layers of r with a for loop, we will convert r to a three-dimensional array and utilize the apply function. The former is accomplished with the as.array function:

> r_array = as.array(r)

The result, r_array, is an array object where the first dimension corresponds to r rows, the second dimension to r columns, and the third dimension to r layers. We can therefore use apply to utilize rowMeans on the third dimension, obtaining the vector of row means for each third-dimension element (layer), which is exactly what we need. The values of the resulting matrix can be directly transferred to s:

> s[] = apply(r_array, 3, rowMeans, na.rm = TRUE)

The raster s we just created is identical to the one created (and plotted) earlier. Although the same results were produced, an important difference between the two methods is in the speed of execution. Accessing one raster layer at a time using a loop is slow compared to an apply operation on an array, which is generally very fast. For example, on the computer this book is written on, calculating the values of s using the first method is performed in about 1.8 minutes as compared to 2.6 seconds using the second method (that is ~40 times faster!). It is therefore often useful to transform a raster into a simpler object (matrix or array) before doing intensive calculations.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.93.169