Data structures for rasters in the raster package

A raster is a rectangular grid of numeric values, referenced to a certain geographical extent. As previously mentioned, spatial referencing is what differentiates a raster from the simpler data structures (matrices and arrays) we have seen previously. A raster can have a single value in each cell (a single band, or single layer, raster—analogous to a matrix) or several values (a multiband, or multilayer, raster—analogous to an array). Rasters conceptually differ from vector layers, which are data structures to represent non-gridded objects such as spatial points, lines, and polygons (these will be covered in the next chapter).

In this book, we are going to work with classes to represent rasters from the raster package. This package does not come with the R installation, so we first have to install it using install.packages (see the previous chapter). We will also need to install the rgdal package since functions in the raster package use functions defined in rgdal for certain tasks, such as input/output operations. Taking a look at the official overview of R packages for spatial data analysis (http://cran.r-project.org/web/views/Spatial.html) is highly recommended at this stage. This web page is useful to find out how the previously mentioned packages raster and rgdal (and the ones to be introduced in the upcoming chapters) fit within the broader ecosystem of spatial data analysis tools available in R.

Note

The rgdal package, whose name stands for Geospatial Data Abstraction Library (GDAL) extensions to R, is an important package for working with spatial data, and we will cover it in several contexts. The name GDAL may be familiar to some readers; GDAL is a C/C++ library frequently used in other software (such as QGIS) and programming languages (such as Python). In fact, there are four C/C++ libraries that provide the core functionality for working with spatial data in R, interfaced through R functions. They are GDAL, OGR, PROJ.4 (which are available using functions in the rgdal package), and GEOS (which is available through functions in the rgeos package).

The remaining part of this chapter is going to introduce the basic usage of the raster package with two real-world examples of remote sensing data. More advanced functionality of this package, as well as examples with another common type of raster data, Digital Elevation Model (DEM), will be introduced in subsequent chapters. We'll be creating a third type of raster—predicted surfaces from spatial interpolation—in Chapter 8, Spatial Interpolation of Point Data.

Note

Like GIS software, the raster package can work with rasters that are too big to fit in RAM (in such cases, the data are automatically processed in chunks and the results are written to temporary files on disk).

A comprehensive overview of the range of capabilities the raster package offers can also be found in its accompanying introductory tutorial (http://cran.r-project.org/web/packages/raster/vignettes/Raster.pdf).
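As a rough illustration of the memory handling described in the note above, the following sketch asks whether a raster is held in memory and whether a much larger, empty raster could be processed in RAM. It assumes the raster package is installed; the object names small and big are made up for this example, and the answer for big depends on the machine it is run on.

```r
library(raster)

# A tiny raster built from a matrix lives entirely in RAM
small = raster(matrix(1:6, nrow = 2))
in_ram = inMemory(small)

# An empty raster with 1e10 cells: only the geometry is stored, no values
big = raster(nrows = 1e5, ncols = 1e5)

# Ask whether 4 copies of 'big' could be held in RAM at once;
# the result depends on the memory available on this machine
fits = canProcessInMemory(big, n = 4)

print(in_ram)
print(fits)
```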

Creating single band rasters

There are three classes to represent spatial rasters in the raster package. These are RasterLayer, RasterStack, and RasterBrick. The first class is used to represent single band rasters (see the following examples), whereas the last two classes are used to represent multiband rasters (see the next section).

The RasterLayer class represents a single band raster. A new RasterLayer object can be created using the raster function in several ways. For example, a matrix object can be converted to a RasterLayer object as follows:

> library(raster)
> r1 = raster(x)
> r1
class       : RasterLayer
dimensions  : 2, 3, 6  (nrow, ncol, ncell)
resolution  : 0.3333333, 0.5  (x, y)
extent      : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : in memory
names       : layer
values      : 7, 12  (min, max)

We see that the print method for RasterLayer objects does something different from what we have seen so far. Rather than printing all values the object is composed of, a summary of certain properties of the particular RasterLayer object is given. We will see how to directly access some of these properties later. For now, it is worth repeating that a RasterLayer object (as opposed to a matrix) has spatial reference information, that is, a certain resolution, extent, and Coordinate Reference System (CRS). Naturally, the particular raster r1, which we just created from a plain numeric matrix, has no CRS, and its resolution and extent have been automatically generated by the raster function (the extent is between 0 and 1 on both the x and y axes; the resolution is calculated accordingly).
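To make the relation between extent, dimensions, and resolution concrete, here is a minimal self-contained sketch. The matrix below is an assumption: it is a hypothetical 2 x 3 matrix chosen only to match the dimensions and value range in the printed summary, and it may differ from the x object defined earlier in the book.

```r
library(raster)

# Hypothetical 2 x 3 matrix with values 7 to 12 (an assumption, see the text)
x = matrix(7:12, nrow = 2, ncol = 3, byrow = TRUE)
r1 = raster(x)

# The default extent is the unit square, so the cell size on each axis is
# the extent width divided by the number of cells along that axis
res_x = (xmax(r1) - xmin(r1)) / ncol(r1)  # 1 / 3
res_y = (ymax(r1) - ymin(r1)) / nrow(r1)  # 1 / 2

# Compare the hand-computed resolution with the one stored in the object
print(all.equal(res(r1), c(res_x, res_y)))
```

With three columns and two rows spread over an extent of width 1, the cells are 1/3 wide and 1/2 high, which is where the resolution 0.3333333, 0.5 comes from.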

A more common way to create a raster object in R is to read the raster data from a file. For example, given that the raster and rgdal packages are installed on our system and the raster file landsat_15_10_1998.tif exists in the C:\Data directory, the following expression will read the contents of its first band and assign it to an object named band1 of class RasterLayer:

> band1 = raster("C:\\Data\\landsat_15_10_1998.tif")
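Because the backslash is the escape character in R strings, a Windows path has to be written with doubled backslashes or with forward slashes. The short base R sketch below shows that these spellings describe the same path. No file is read, so it runs even if the file does not exist; the variable names are made up for this example.

```r
# Escaped backslashes: each "\\" in the source is one backslash in the string
p1 = "C:\\Data\\landsat_15_10_1998.tif"

# Forward slashes are also accepted by R on Windows
p2 = "C:/Data/landsat_15_10_1998.tif"

# file.path() joins its components with forward slashes
p3 = file.path("C:", "Data", "landsat_15_10_1998.tif")

# Converting the backslashes in p1 to forward slashes gives exactly p2
p1_fwd = gsub("\\", "/", p1, fixed = TRUE)
print(identical(p1_fwd, p2))
print(identical(p2, p3))
```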

Reading files from disk, as mentioned earlier, is done through the rgdal package (which is automatically loaded, if it was not already, when trying to read a file using the raster function). At present, there are ~100 supported input formats (you can get a list of these by typing getGDALDriverNames()$name once rgdal is loaded). These include, for example, the frequently used GeoTIFF (*.tif or *.tiff), which we will use in the examples in this book, and ERDAS IMAGINE image (*.img) formats.

Printing the properties of raster band1 and comparing them to those of r1 from the previous example will demonstrate that this time we do have meaningful spatial reference information in the RasterLayer object band1, as shown in the following example:

> band1
class       : RasterLayer
band        : 1  (of  6  bands)
dimensions  : 960, 791, 759360  (nrow, ncol, ncell)
resolution  : 30, 30  (x, y)
extent      : 663945, 687675, 3459375, 3488175  (xmin, xmax, ymin$
coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs
data source : C:\Data\landsat_15_10_1998.tif
names       : landsat_15_10_1998
values      : 0.01737053, 0.5723241  (min, max)

Spatial reference information is stored in the GeoTIFF file and incorporated in the RasterLayer object when it is created. We can see that raster band1 has a projected CRS, specifically the UTM Zone 36N coordinate system. Thus, its resolution, 30 x 30, is in meters. We can also see that it is one of the six bands the landsat_15_10_1998.tif file contains.

The input file landsat_15_10_1998.tif is, in fact, a subset of a Landsat satellite image of central Israel, where the original values were converted to reflectance (the fraction of incident electromagnetic radiation that is reflected from the surface, for a given wavelength). The original image, taken on October 15, 1998, is available for free at http://earthexplorer.usgs.gov/. The landsat_15_10_1998.tif file has six bands (Landsat bands 1-5 and 7) and covers an area of ~24 x ~29 kilometers (out of the 170 x 183 kilometers covered by the original image). The first four bands correspond to blue, green, red, and Near Infrared (NIR), while the last two belong to the Short Wave Infrared (SWIR) portion of the electromagnetic spectrum. Two additional Landsat images of the same area, taken about 2 and 5 years after 1998, are also available as sample datasets along with this book (the landsat_04_10_2000.tif and landsat_11_09_2003.tif files).
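The stated coverage of roughly 24 x 29 kilometers can be checked directly from the printed properties of band1. The following base R sketch only repeats the numbers shown in the summary above; the variable names are made up for this example.

```r
# Values copied from the printed summary of band1
n_rows = 960
n_cols = 791
cell_m = 30                           # cell size in meters
x_min = 663945;  x_max = 687675
y_min = 3459375; y_max = 3488175

# Width and height of the extent in meters
width_m  = x_max - x_min
height_m = y_max - y_min

# The extent equals the number of cells times the cell size on each axis
print(width_m  == n_cols * cell_m)
print(height_m == n_rows * cell_m)

# Coverage in kilometers
print(c(width_km = width_m / 1000, height_km = height_m / 1000))
```

The width works out to 23.73 km and the height to 28.8 km, in agreement with the approximate figures in the text.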

Since the raster function reads, by default, the first band of a multiband raster file, object band1 that we just created contains reflectance data from the blue band. We can point to a different band with the band parameter of the raster function. For example, we can create another RasterLayer object, named band4, that will hold the NIR data as follows:

> band4 = raster("C:\\Data\\landsat_15_10_1998.tif", band = 4)
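The effect of the band parameter can be tried without the Landsat file. The sketch below writes a small synthetic two-band raster to a temporary file in the raster package's native format and reads each band back. The file name two_bands.grd and the object names are assumptions made for this example; nbands and bandnr are raster functions that report the number of bands in the source file and the band a layer refers to.

```r
library(raster)

# Two small synthetic layers combined into one two-band object
b = brick(raster(matrix(1:4, nrow = 2)), raster(matrix(5:8, nrow = 2)))

# Write to a temporary file in the raster package's native format
f = file.path(tempdir(), "two_bands.grd")
writeRaster(b, f, format = "raster", overwrite = TRUE)

first  = raster(f)            # no band argument: band 1 is read
second = raster(f, band = 2)  # explicitly request band 2

print(nbands(first))   # number of bands in the file
print(bandnr(first))   # band that 'first' points to
print(bandnr(second))  # band that 'second' points to
```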

Creating multiband rasters

Two classes to represent multiband rasters are defined in the raster package: RasterStack and RasterBrick. The only difference between these classes is in the flexibility of data sources. While a RasterBrick object must refer to a single file (either in the RAM or on disk), each layer in a RasterStack object can come from a different file (or a layer in a multiband file). The advantage of RasterBrick is in the potentially faster processing time.

A RasterStack object can be created using the stack function, for example, by combining several RasterLayer objects as follows:

> stack(band1, band4)
class       : RasterStack
dimensions  : 960, 791, 759360, 2  (nrow, ncol, ncell, nlayers)
resolution  : 30, 30  (x, y)
extent      : 663945, 687675, 3459375, 3488175  (xmin, xmax, ymin$
coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs
names       : landsat_15_10_1998.1, landsat_15_10_1998.2
min values  :           0.01737053,           0.04885371
max values  :            0.5723241,            0.7096972

Here, we combined the two RasterLayer objects band1 and band4 into a single RasterStack object. We can see that a RasterStack object has an additional dimension: the bands or layers (in this case, there are two). A RasterBrick object can be created using the brick function in exactly the same way.
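The same pattern can be sketched with small in-memory layers, which makes it easy to compare the two functions side by side. The layers a and b below are synthetic stand-ins for band1 and band4, not the Landsat data.

```r
library(raster)

# Two synthetic single band rasters with identical dimensions and extent
a = raster(matrix(1:6, nrow = 2))
b = raster(matrix(6:1, nrow = 2))

s  = stack(a, b)  # RasterStack: layers may come from different sources
br = brick(a, b)  # RasterBrick: all layers held as a single source

print(class(s))
print(class(br))
print(nlayers(s))
print(nlayers(br))
```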

We can also use the stack or brick function to read a multiband raster file into a RasterStack or RasterBrick object. Let's read the Landsat image from 2000 into a RasterBrick object named l_00:

> l_00 = brick("C:\\Data\\landsat_04_10_2000.tif")
> l_00
class       : RasterBrick 
dimensions  : 960, 791, 759360, 6  (nrow, ncol, ncell, nlayers)
resolution  : 30, 30  (x, y)
extent      : 663945, 687675, 3459375, 3488175  (xmin, xmax, ymin$
coord. ref. : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs 
data source : C:\Data\landsat_04_10_2000.tif 
names       : landsat_04_10_2000.1, landsat_04_10_2000.2, landsat$
min values  :         3.109737e-05,         2.019792e-02,        $
max values  :            0.6080654,            0.6905138,        $

This time, our RasterBrick object l_00 holds all six bands the landsat_04_10_2000.tif file contains.

The following table summarizes the previously mentioned properties of the three classes:

Class        Function  Bands           Storage
RasterLayer  raster    1               Disk/RAM
RasterStack  stack     Greater than 1  Disk/RAM
RasterBrick  brick     Greater than 1  Disk/RAM, single file

Once we have a multiband raster object (such as RasterStack), we can access individual bands using the double square brackets [[ operator. By supplying a numeric vector of band indices within double brackets, we can get a subset of bands from the multiband raster. When the index has a length of 1, we get a RasterLayer object holding a single band. For example, using the expression l_00[[2]], we get a RasterLayer object that holds the second band as follows:

> class(l_00[[2]])
[1] "RasterLayer"
attr(,"package")
[1] "raster"

When the length of the index is greater than 1, we get a multiband object that holds the specific bands we selected. For example, using the expression l_00[[1:3]], we get a RasterStack object containing only bands 1-3:

> class(l_00[[1:3]])
[1] "RasterStack"
attr(,"package")
[1] "raster"
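Band subsetting can also be tried on a small synthetic object. In the sketch below, the three constant layers and their names (blue, green, red) are made up for this example; it shows selection of a single layer, of several layers by position, and of a layer by name.

```r
library(raster)

# Three synthetic 2 x 2 layers holding the constants 1, 2, and 3
layers = lapply(1:3, function(i) raster(matrix(i, nrow = 2, ncol = 2)))
s = stack(layers)
names(s) = c("blue", "green", "red")

one     = s[[2]]        # index of length 1: a single RasterLayer
two     = s[[c(1, 3)]]  # index of length 2: a multiband object
by_name = s[["red"]]    # layers can also be selected by name

print(class(one))
print(nlayers(two))
print(names(two))
print(names(by_name))
```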

Writing raster files

Raster objects can be written to disk with the writeRaster function. Writing in nine formats is currently supported. For example, to write our recently created RasterBrick object back to disk, in a different format (say, an ERDAS IMAGINE image, *.img), we will run the following expression:

> writeRaster(l_00,
+ "C:\\Data\\landsat_04_10_2000.img",
+ format = "HFA",
+ overwrite = FALSE)

Note that we specified the values of four parameters:

  • The object to be written (l_00)
  • The path and name for the file to be written ("C:\\Data\\landsat_04_10_2000.img")
  • The format of choice (format="HFA"; see ?writeRaster for the list of abbreviations)
  • Whether to overwrite when the file already exists (overwrite=FALSE)
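The role of the overwrite parameter can be seen in a small round trip that does not need the Landsat data. The sketch below writes a synthetic raster to a temporary file in the raster package's native format (format = "raster"), then tries to write it again with overwrite = FALSE. The file name overwrite_demo.grd and the object names are assumptions made for this example; the second call is expected to stop with an error because the file already exists, and tryCatch turns that error into a short message.

```r
library(raster)

r = raster(matrix(1:6, nrow = 2))
f = file.path(tempdir(), "overwrite_demo.grd")

# First write: overwrite = TRUE so that the example can be rerun
writeRaster(r, f, format = "raster", overwrite = TRUE)

# Second write with overwrite = FALSE: the existing file is protected,
# so the call should fail and the handler returns a message instead
second_try = tryCatch(
  writeRaster(r, f, format = "raster", overwrite = FALSE),
  error = function(e) "refused: file already exists"
)

# Read the file back and compare the cell values with the original
r2 = raster(f)
print(file.exists(f))
print(second_try)
print(all.equal(as.numeric(values(r)), as.numeric(values(r2))))
```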

Exploring a raster's properties

In this section, we are going to review some of the functions used to query the properties of raster objects, and modify those properties when appropriate. Accessing and modifying the raster values (these can also be viewed as a property the raster has) is going to be covered in the next sections.

The number of rows, columns, and layers of a raster can be obtained using functions nrow, ncol, and nlayers, respectively:

> nrow(l_00)
[1] 960
> ncol(l_00)
[1] 791
> nlayers(l_00)
[1] 6

As we have seen previously in other contexts, the dim function returns the lengths of all dimensions at once as follows:

> dim(l_00)
[1] 960 791   6

The number of cells (equal to the number of rows multiplied by the number of columns) can be obtained using the ncell function:

> ncell(l_00)
[1] 759360

As for the spatial reference properties, the res and extent functions return the resolution and extent of the raster, while the proj4string function returns the CRS information. Let's see how these functions work, one function at a time:

> res(l_00)
[1] 30 30

The output of res is a vector of length 2, and its values denote the resolutions on the x and y axes, respectively (these are usually equal). Here is an example of querying the raster's extent:

> extent(l_00)
class       : Extent 
xmin        : 663945 
xmax        : 687675 
ymin        : 3459375 
ymax        : 3488175

The returned object from the extent function is an object of the Extent class. Objects of this class define a rectangular bounding box and have several uses, such as cropping a raster according to the extent of another raster (using the crop function, as we shall see in upcoming chapters).
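The way these properties depend on each other can be explored without the image file. The sketch below builds an empty RasterBrick (geometry only, no cell values) with the same number of rows, columns, and layers and the same extent as l_00; the object name g is made up for this example.

```r
library(raster)

# Empty brick with the dimensions and extent printed for l_00
g = brick(nrows = 960, ncols = 791, nl = 6,
          xmn = 663945, xmx = 687675,
          ymn = 3459375, ymx = 3488175)

print(dim(g))                         # rows, columns, layers
print(ncell(g) == nrow(g) * ncol(g))  # cells = rows x columns
print(res(g))                         # extent width / number of cells per axis
print(extent(g))
```

Since the extent spans 23730 m over 791 columns and 28800 m over 960 rows, the resolution comes out as 30 on both axes, the same as for l_00.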

The returned object from the proj4string function is a character vector (of length 1), holding the CRS information in the PROJ.4 format:

> proj4string(l_00)
[1] "+proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs"

Certain methods (such as reprojection of vector layers, which will be introduced in the next chapter) require CRS information as an object of class CRS rather than a character value. A CRS object can be created in a straightforward manner, by applying the CRS function to a PROJ.4 character string. The CRS function is defined in the sp package, another very important package for working with spatial data in R (it is automatically loaded along with the raster package) and one that is going to be covered in the next chapter.

The CRS object contains exactly the same information, only in a different form:

> CRS(proj4string(l_00))
CRS arguments:
 +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs

One of the advantages of using the CRS class is that the correspondence of a specific character string to a valid CRS is ensured (otherwise the CRS function will trigger an error).

Sometimes, we would like to modify the CRS information of a spatial object (or assign one if it is missing). For example, assignment of NA to the CRS component is equivalent to clearing the CRS information:

> proj4string(l_00) = NA
> proj4string(l_00)
[1] NA

When a raster does not have a CRS specified, we can assign it one. One way to do this is by using the appropriate PROJ.4 character string (which, in turn, can be obtained from another resource, such as http://www.spatialreference.org/). Here is an example of how this can be done:

> proj4string(l_00) =
+ CRS("+proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs")
> proj4string(l_00)
[1] "+proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs"

It is frequently more convenient to transfer CRS information from another spatial object (which is analogous to importing a CRS from another layer, a common procedure in GIS software), rather than looking up its specific parameters. For example, we can assign our raster object l_00 the CRS data from another Landsat satellite image we read from the disk:

> l1 = raster("C:\\Data\\landsat_15_10_1998.tif")
> proj4string(l_00) = CRS(proj4string(l1))
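The same kind of transfer can be sketched with two small synthetic rasters. Note that this example uses the crs and projection accessors and the compareCRS function of the raster package instead of proj4string; this is an alternative spelling chosen for the sketch, and the objects src and dst are made up. Depending on the installed versions of the underlying libraries, creating the CRS may produce warnings.

```r
library(raster)

# Source raster with a UTM Zone 36N CRS (same PROJ.4 string as in the text)
src = raster(nrows = 2, ncols = 2, xmn = 0, xmx = 60, ymn = 0, ymx = 60,
             crs = "+proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs")

# Destination raster created from a plain matrix: no CRS yet
dst = raster(matrix(1:4, nrow = 2))
print(is.na(projection(dst)))

# Copy the CRS from one object to the other
crs(dst) = crs(src)

# Check that both objects now refer to the same CRS
print(compareCRS(src, dst))
```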

A graphical display is often the most helpful way to perceive the properties of a given raster. For example, the two basic functions plot and hist can give a first impression of the raster values' distribution. The plot function, when applied to a raster object, generates a simple map of the values in each band. For more advanced visualization of this sort, we are going to use the levelplot function (in the following example) and the ggplot2 package (in Chapter 9, Advanced Visualization of Spatial Data). The hist function displays a histogram of the values in each band of the raster.

Prior to plotting, we will modify another property the raster l_00 has—its band names—using the names function, so that more appropriate names will appear along with each respective image in the graphical output. The automatically generated names are often inconvenient; for example, they may be composed of the filename with sequential numbers for the different bands:

> names(l_00)
[1] "landsat_04_10_2000.1" "landsat_04_10_2000.2"
[3] "landsat_04_10_2000.3" "landsat_04_10_2000.4"
[5] "landsat_04_10_2000.5" "landsat_04_10_2000.6"

We can assign shorter names as follows:

> names(l_00) = paste("Band", 1:6, sep = "_")
> names(l_00)
[1] "Band_1" "Band_2" "Band_3" "Band_4" "Band_5" "Band_6"

Now, using the expression hist(l_00), we will generate histograms of values in each band of raster l_00, which are shown in the following screenshot:

[Figure: histograms of the values in each band of raster l_00]

Expanded functionality in the visualization of raster data in R is available through several contributed packages. For example, the levelplot function from the rasterVis package (which is a modified version of the levelplot function from the lattice package) by default displays all bands of a given raster using a single color scale (unlike plot), which is something we usually want to do. Note that the levelplot function has numerous additional parameters to modify the plot appearance, and the rasterVis package contains several other useful functions to visualize rasters, that we are not going to cover (instead, you will learn how to produce customized graphical output using the ggplot2 package in Chapter 9, Advanced Visualization of Spatial Data). The interested reader is referred to the tutorial of the rasterVis package (http://oscarperpinan.github.io/rastervis/) and the related book by the package author Oscar Perpinan Lamigueiro, Displaying Time Series, Spatial, and Space-Time Data with R, CRC Press (2014).

Two very useful parameters of levelplot are par.settings, which determines the color scale (for example, the blue-red scale is available using RdBuTheme), and contour, which determines whether to display contours. Let's take a look at the following example:

> library(rasterVis)
> levelplot(l_00, par.settings = RdBuTheme, contour = FALSE)

The following graphical output is generated:

[Figure: levelplot output showing all six bands of raster l_00 on a single blue-red color scale]

The previous screenshot shows reflectance values between 0 (completely dark) and 1 (completely reflective) for each Landsat band. To produce a so-called true color image, we would have to combine bands 1-3 (blue, green, and red), as will be shown in the next chapter.
