Chapter 4

Time as a Conditioning or Grouping Variable

In Section 3.1 we learned to display the time evolution of multiple time series with different scales. But what if, instead of displaying the time evolution, we want to compare the variables with one another? Section 4.1 proposes the scatterplot matrix as a solution, with time as a grouping variable, and Section 4.1.1 includes a digression about hexagonal binning for large datasets. Section 4.2 uses an enhanced scatterplot with time as a conditioning variable.

4.1 Scatterplot Matrix: Time as a Grouping Variable

The scatterplot matrices are based on the technique of small multiples (Tufte 1990): small, thumbnail-sized representations of multiple images displayed all at once, which allows the reader to immediately, and in parallel, compare the inter-frame differences. A scatterplot matrix is a display of all pairwise bivariate scatterplots arranged in a p × p matrix for p variables. Each subplot shows the relation between the pair of variables at the intersection of the row and column indicated by the variable names in the diagonal panels (Friendly and Denis 2005).

This graphical tool is implemented in the splom function¹. The following code displays the relation between the set of meteorological variables using a diverging palette from the ColorBrewer catalog (RdBu, with black added to complete a twelve-color palette) to encode the month. The colors are ordered so that summer months are displayed with intense colors and the first and second halves of the year are distinguished with red and blue, respectively (Figure 4.1).

Figure 4.1: Scatterplot matrix of the collection of meteorological time series of the Aranjuez station.

## Packages needed by the code of this section
library(lattice)
library(latticeExtra)
library(RColorBrewer)
library(zoo)

load('data/aranjuez.RData')

## Red-Blue palette with black added (12 colors)
colors <- c(brewer.pal(n=11, 'RdBu'), '#000000')
## Rearrange according to months (darkest for summer)
colors <- colors[c(6:1, 12:7)]

splom(~as.data.frame(aranjuez),
      groups=format(index(aranjuez), '%m'),
      auto.key=list(space='right',
                    title='Month', cex.title=1),
      pscale=0, varname.cex=0.7, xlab='',
      par.settings=custom.theme(symbol=colors,
                                pch=19),
      cex=0.3, alpha=0.1)
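To see which color each month receives after the rearrangement, the palette can be inspected outside the plot. The following is only a minimal sketch: the names pal and monthColors, and the use of month.abb as labels, are illustrative assumptions and not part of the code above.

```r
library(RColorBrewer)

## Eleven-class RdBu palette (dark red ... white ... dark blue) plus black
pal <- c(brewer.pal(n = 11, 'RdBu'), '#000000')
## Same rearrangement as above: white to dark red for January-June,
## then black and dark blue to light blue for July-December
monthColors <- pal[c(6:1, 12:7)]
## Label each color with its month (illustrative only)
names(monthColors) <- month.abb
monthColors
```

With this ordering June receives the darkest red, July black, and August the darkest blue, so the summer months are the most intense on both sides of the palette.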

Let’s explore Figure 4.1. For example,

  • The highest values of ambient temperature (average, maximum, and minimum), solar radiation, and evapotranspiration can be found during the summer.
  • These variables are almost linearly related. The relation between radiation and temperature differs between the two halves of the year (red and blue regions can be easily distinguished).
  • The humidity reaches its highest values during winter without appreciable differences between the first and second half of the year. The temperature and humidity may be related by an exponential function.

A bit of interactivity can be added to this plot with the identification of some points. This task is easy with panel.link.splom. The points are selected via mouse clicks (and highlighted in green). Clicks other than left-clicks terminate the procedure. The output of this function is the index of chosen points.

trellis.focus('panel', 1, 1)
idx <- panel.link.splom(pch=13, cex=0.6, col='green')
aranjuez[idx,]

4.1.1 Hexagonal Binning

For large datasets, the display of a large number of points in a scatterplot hides the point density and leads to long computation times and slow displays. These problems can be circumvented by estimating and representing the point density. A common encoding uses gray scales, pseudo colors, or partial transparency. An improved scheme encodes density as the size of hexagon symbols inscribed within hexagonal binning regions (D. B. Carr et al. 1987).
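As a minimal illustration of the binning step, the following sketch counts the points falling in each hexagonal cell with the hexbin function of the hexbin package. The data are simulated (x and y are not taken from the Aranjuez dataset), and the choice of 20 bins along the x axis is arbitrary.

```r
library(hexbin)

set.seed(1)
n <- 10000
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)

## Partition the plane into hexagons (20 bins along the x axis)
hb <- hexbin(x, y, xbins = 20)

## Each non-empty cell stores the number of points it contains
summary(hb@count)
## Every point belongs to exactly one cell
sum(hb@count) == n
```

Plotting a few hundred cell counts instead of ten thousand points is what keeps the display fast and the density visible.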

The hexbin package (D. Carr, Lewin-Koh, and Maechler 2013) includes several functions for hexagonal binning. The panel.hexbinplot function is a good substitute for the default panel function. In addition, our first attempt with splom can be improved with several modifications (Figure 4.2):

  • The scale’s ticks and labels are suppressed with pscale=0.
  • The panels of the lower part of the matrix (lower.panel) will include a locally weighted scatterplot smoothing (loess) with panel.loess.
  • The diagonal panels (diag.panel) will display the kernel density estimate of each variable. The density function computes this estimate. The result is adjusted to the panel limits (calculated with current.panel.limits). The kernel density is plotted with panel.lines and the diag.panel.splom function completes the content of each diagonal panel.
  • The point density is encoded with the palette BTC (lighter colors for high density values and darker colors for almost empty regions, with a gradient of blue hues for intermediate values).

Figure 4.2: Scatterplot matrix of the collection of meteorological time series of the Aranjuez station using hexagonal binning.

library(hexbin)

splom(~as.data.frame(aranjuez),
      panel=panel.hexbinplot, xlab='',
      colramp=BTC,
      diag.panel = function(x, ...){
          ## Kernel density estimate rescaled to the panel limits
          yrng <- current.panel.limits()$ylim
          d <- density(x, na.rm=TRUE)
          d$y <- with(d, yrng[1] + 0.95 * diff(yrng) * y / max(y))
          panel.lines(d)
          diag.panel.splom(x, ...)
      },
      lower.panel = function(x, y, ...){
          ## Hexagonal binning with a loess curve on top
          panel.hexbinplot(x, y, ...)
          panel.loess(x, y, ..., col = 'red')
      },
      pscale=0, varname.cex=0.7
      )
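The rescaling used in the diagonal panels can be checked in isolation. The sketch below uses simulated values and a made-up pair of panel limits (yrng is hypothetical here; inside a panel function it comes from current.panel.limits) to show that the density curve is stretched so that its mode reaches 95% of the panel height.

```r
set.seed(1)
## Simulated variable with a missing value, as in real time series
x <- c(rnorm(200, mean = 20, sd = 5), NA)

## Hypothetical vertical limits of a diagonal panel
yrng <- c(0, 40)

d <- density(x, na.rm = TRUE)
## Map the density onto the panel: zero density goes to the lower
## limit and the mode reaches 95% of the panel height
d$y <- with(d, yrng[1] + 0.95 * diff(yrng) * y / max(y))
range(d$y)
```

Because the density values are divided by their maximum, the curve fits any panel regardless of the units of the variable.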

A drawback of the matrix of scatterplots with hexagonal binning is that each panel is drawn independently, so it is impossible to compute a common color key for all of them. In other words, two cells with exactly the same color in different panels encode different point densities.
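The following sketch illustrates the problem with two simulated "panels" of very different sample sizes. It is not how splom or hexbinplot work internally; the names hbA, hbB, and maxcnt are illustrative, and the only point is that a common key requires a reference count shared by all the panels.

```r
library(hexbin)

set.seed(1)
## Two synthetic panels with very different numbers of points
xA <- rnorm(2000); yA <- rnorm(2000)
xB <- rnorm(200);  yB <- rnorm(200)

## Same bin geometry for both panels
xb <- range(c(xA, xB))
yb <- range(c(yA, yB))
hbA <- hexbin(xA, yA, xbins = 10, xbnds = xb, ybnds = yb)
hbB <- hexbin(xB, yB, xbins = 10, xbnds = xb, ybnds = yb)

## Scaled independently, the densest cell of each panel would get
## the same color although the counts are very different
c(A = max(hbA@count), B = max(hbB@count))

## A common key needs a single maximum shared by all the panels
maxcnt <- max(hbA@count, hbB@count)
maxcnt
```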

It is possible to display a reduced set of variables against another one and generate a common color key using the hexbinplot function. First, the dataset must be reshaped from the wide format (one column for each variable) to the long format (only one column for the values with one row for each observation).

The reshape function needs several arguments to perform the conversion. The most important is the data.frame to be transformed. Next, varying contains the names of the variables to be mapped to a single variable in the long dataset (the three ambient temperatures). The name of this single variable can be set with v.names. Finally, timevar is the name of the column in the long format that differentiates multiple observations from the same variable. The values of this column are defined with the times argument.

aranjuezDF <- data.frame(aranjuez,
                         month=format(index(aranjuez), '%m'))
aranjuezRshp <- reshape(aranjuezDF, direction='long',
                        varying=list(names(aranjuez)[1:3]),
                        v.names='Temperature',
                        times=names(aranjuez)[1:3],
                        timevar='Statistic')
head(aranjuezRshp)
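The effect of these arguments is easier to follow with a tiny data.frame. The sketch below uses two made-up observations; the column names TempAvg, TempMax, TempMin, and Radiation are assumed here for illustration, and the values are invented.

```r
## Wide format: one column for each temperature statistic
wide <- data.frame(TempAvg = c(10, 12),
                   TempMax = c(15, 18),
                   TempMin = c(5, 6),
                   Radiation = c(100, 150))

## Long format: the three temperature columns are stacked in a
## single column, and Statistic records where each value came from
long <- reshape(wide, direction = 'long',
                varying = list(names(wide)[1:3]),
                v.names = 'Temperature',
                times = names(wide)[1:3],
                timevar = 'Statistic')
long
```

Each original row appears three times in the result, once per statistic, while the columns that are not reshaped (Radiation here) are repeated alongside.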

The hexbinplot function displays this dataset with a different panel for each type of temperature (average, maximum, and minimum) but with a common color key encoding the point density (Figure 4.3). Now, two cells with the same color in different panels encode the same value.

Figure 4.3: Scatterplot with hexagonal binning of temperature versus solar radiation using data of the Aranjuez station (lattice version).

hexbinplot(Radiation~Temperature|Statistic, data=aranjuezRshp,
           layout=c(1, 3), colramp=BTC) +
    layer(panel.loess(..., col = 'red'))
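The red curve added to each panel is a local regression. A similar curve can be computed directly with loess.smooth from the stats package, as in the following sketch. The data are simulated, and the span, degree, and evaluation values are written out only to make the assumptions explicit; they are not taken from the code above.

```r
set.seed(1)
temperature <- runif(300, min = 0, max = 35)
## Synthetic radiation: increases with temperature, plus noise
radiation <- 2 + 0.8 * temperature + rnorm(300, sd = 3)

## Local linear regression evaluated at 50 equally spaced points
sm <- loess.smooth(temperature, radiation,
                   span = 2/3, degree = 1, evaluation = 50)
str(sm)
```

The result is a list with components x and y, so lines(sm) would add the curve to an existing base graphics scatterplot.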

The ggplot2 version uses stat_binhex.

library(ggplot2)

## The panel layout is set in facet_wrap; stat_binhex has no ncol argument
ggplot(data=aranjuezRshp, aes(Temperature, Radiation)) +
    stat_binhex() +
    stat_smooth(se=FALSE, method='loess', col='red') +
    facet_wrap(~Statistic, ncol=1) +
    theme_bw()

4.2 Scatterplot with Time as a Conditioning Variable

After discussing the hexagonal binning, let’s recover the time variable. Figure 4.1 uses colors to encode months. Instead, we will now display separate scatterplots with a panel for each month. In addition, the statistic type (average, maximum, minimum) is included as an additional conditioning variable.
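A quick way to anticipate the layout is to count the combinations of the two conditioning variables. The sketch below uses a hypothetical two-year sequence of daily dates and assumed names for the three statistics; neither is taken from the Aranjuez dataset.

```r
## Daily dates spanning two hypothetical years
dates <- seq(as.Date('2004-01-01'), as.Date('2005-12-31'), by = 'day')
month <- format(dates, '%m')

## Twelve levels ('01' to '12'), whatever the number of years
levels(factor(month))

## Crossing month with the three statistics defines the grid of panels
statistic <- c('TempAvg', 'TempMax', 'TempMin')
panels <- expand.grid(month = unique(month), Statistic = statistic)
nrow(panels)
```

Since the month is extracted without the year, observations of the same month from different years share a panel.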

This matrix of panels can be displayed with ggplot using facet_grid. The code of Figure 4.4 uses partial transparency to cope with overplotting, small horizontal and vertical segments (geom_rug) to display the point density on both variables, and a smooth line in each panel.

Figure 4.4: Scatterplot of temperature versus solar radiation for each month using data of the Aranjuez station (ggplot2 version).

ggplot(data=aranjuezRshp, aes(Radiation, Temperature)) +
    facet_grid(Statistic ~ month) +
    geom_point(col='skyblue4', pch=19, cex=0.5, alpha=0.3) +
    geom_rug() +
    stat_smooth(se=FALSE, method='loess', col='indianred1', lwd=1.2) +
    theme_bw()

The version with lattice needs the useOuterStrips function from the latticeExtra package, which prints the names of the conditioning variables on the top and left outer margins (Figure 4.5).

Figure 4.5: Scatterplot of temperature versus solar radiation for each month using data of the Aranjuez station (lattice version).

useOuterStrips(xyplot(Temperature ~ Radiation | month * Statistic,
                      data=aranjuezRshp,
                      between=list(x=0),
                      col='skyblue4', pch=19,
                      cex=0.5, alpha=0.3)) +
    layer({
        panel.rug(..., col.line='indianred1', end=0.05, alpha=0.6)
        panel.loess(..., col='indianred1', lwd=1.5, alpha=1)
    })

These figures show the typical seasonal behavior of solar radiation and ambient temperature. Additionally, they display in more detail the same relations between radiation and temperature already discussed with Figure 4.3.

¹ ggplot2 users may wish to explore the ggpairs function from the GGally package.
