Chapter 5

Time as a Complementary Variable

Gapminder¹ is an independent foundation based in Stockholm, Sweden. Its mission is “to debunk devastating myths about the world by offering free access to a fact-based world view.” They provide free online tools, data, and videos “to better understand the changing world.” The initial development of Gapminder was the Trendalyzer software, used by Hans Rosling in several sequences of his documentary “The Joy of Stats.”

The information visualization technique used by Trendalyzer is an interactive bubble chart. By default it shows five variables: two numeric variables on the vertical and horizontal axes, bubble size and color, and a time variable that may be manipulated with a slider. The software uses brushing and linking techniques for displaying the numeric value of a highlighted country.

This software was acquired by Google® in 2007, and is now available as a Motion Chart gadget and as the Public Data Explorer.

In this chapter, time will be used as a complementary variable which adds information to a graph where several variables are confronted. We will illustrate this approach with the evolution of the relationship between Gross National Income (GNI) and carbon dioxide (CO2) emissions for a set of countries extracted from the database of the World Bank Open Data. We will try several solutions to display the relationship between CO2 emissions and GNI over the years using time as a complementary variable. The final method will produce an animated plot resembling the Trendalyzer solution.

5.1 Polylines

The first solution is a Motion Chart produced with the googleVis package (Gesmann and Castillo 2011), an interface between R and the Google Visualisation API. Its gvisMotionChart function makes it easy to produce a Motion Chart that can be displayed in a browser with Flash enabled (Figure 5.1).

Figure 5.1: Snapshot of a Motion Chart produced with googleVis.

load('data/CO2.RData')
library(googleVis)
pgvis <- gvisMotionChart(CO2data, idvar='Country.Name', timevar='Year')

Although gvisMotionChart is quite easy to use, the global appearance and behavior are completely determined by the Google API². Moreover, you should carefully read its Terms of Use before using it for public distribution.

Our next attempt is to display the entire dataset in a single panel with a scatterplot using country names as the grouping factor. The points of each country are connected with polylines to reveal the time evolution (Figure 5.2).

Figure 5.2: GNI per capita versus CO2 emissions per capita (lattice version).

## lattice version
library(lattice)
xyplot(GNI.capita ~ CO2.capita, data=CO2data,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      groups=Country.Name, type='b')
## ggplot2 version
library(ggplot2)
ggplot(data=CO2data, aes(x=CO2.capita, y=GNI.capita,
         color=Country.Name)) +
   xlab("Carbon dioxide emissions (metric tons per capita)") +
   ylab("GNI per capita, PPP (current international $)") +
   geom_point() + geom_path() + theme_bw()

Three improvements can be added to this graphical result:

  1. Define a better palette to enhance visual discrimination between countries.
  2. Display time information with labels to show year values.
  3. Label each polyline with the country name instead of a legend.

5.2 Choosing Colors

The Country.Name categorical variable will be encoded with a qualitative palette, namely the first five colors of the Set1 palette³ from the RColorBrewer package (Neuwirth 2011). Because there are more countries than colors, we have to repeat some colors to cover all the levels of the variable Country.Name. The result is a palette with non-unique colors, and thus some countries will share the same color. This is not a problem because the curves will be labeled, and countries with the same color will be displayed far enough apart.

library(RColorBrewer)

nCountries <- nlevels(CO2data$Country.Name)
pal <- brewer.pal(n=5, 'Set1')
pal <- rep(pal, length.out = nCountries)

Adjacent colors of this palette are chosen to be easily distinguishable. Therefore, colors should be assigned to countries in such a way that nearby lines are encoded with adjacent colors of the palette.

A simple approach is to calculate, for each country, the average over the years of the variable represented along the x-axis (CO2.capita), and to extract colors from the palette according to the rank of this value.

## Rank of average values of CO2 per capita
CO2mean <- aggregate(CO2.capita ~ Country.Name, data=CO2data, FUN=mean)
palOrdered <- pal[rank(CO2mean$CO2.capita)]
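The ranking step can be checked by hand on a small example. The following sketch uses a hypothetical toy data.frame (three made-up countries, two years, and a two-color palette); none of these values come from the World Bank data.

```r
## Toy data: three hypothetical countries observed over two years
toy <- data.frame(
    Country.Name = factor(rep(c('A', 'B', 'C'), each = 2)),
    Year = rep(c(2000, 2001), times = 3),
    CO2.capita = c(9, 11, 1, 3, 5, 7))

## A two-color palette recycled to cover the three countries
toyPal <- rep(c('red', 'blue'), length.out = nlevels(toy$Country.Name))

## Mean per country, one row per level of Country.Name
toyMean <- aggregate(CO2.capita ~ Country.Name, data = toy, FUN = mean)

## rank() gives the position of each country when sorted by its mean
toyRank <- rank(toyMean$CO2.capita)
toyOrdered <- toyPal[toyRank]
print(data.frame(toyMean, rank = toyRank, color = toyOrdered))
```

Note that rank returns fractional positions when two means are tied; in that situation ties.method='first' would keep the positions as integers.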

A more sophisticated solution is to use the ordered results of a hierarchical clustering of the time evolution of the CO2 per capita values (Figure 5.3). The data is extracted from the original CO2data data.frame and reshaped to wide format, with one row per country and one column per year.

Figure 5.3: Hierarchical clustering of the time evolution of CO2 per capita values.

CO2capita <- CO2data[, c('Country.Name', 'Year', 'CO2.capita')]
CO2capita <- reshape(CO2capita, idvar='Country.Name', timevar='Year',
                     direction='wide')
hCO2 <- hclust(dist(CO2capita[, -1]))

oldpar <- par(mar=c(0, 2, 0, 0) + .1)
plot(hCO2, labels=CO2capita$Country.Name,
    xlab='', ylab='', sub='', main='')
par(oldpar)
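To see what the reshape and clustering steps produce, here is a minimal sketch with a hypothetical long-format data.frame of three made-up countries. The names toyLong, toyWide, and toyClust are illustrative only.

```r
## Toy long-format data: one row per country and year
toyLong <- data.frame(
    Country.Name = rep(c('A', 'B', 'C'), each = 2),
    Year = rep(c(2000, 2001), times = 3),
    CO2.capita = c(1, 2, 10, 11, 1.5, 2.5))

## Wide format: one row per country, one CO2.capita column per year
toyWide <- reshape(toyLong, idvar = 'Country.Name', timevar = 'Year',
                   direction = 'wide')
print(toyWide)

## Distances between countries (rows), dropping the name column
toyClust <- hclust(dist(toyWide[, -1]))

## Country names in the left-to-right order of the dendrogram
print(toyWide$Country.Name[toyClust$order])
```

In this toy example countries A and C have similar trajectories, so they are merged first and appear next to each other in the dendrogram order.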

The colors of the palette are assigned to each country with match. Given the country names in alphabetical order (the levels of Country.Name), it returns the position of each name within the vector of country names ordered according to the hierarchical clustering.

idx <- match(levels(CO2data$Country.Name),
          CO2capita$Country.Name[hCO2$order])
palOrdered <- pal[idx]
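The behavior of match in this step is easier to follow with a small example. The sketch below uses four hypothetical country names and an invented dendrogram order, not the result of the clustering above.

```r
## Alphabetical levels and a hypothetical dendrogram order
toyLevels <- c('A', 'B', 'C', 'D')
toyClustOrder <- c('C', 'A', 'D', 'B')
toyPal <- c('red', 'blue', 'green', 'purple')

## Position of each level within the dendrogram order
toyIdx <- match(toyLevels, toyClustOrder)

## Colors indexed by level: neighbors in the dendrogram
## receive adjacent colors of the palette
toyOrdered <- toyPal[toyIdx]
names(toyOrdered) <- toyLevels
print(toyOrdered)
```

Here C is the first leaf of the dendrogram and receives the first color, while A, its neighbor, receives the second one.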

It must be highlighted that this palette links colors with the levels of Country.Name (country names in alphabetical order), which is exactly what the groups argument provides. The following code produces a curve for each country using different colors to distinguish them.

## simpleTheme encapsulates the palette in a new theme for xyplot
myTheme <- simpleTheme(pch=19, cex=0.6, col=palOrdered)

pCO2.capita <- xyplot(GNI.capita ~ CO2.capita,
                  xlab="Carbon dioxide emissions (metric tons per capita)",
                  ylab="GNI per capita, PPP (current international $)",
                  groups=Country.Name, data=CO2data,
                  par.settings=myTheme,
                  type='b')
## guide='none' suppresses the legend
gCO2.capita <- ggplot(data=CO2data, aes(x=CO2.capita, y=GNI.capita,
                  color=Country.Name)) +
   geom_point() + geom_path() +
   scale_color_manual(values=palOrdered, guide='none') +
   xlab('CO2 emissions (metric tons per capita)') +
   ylab('GNI per capita, PPP (current international $)') +
   theme_bw()

5.3 Labels to Show Time Information

This result can be improved with labels displaying the years to show the time evolution. A panel function with panel.text to print the year labels and panel.superpose to display the lines for each group is a solution. In the panel function, subscripts is a vector with the integer indices representing the rows of the data.frame to be displayed in the panel.

xyplot(GNI.capita ~ CO2.capita,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      groups=Country.Name, data=CO2data,
      par.settings=myTheme,
      type='b',
      panel=function(x, y, ..., subscripts, groups){
        panel.text(x, y, ...,
                 labels=CO2data$Year[subscripts],
                 pos=2, cex=0.5, col='gray')
        panel.superpose(x, y, subscripts, groups,...)
      }
      )

The same result is obtained with clearer code by combining +.trellis, glayer_, and panel.text. Using glayer_ instead of glayer ensures that the labels are printed below the lines.

## +.trellis, glayer and glayer_ are provided by latticeExtra
library(latticeExtra)
pCO2.capita <- pCO2.capita +
   glayer_(panel.text(..., labels=CO2data$Year[subscripts],
                  pos=2, cex=0.5, col='gray'))
gCO2.capita <- gCO2.capita + geom_text(aes(label=Year),
                               colour='gray',
                               size=2.5,
                               hjust=0, vjust=0)

5.4 Country Names: Positioning Labels

The common solution to link each curve with the group value is to add a legend. However, a legend can be confusing with too many items. In addition, the reader must carry out a complex task: Choose the line, memorize its color, search for it in the legend, and read the country name.

A better approach is to label each line using nearby text with the same color encoding. A suitable method is to place the labels close to the end of each line (Figure 5.4). Labels are placed with the panel.pointLabel function from the maptools package. This function uses optimization routines to find locations without overlaps.

Figure 5.4: CO2 emissions versus GNI per capita. Labels are placed with panel.pointLabel.

library(maptools)
## group.value provides the country name; group.number is the
## index of each country to choose the color from the palette.
pCO2.capita +
   glayer(panel.pointLabel(mean(x), mean(y),
                      labels=group.value,
                      col=palOrdered[group.number],
                      cex=.8,
                      fontface=2, fontfamily='Palatino'))

However, this solution does not prevent labels from overlapping lines. The package directlabels (Hocking 2013) includes a wide repertoire of positioning methods to cope with this problem. The main function, direct.label, is able to determine a suitable method for each plot, although the user can choose a different method from the collection or even define a custom method. For the pCO2.capita object, I have obtained the best results with extreme.grid (Figure 5.5).

Figure 5.5: CO2 emissions versus GNI per capita. Labels are placed with the extreme.grid method of the directlabels package.

library(directlabels)
direct.label(pCO2.capita, method='extreme.grid')
direct.label(gCO2.capita, method='extreme.grid')

5.5 A Panel for Each Year

Time can be used as a conditioning variable (as shown in previous sections) to display subsets of the data in different panels. Figure 5.6 is produced with the same code as in Figure 5.2, now including |factor(Year) in the lattice version and facet_wrap(~ Year) in the ggplot2 version.

Figure 5.6: CO2 emissions versus GNI per capita with a panel for each year.

xyplot(GNI.capita ~ CO2.capita | factor(Year), data=CO2data,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      groups=Country.Name, type='b',
      auto.key=list(space='right'))
ggplot(data=CO2data, aes(x=CO2.capita, y=GNI.capita,
         colour=Country.Name)) +
   facet_wrap(~ Year) + geom_point(pch=19) +
   xlab('CO2 emissions (metric tons per capita)') +
   ylab('GNI per capita, PPP (current international $)') +
   theme_bw()

Because the grouping variable, Country.Name, has many levels, the legend is not very useful. Once again, point labeling is recommended (Figure 5.7).

Figure 5.7: CO2 emissions versus GNI per capita with a panel for each year.

xyplot(GNI.capita ~ CO2.capita | factor(Year), data=CO2data,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      groups=Country.Name, type='b',
      par.settings=myTheme) +
   glayer(panel.pointLabel(x, y, labels=group.value,
                      col=palOrdered[group.number], cex=0.7))

5.5.1 Using Variable Size to Encode an Additional Variable

Instead of using simple points, we can display circles of different radii to encode a new variable. This new variable is CO2.PPP, the ratio of CO2 emissions to the Gross Domestic Product with purchasing power parity (PPP) estimations.

To use this numeric variable as an additional grouping factor, its range must be divided into different classes. The typical solution is to use cut to coerce the numeric variable into a factor whose levels correspond to uniform intervals, which could be unrelated to the data distribution. The classInt package (R. Bivand 2013) provides several methods to partition data into classes based on natural groups in the data distribution.

library(classInt)
z <- CO2data$CO2.PPP
intervals <- classIntervals(z, n=4, style='fisher')
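The drawback of uniform intervals can be illustrated with base R alone. The following sketch uses a hypothetical skewed vector (toyZ) and contrasts cut with equal-width breaks against quantile-based breaks. The quantile breaks are only a simple stand-in for a data-driven partition; they are not the Fisher method used by classIntervals above.

```r
## Hypothetical skewed values: five small values and one outlier
toyZ <- c(0.1, 0.2, 0.25, 0.3, 0.35, 4)

## Four uniform (equal-width) intervals: most classes end up empty
uniform <- cut(toyZ, breaks = 4)
uniformCounts <- as.vector(table(uniform))
print(table(uniform))

## Breaks derived from the data distribution (quartiles)
## as a simple data-driven alternative
quartiles <- quantile(toyZ, probs = seq(0, 1, by = 0.25))
dataDriven <- cut(toyZ, breaks = quartiles, include.lowest = TRUE)
dataCounts <- as.vector(table(dataDriven))
print(table(dataDriven))
```

With uniform intervals the single large value stretches the range, so nearly all observations fall into the first class; breaks derived from the data keep every class populated.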

Although the functions of this package are mainly intended to create color palettes for maps, the results can also be associated with point sizes. cex.key defines the sequence of sizes to be displayed in the legend, one per class. The findCols function returns the class to which each value of CO2.PPP belongs, and this index selects the corresponding size.

nInt <- length(intervals$brks) - 1
cex.key <- seq(0.5, 1.8, length=nInt)

idx <- findCols(intervals)
CO2data$cexPoints <- cex.key[idx]
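The mapping from class index to point size can be reproduced without classInt. This sketch uses the base function findInterval as a stand-in for findCols, with hypothetical breaks and values; it is meant only to show the indexing logic, not the exact behavior of findCols at the class boundaries.

```r
## Hypothetical class breaks and values to classify
toyBreaks <- c(0, 1, 2, 5)
toyValues <- c(0.5, 1.5, 4, 0.2)

## One size per class, from smallest to largest
nToy <- length(toyBreaks) - 1
toySizes <- seq(0.5, 1.8, length.out = nToy)

## Index of the class of each value (stand-in for findCols)
toyClass <- findInterval(toyValues, toyBreaks, all.inside = TRUE)

## Size of each point, selected by its class index
toyCex <- toySizes[toyClass]
print(data.frame(value = toyValues, class = toyClass, cex = toyCex))
```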

The graphic will display information on two variables (GNI.capita and CO2.capita in the vertical and horizontal axes, respectively) with a conditioning variable (Year) and two grouping variables (Country.Name, and CO2.PPP through cexPoints) (Figure 5.8).

Figure 5.8: CO2 emissions versus GNI per capita for different intervals of the ratio of CO2 emissions to the GDP PPP estimations.

ggplot(data=CO2data, aes(x=CO2.capita, y=GNI.capita,
         colour=Country.Name)) +
   facet_wrap(~ Year) + geom_point(aes(size=cexPoints), pch=19) +
   xlab('Carbon dioxide emissions (metric tons per capita)') +
   ylab('GNI per capita, PPP (current international $)') +
   theme_bw()

The auto.key mechanism of the lattice version is not able to cope with two grouping variables. Therefore, the legend, whose main components are the labels (intervals) and the point sizes (cex.key), must be defined manually (Figure 5.9).

Figure 5.9: CO2 emissions versus GNI per capita for different intervals of the ratio of CO2 emissions to the GDP PPP estimations.

op <- options(digits=2)
tab <- print(intervals)
options(op)

key <- list(space='right',
          title=expression(CO[2]/GNI.PPP),
          cex.title=1,
          ## Labels of the key are the intervals strings
          text=list(labels=names(tab), cex=0.85),
          ## Points sizes are defined with cex.key
          points=list(col='black', pch=19,
            cex=cex.key, alpha=0.7))
xyplot(GNI.capita ~ CO2.capita|factor(Year), data=CO2data,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      groups=Country.Name, key=key, alpha=0.7,
      col=palOrdered, cex=CO2data$cexPoints) +
   glayer(panel.pointLabel(x, y, labels=group.value,
                      col=palOrdered[group.number], cex=0.7))

5.6 Traveling Bubbles

The final solution to display this multivariate time series is with animation via the function grid.animate of the gridSVG package. We will mimic the Trendalyzer/Motion Chart solution, using traveling bubbles of different colors and with radius proportional to CO2.PPP.

The first step is to draw the initial state of the bubbles. Their colors are again defined by the palOrdered palette, although the adjustcolor function is used for a lighter fill color. Because there will not be a legend, there is no need to define class intervals, and thus the radius is directly proportional to the value of CO2data$CO2.PPP.

library(gridSVG)

xyplot(GNI.capita ~ CO2.capita, data=CO2data,
      xlab="Carbon dioxide emissions (metric tons per capita)",
      ylab="GNI per capita, PPP (current international $)",
      subset=Year==2000, groups=Country.Name,
      ## The limits of the graphic are defined
      ## with the entire dataset
      xlim=extendrange(CO2data$CO2.capita),
      ylim=extendrange(CO2data$GNI.capita),
      panel=function(x, y, ..., subscripts, groups) {
        color <- palOrdered[groups[subscripts]]
        radius <- CO2data$CO2.PPP[subscripts]
        ## Size of labels
        cex <- 1.1*sqrt(radius)
        ## Bubbles
        grid.circle(x, y, default.units="native",
                  r=radius*unit(.25, "inch"),
                  name=trellis.grobname("points", type="panel"),
                  gp=gpar(col=color,
                   ## Fill color lighter than border
                   fill=adjustcolor(color, alpha=.5),
                   lwd=2))
        ## Country labels
        grid.text(label=groups[subscripts],
                x=unit(x, 'native'),
                ## Labels above each bubble
                y=unit(y, 'native') + 1.5 * radius * unit(.25, 'inch'),
                name=trellis.grobname('labels', type='panel'),
                gp=gpar(col=color, cex=cex))
      })

From this initial state, grid.animate creates a collection of animated graphical objects with the result of animUnit. This function produces a set of values that will be interpreted by grid.animate as intermediate states of a feature of the graphical object. Thus, the bubbles will travel across the values defined by x_points and y_points, while their labels will use x_points and y_labels.

The use of rep=TRUE ensures that the animation will be repeated indefinitely.

## Duration in seconds of the animation
duration <- 20

nCountries <- nlevels(CO2data$Country.Name)
years <- unique(CO2data$Year)
nYears <- length(years)

## Intermediate positions of the bubbles
x_points <- animUnit(unit(CO2data$CO2.capita, 'native'),
                 id=rep(seq_len(nCountries), each=nYears))
y_points <- animUnit(unit(CO2data$GNI.capita, 'native'),
                 id=rep(seq_len(nCountries), each=nYears))
## Intermediate positions of the labels
y_labels <- animUnit(unit(CO2data$GNI.capita, 'native') +
                 1.5 * CO2data$CO2.PPP * unit(.25, 'inch'),
                 id=rep(seq_len(nCountries), each=nYears))
## Intermediate sizes of the bubbles
size <- animUnit(CO2data$CO2.PPP * unit(.25, 'inch'),
                 id=rep(seq_len(nCountries), each=nYears))

grid.animate(trellis.grobname("points", type="panel", row=1, col=1),
          duration=duration,
          x=x_points,
          y=y_points,
          r=size,
          rep=TRUE)

grid.animate(trellis.grobname("labels", type="panel", row=1, col=1),
          duration=duration,
          x=x_points,
          y=y_labels,
          rep=TRUE)
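The id vector built with rep(seq_len(nCountries), each=nYears) implicitly assumes that the rows of the data are sorted by country and then by year, with the same number of years for every country. A minimal sketch of this assumption, using a hypothetical two-country data.frame (toyData) and base R only:

```r
## Toy balanced panel: two hypothetical countries, three years each
toyData <- data.frame(
    Country.Name = factor(rep(c('A', 'B'), each = 3)),
    Year = rep(2000:2002, times = 2))

nToyCountries <- nlevels(toyData$Country.Name)
nToyYears <- length(unique(toyData$Year))

## One block of consecutive identical ids per country
toyId <- rep(seq_len(nToyCountries), each = nToyYears)

## The ids agree with the data only if rows are sorted
## by country and then by year
toySorted <- toyData[order(toyData$Country.Name, toyData$Year), ]
idMatches <- identical(toyId, as.integer(toySorted$Country.Name))
print(idMatches)
```

If the data were unbalanced or sorted by year first, the ids would no longer identify the countries, and a check of this kind would return FALSE.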

A bit of interactivity can be added with the grid.hyperlink function. For example, the following code adds the corresponding Wikipedia link to a mouse click on each bubble.

countries <- unique(CO2data$Country.Name)
URL <- paste('http://en.wikipedia.org/wiki/', countries, sep='')
grid.hyperlink(trellis.grobname('points', type='panel', row=1, col=1),
            URL, group=FALSE)
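Country names containing spaces produce URLs with spaces. As an optional refinement, and only as a sketch with hypothetical country names, the spaces can be replaced by underscores before pasting, assuming the usual underscore convention of Wikipedia page addresses:

```r
## Hypothetical country names, one of them with a space
toyCountries <- c('Spain', 'United States')

## Replace spaces with underscores before building the address
toyURL <- paste0('http://en.wikipedia.org/wiki/',
                 gsub(' ', '_', toyCountries, fixed = TRUE))
print(toyURL)
```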

Finally, the time information: The year is printed in the lower right corner, using the visibility attribute of an animated textGrob object to show and hide the values.

visibility <- matrix("hidden", nrow=nYears, ncol=nYears)
diag(visibility) <- "visible"
yearText <- animateGrob(garnishGrob(textGrob(years, .9, .15,
                                    name="year",
                                    gp=gpar(cex=2, col="grey")),
                             visibility="hidden"),
                   duration=20,
                   visibility=visibility,
                   rep=TRUE)
grid.draw(yearText)
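The structure of the visibility matrix is easier to see with a few years. In this sketch with three hypothetical years, each row is read as one step of the animation and each column as one year label; this reading of rows and columns is an assumption based on how the matrix is built above.

```r
## Three hypothetical years
toyYears <- 2000:2002
nToy <- length(toyYears)

## Everything hidden, except the diagonal
toyVisibility <- matrix('hidden', nrow = nToy, ncol = nToy)
diag(toyVisibility) <- 'visible'

## Row and column names added only for display
dimnames(toyVisibility) <- list(paste0('step', seq_len(nToy)),
                                as.character(toyYears))
print(toyVisibility)
```

At every step exactly one label is visible, so the years appear one after another in the same position.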

The SVG file produced with grid.export is available at the website of the book (Figure 5.10). Because this animation does not trace the paths, Figure 5.5 provides this information as a static complement.

Figure 5.10: Animated bubbles produced with gridSVG.

grid.export("figs/bubbles.svg")

Now, sit down in your favorite easy chair and watch the masterful video “200 Countries, 200 Years, 4 Minutes”⁴. After that, you are ready to open the SVG file of traveling bubbles: it is easier to follow, covering a short time period with fewer than twenty countries.

1 http://www.gapminder.org/

2 You should read the Google API Terms of Service before using googleVis: https://developers.google.com/terms/.

3 http://colorbrewer2.org/

4 http://www.gapminder.org/videos/200-years-that-changed-the-world-bbc/
