Chapter 10

Comparisons, Comparisons, Comparisons

Shall I compare thee to a summer's day?

William Shakespeare (Sonnet 18)

Summary

Chapter 10 covers making comparisons using graphics, complementing more formal statistical methods of comparison.

10.1 Introduction

Many scientific studies are carried out in a two-group design using a treated group and a control group. The groups are then commonly compared on the basis of the difference between their two means, using a two-sample t-test, assuming that the samples are drawn from normal distributions with equal variances. There may be other interesting features in the data, which cannot be formally compared because the sample sizes are too small. This should not stop us from looking at the data, and perhaps plotting the data should be compulsory if a t-test is carried out, just to remind people of the assumptions they are implicitly making and to make them think about what features they might be missing.

Many different features might be seen by looking at the data underlying t-tests. There may be evidence of differing variability; there may be outliers, either in relation to the whole dataset or in relation to one of the groups; there may be clustering, skewness, or favoured values. A t-test tells you nothing about any of these features, and when the sample sizes are small (the situation Gosset originally developed the test for) there is too little data to be sure whether such features exist. As sample sizes get larger, it makes sense to look at more than just the difference between the means. Graphics and statistics are complementary approaches.
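The point can be tried out on a small example. The sketch below uses R's built-in PlantGrowth data as a stand-in (it is not one of this chapter's datasets), keeping only the control group and the first treatment group. It plots every observation by group before running the pooled and Welch two-sample t-tests, and prints the group standard deviations, so that the test result can be read against what the data actually look like.

```r
# PlantGrowth: dried plant weights for a control group and two treatments.
# Keep only the control ("ctrl") and the first treatment ("trt1").
pg <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

# Look at every observation first: spread, outliers, clusters, ties.
stripchart(weight ~ group, data = pg, method = "stack", pch = 19,
           xlab = "dried weight", ylab = "group")

# Classical two-sample t-test (equal variances assumed) and Welch's test.
t_pooled <- t.test(weight ~ group, data = pg, var.equal = TRUE)
t_welch  <- t.test(weight ~ group, data = pg)

# Group standard deviations, to judge the equal-variance assumption by eye.
sds <- tapply(pg$weight, pg$group, sd)
print(c(pooled = t_pooled$p.value, welch = t_welch$p.value))
print(sds)
```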

The dataset bank in the package gclus concerning forged and genuine Swiss banknotes was introduced in §5.6. Carrying out t-tests of the differences in means for the various banknote measurements comparing the genuine and forged notes gives a range of significances. Tests on the variables Diagonal and Right both give p-values of < 2.2 × 10⁻¹⁶. (This p-value limit may vary from machine to machine.)

The left-hand side of Figure 10.1 shows histograms of the Diagonal values for the two groups of notes and the right-hand side shows histograms for the variable Right. The groups barely overlap for the Diagonal measurements, although there is one outlier amongst the genuine notes. On the other hand, the distributions of the Right measurements overlap a lot, and it is mildly surprising that the means are so significantly different; with 100 notes in each group, even a modest shift in location gives a very small p-value.

library(ggplot2)
library(gridExtra)   # for grid.arrange()
data(bank, package="gclus")
bank <- within(bank,
   st <- ifelse(Status==0,"genuine","forgery"))
c1 <- ggplot(bank,aes(x=Diagonal)) +
    geom_histogram(binwidth=0.2) + facet_grid(st~.)
c2 <- ggplot(bank, aes(x=Right)) +
    geom_histogram(binwidth=0.1) + facet_grid(st~.) 
grid.arrange(c1, c2, ncol=2)

Figure 10.1


Histograms of the Diagonal measurements on the left and the Right measurements on the right for the forgery and genuine groups in the Swiss banknote dataset. t-tests confirm that the differences in means between the two groups are highly significant. The graphics show that the distribution patterns differ.

10.2 Making comparisons

“At the heart of quantitative reasoning is a single question: Compared to what?” [Tufte, 1990]. Numbers are of little use on their own; they have to be compared. In practice you are always making comparisons when you look at graphics, sometimes with explicit formal hypotheses, sometimes with implicit expectations.

If you are surprised by what you see, it is doubtless because you had anticipated seeing a different picture. For instance, a scatterplot with a completely random pattern can be very informative, if you believed that the data would be strongly correlated. An absence of information can also be information. In Conan Doyle's story “Silver Blaze”, it was the fact that the dog did not bark in the night that Sherlock Holmes found curious. It is a good idea to think about what a graphic might look like before you draw it. We are all experts at explaining what we see—once we have the display in front of us.

Sometimes you compare new data with accepted standards, sometimes with old data, but mostly you compare data from different groups within the same dataset. Which comparisons you make, and why, can be crucial to the success of an analysis, and there are often very many comparisons that might be considered. In a study of fuel use of cars you could compare this year with last year or earlier years, you could compare use in town traffic with motorway use, and you could compare cars of different sizes and from different manufacturers.

There is also the issue of how you make the comparisons. Do you compare values or, if suitably paired data are available, study differences? And if you study differences, do you consider absolute differences or relative differences?
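For paired data the two approaches can give very different answers. As an illustration (our choice of data, not the book's), R's built-in sleep dataset records the extra hours of sleep for the same ten patients under two drugs. The sketch aligns the two drugs patient by patient, computes the per-patient absolute differences, and compares an unpaired t-test with a paired one.

```r
# sleep: extra hours of sleep for 10 patients (ID) under two drugs (group).
data(sleep)
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
g1 <- g1[order(g1$ID), ]   # align the two drugs patient by patient
g2 <- g2[order(g2$ID), ]

d <- g2$extra - g1$extra   # absolute difference for each patient
print(summary(d))

# Ignoring the pairing: the two sets of values treated as independent samples.
unpaired <- t.test(g2$extra, g1$extra)
# Using the pairing: equivalent to a one-sample t-test on d.
paired <- t.test(g2$extra, g1$extra, paired = TRUE)
print(c(unpaired = unpaired$p.value, paired = paired$p.value))
```

Relative differences would not make sense here, since some of the values for the first drug are zero or negative.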

Groups may also be compared with descriptive statistics like medians, maxima, other quantiles, ranked values, or measures of variability. Medians might be used in comparing incomes, minima in comparing lap times, quantiles or ranked values in comparing top groups, and so on. Often the aim is to detect differences between groups, although sometimes it is to see if groups can be taken to be sufficiently similar to one another. For instance, an analysis of variance requires that the variability in each group should be approximately the same.
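A sketch of such descriptive comparisons, using R's built-in PlantGrowth data as an assumed example with three groups. It tabulates medians, interquartile ranges and standard deviations by group, and then checks the equal-variance assumption with Bartlett's test and the rank-based Fligner-Killeen test, both from base R's stats package.

```r
data(PlantGrowth)   # dried plant weight for a control and two treatment groups

# Medians, interquartile ranges and standard deviations by group.
summ <- aggregate(weight ~ group, data = PlantGrowth,
                  FUN = function(w) c(median = median(w), IQR = IQR(w), sd = sd(w)))
print(summ)

# Homogeneity of variance, one of the assumptions of a one-way ANOVA.
bt <- bartlett.test(weight ~ group, data = PlantGrowth)  # sensitive to non-normality
ft <- fligner.test(weight ~ group, data = PlantGrowth)   # rank-based alternative
print(c(bartlett = bt$p.value, fligner = ft$p.value))
```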

Of course, graphics are only part of the story; statistical comparisons have to be made. Usually means are compared, and the central limit theorem is often used, rightly or wrongly, as an all-embracing justification for that kind of testing. Alternatively, non-parametric tests may be used. Graphically there is more flexibility and it is possible to compare whole distributions, although there is less objectivity. One person's “it is obvious from the graphic that...” may be another person's “that could have arisen by chance and should be checked”.

Types of comparison

With categorical data such as the Titanic dataset, discussed in Chapter 7, the main statistics for comparisons are rates and percentages. You might want to compare:

  1. male and female survival rates;
  2. the survival rate of the passengers compared to that of the crew;
  3. survival rates between the three passenger classes;
  4. survival rates by sex within class or by class within sex;
  5. the survival rates with those of other sinkings (there has been research comparing the Titanic and Lusitania disasters [Frey et al., 2011]).

With all these possibilities it is important to choose appropriate comparisons. Given the survival rate of male adults in the third class, you could compare it with the rate for female adults in the third class or with the rate for male adults in the other classes. If there had been more male children on board you could further compare the male adult survival rate in the third class with the male child survival rate in the third class.

Comparisons may be

specific, such as the male adult survival rate in the third class compared to the female adult survival rate in the third class;

general, such as comparing the male adult survival rate in the third class to the overall survival rate (which includes that subgroup) or to the survival rate for all others;

at different levels, such as comparing males and females, comparing male adults and female adults, or comparing male adults and female adults in the third class.
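The rates behind these three kinds of comparison can be computed directly from R's built-in Titanic table, whose dimensions are Class, Sex, Age and Survived. This sketch is our own illustration and computes only raw rates; no standard errors or models are involved.

```r
# Titanic: 4-way table with dimensions Class, Sex, Age, Survived.
yes   <- Titanic[, , , "Yes"]                        # survivors in each cell
total <- Titanic[, , , "Yes"] + Titanic[, , , "No"]  # everyone in each cell

# Survival rate in every Class x Sex x Age cell (NaN for empty cells).
rates <- yes / total

# Specific: third-class adult males vs third-class adult females.
specific <- rates["3rd", c("Male", "Female"), "Adult"]

# General: third-class adult males vs the overall rate and vs all others.
grp_yes <- yes["3rd", "Male", "Adult"]
grp_n   <- total["3rd", "Male", "Adult"]
overall <- sum(yes) / sum(total)
others  <- (sum(yes) - grp_yes) / (sum(total) - grp_n)

# Different levels: sex only, sex among adults, sex among third-class adults.
by_sex       <- apply(yes, 2, sum) / apply(total, 2, sum)
by_sex_adult <- apply(yes[, , "Adult"], 2, sum) / apply(total[, , "Adult"], 2, sum)
by_sex_3rd   <- rates["3rd", , "Adult"]

print(round(c(specific, overall = overall, others = others), 3))
```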

Each comparison requires a different model with different standard errors and the interpretation of the results can be tricky given the large number of alternatives available for what you can compare.

In this chapter the emphasis is on using graphics to make comparisons. With formal statistical comparisons you have an explicit statement of what is to be compared. With graphical comparisons you decide what is important depending on what you see and on what your expectations were. In effect you may be unconsciously making many implicit comparisons, deciding what is worth looking at in more detail and what not. People may have different opinions of what they can see in a plot, although hopefully they will agree on the major features.

Comparing like with like

Comparisons should be fair, and that requires paying careful attention to several issues. Any differences identified should be due to the factor you want to investigate.

Comparable populations: Samples provide information on the populations they are drawn from. Mortality and morbidity statistics are often reported as ‘age-adjusted’ to try to make them comparable across populations with different age structures.

Comparable variables: In surveys, questions on the same topic may be framed in quite different ways. In business, markets may be defined differently by different companies, so that their sales figures are difficult to compare.

Comparable sources: Data from different sources may be collected in different ways according to different rules. Opinion polls from different firms can differ systematically. Drink-driving laws and how they are implemented may vary widely between countries.

Comparable groups: The gold standard in clinical research is the randomised controlled trial. Participants are allocated to treatments at random and, where possible, neither they nor the researchers know which treatment they receive (double-blind). This should go some way to ensuring that the groups are similar in all but the treatment they receive.

Comparable conditions: If the effects of a particular factor are to be compared, then we need to ensure that the effects of all other possible influences are as far as possible eliminated. In the traditional discussions of the Hawthorne experiment it is concluded that the improvements in performance were probably due to the participants' being observed, not a factor the experimenters had originally taken into account.

Comparable measurements: School marks in the United Kingdom are usually on a scale of 0 to 100. In Germany schoolchildren get grades from 1 (best) to 6 (worst), so the two scales cannot be compared directly.

Standardising data: Sometimes data are standardised to allow comparisons. In the decathlon event athletes' performances in each of ten disciplines are converted to point scores according to internationally agreed formulae, so that high jump performances can be compared with shot put performances, with sprint times and so on. Financial data collected over time are usually adjusted for inflation.
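The decathlon formulae take a simple power-law form. The sketch below assumes the constants as we understand them from the World Athletics scoring tables (100 m: A = 25.4347, B = 18, C = 1.81; high jump in cm: A = 0.8465, B = 75, C = 1.42; shot put in metres: A = 51.39, B = 1.5, C = 1.05); they should be checked against the official tables before being relied on. The function names are our own.

```r
# Decathlon-style scoring (constants assumed from the published tables):
#   track events: points = floor(A * (B - T)^C), T = time in seconds
#   field events: points = floor(A * (M - B)^C), M in metres (throws) or cm (jumps)
track_points <- function(t, A, B, C) floor(A * pmax(B - t, 0)^C)
field_points <- function(m, A, B, C) floor(A * pmax(m - B, 0)^C)

p100 <- track_points(10.395, A = 25.4347, B = 18,  C = 1.81)  # 100 m, seconds
phj  <- field_points(220,    A = 0.8465,  B = 75,  C = 1.42)  # high jump, cm
psp  <- field_points(16.79,  A = 51.39,   B = 1.5, C = 1.05)  # shot put, metres

# Three very different measurements end up on one common points scale.
print(c(`100m` = p100, high_jump = phj, shot_put = psp))
```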

10.3 Making visual comparisons

Comparing to a standard

Michelson's measurements of the speed of light (used in Exercise 4 in Chapter 1) are shown in Figure 10.2 with a red dashed line added, showing the current estimate of the speed of light adjusted for travelling through air. Although the range of values obtained by Michelson contains the ‘true’ value, a 95% confidence interval based on these data does not. Using

data(Michelson, package="HistData")
tc <- t.test(Michelson, mu=734.5)

gives a confidence interval of 836.72 to 868.08.
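Readers without the HistData package can reproduce the interval with base R. The datasets package includes morley which, as far as we can tell, contains the same 100 measurements (Speed, in km/s less 299,000). The sketch repeats the t-test against 734.5 and draws a base-graphics version of the histogram with the reference value marked.

```r
# morley: Michelson's 1879 speed-of-light data, 5 experiments x 20 runs,
# Speed in km/s minus 299,000 (the same scale as Figure 10.2).
data(morley)
tc_base <- t.test(morley$Speed, mu = 734.5)
print(tc_base$conf.int)   # 95% interval for the mean

hist(morley$Speed, breaks = seq(600, 1100, by = 25), main = "",
     xlab = "Speed of light in km/s (less 299,000)")
abline(v = 734.5, col = "red", lty = "longdash")   # reference value
```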

ggplot(Michelson, aes(x=velocity)) + geom_histogram(binwidth = 25) +
  geom_vline(xintercept = 734.5, colour="red",
  linetype = "longdash") +
  xlab("Speed of light in km/s (less 299,000)")

Figure 10.2


Michelson's data for the speed of light from 1879. Most of his observations were higher than the relevant currently accepted value, which would be 734.5 on this scale (marked in red).

Comparing new data with old data

There are at least two datasets with petrol consumption data for cars in R, mtcars with information on 32 models from 1973-4 and Cars93 from the MASS package with information on 93 car models on sale in the USA in 1993 (also used in §5.4). Figure 10.3 compares a histogram of the miles per gallon reported for the cars in the earlier dataset with a histogram for the miles per gallon in city driving reported for the cars in the later dataset. The horizontal scales have been chosen to be comparable and the histograms have been drawn one above the other rather than side by side for easier comparison of the distributions. Even though city driving is more demanding than driving overall, the cars in the later dataset perform a little better.

data(Cars93, package="MASS")
c1 <- ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth=1, fill="blue") + 
    xlim(10,50) + xlab("mpg for 32 cars from 1973-4")
c2 <- ggplot(Cars93, aes(MPG.city)) +
   geom_histogram(binwidth=1, fill="red") + xlim(10,50) +
   xlab("mpg in city driving for 93 cars from 1993") 
grid.arrange(c1, c2, nrow=2)

Figure 10.3


Comparing miles per gallon distributions for cars from 1973-4 and 1993. The later cars appear to have a slightly better performance and it should be taken into account that the mpg figures for them are only for city driving.

A comparison of the means with a t-test

tf <- t.test(Cars93$MPG.city, mtcars$mpg)

shows that the means are not significantly different at the 5% level, with a p-value of 0.067 and a 95% confidence interval for the difference of -0.16 to 4.71.
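A rank-based test is one of the non-parametric alternatives mentioned in §10.2. The sketch below runs the Welch t-test again alongside a Wilcoxon rank-sum test; exact = FALSE is set because the mpg values contain ties. MASS ships with R, so only base packages are needed. The two p-values are printed side by side rather than interpreted here.

```r
data(Cars93, package = "MASS")
city93 <- Cars93$MPG.city   # 1993 cars, city driving
mpg73  <- mtcars$mpg        # 1973-4 cars

tf <- t.test(city93, mpg73)                        # Welch two-sample t-test
wt <- wilcox.test(city93, mpg73, exact = FALSE)    # rank-based alternative
print(c(t_test = tf$p.value, wilcoxon = wt$p.value))
print(tf$conf.int)
```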

Comparing subgroups

Figure 10.1 showed comparisons of two of the variables from the Swiss Banknote dataset. Histograms are effective for comparing two groups and can reveal a lot of data details. They are less effective when there are several groups.

Boxplots work well for more groups. Figure 10.4 displays the palmitic variable for the nine areas in the olives dataset, which was also used in §9.3. Having the area labels directly beneath their corresponding boxplots helps a lot, as does colouring by region. For presentation purposes it could be worth ordering the areas by their medians or ordering them within the three regions by their medians.

data(olives, package="extracat")
ggplot(olives, aes(Area, palmitic, colour=Region)) + 
    geom_boxplot() + theme(legend.position="bottom")
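Ordering groups by their medians can be done with reorder(). The sketch uses R's built-in chickwts data (six feed types) as a stand-in for the olive areas, with base graphics; in ggplot2 the analogous step would presumably be an aesthetic such as x = reorder(Area, palmitic, FUN = median), which we have not tested on the olives data.

```r
data(chickwts)   # chick weights for six feed supplements

# Reorder the feed factor so its levels run from lowest to highest median.
chickwts$feed_ord <- reorder(chickwts$feed, chickwts$weight, FUN = median)
boxplot(weight ~ feed_ord, data = chickwts, xlab = "feed", ylab = "weight")

# Medians in the new level order.
meds <- tapply(chickwts$weight, chickwts$feed_ord, median)
print(meds)
```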

Figure 10.4


Boxplots of the palmitic data by area for the olives dataset. Differences in levels and variability as well as some outliers can be observed. There is much more variability in the Southern region than in the other two regions, both between and within areas.

Density estimates, or possibly distribution functions, can be used too, although only for smaller numbers of groups. Figure 10.5 shows boxplots, density estimates, and empirical distribution functions of the palmitic variable for the three regions in the olives dataset. Each display picks out the main feature (that the values for the Southern region are generally higher than those for the other two regions). The boxplots and density estimates are more informative than the empirical distribution functions.

Figure 10.5


Three alternative displays of the distributions of the palmitic variable for the three regions in the olives dataset. The boxplots on the left emphasise the outliers and show that the olive oils from the South have higher values than the oils from the other two regions, while the values for the Sardinian region are very close together. The density estimates in the upper right plot show much the same information, although with less emphasis on the outliers, and reveal additionally that the distribution for the Southern region is skewed towards lower values. The empirical distribution functions in the lower right plot do not show much more than that the Southern region has higher values.

If you amend the following code for Figure 10.5, using Area instead of Region, you can see how the comparisons of the nine areas would look using these plots. Although some features stand out, there are too many groups for an unambiguous allocation of colours, so it is not always easy to tell which feature belongs to which area.

o1 <- ggplot(olives, aes(Region, palmitic, colour = Region)) + 
    geom_boxplot() + theme(legend.position = "none")
o2 <- ggplot(olives, aes(x=palmitic, colour = Region)) + 
    geom_density() + ylab("density") + 
    theme(legend.position = "none")
o3 <- ggplot(olives, aes(x=palmitic, colour = Region)) + 
    stat_ecdf() + ylab("cdf") +
    theme(legend.position = "bottom") 
grid.arrange(o1, arrangeGrob(o2, o3, nrow=2),
    ncol=2, widths=c(1, 2))
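One formal counterpart of comparing empirical distribution functions is the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the two ECDFs. This sketch uses simulated data (hypothetical, with no ties) rather than the olives data, computes the gap directly from ecdf() objects, and checks it against ks.test().

```r
set.seed(42)
x <- rnorm(50, mean = 0)     # simulated group 1 (hypothetical data)
y <- rnorm(60, mean = 0.5)   # simulated group 2

Fx <- ecdf(x)
Fy <- ecdf(y)
pts <- sort(c(x, y))                 # the gap can only change at data points
D <- max(abs(Fx(pts) - Fy(pts)))     # largest vertical distance between ECDFs

ks <- ks.test(x, y)                  # two-sample Kolmogorov-Smirnov test
print(c(D = D, ks_D = unname(ks$statistic), p = ks$p.value))

plot(Fx, main = "", xlab = "value", ylab = "cdf", col = "blue")
plot(Fy, add = TRUE, col = "red")
```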

Comparing time series (Playfair's import/export data)

William Playfair drew many impressive displays, including several showing England's imports and exports with other countries. The data for England's trade with the East Indies between 1700 and 1780 have been estimated [Bissantz, 2009] from the graphic in the first edition of Playfair's “Commercial and Political Atlas”, published in 1786. The corresponding graphic in the third edition [Playfair, 2005] goes up to 1800.

There are several ways imports and exports could be compared over the years. Playfair plotted both series in the same display and coloured the area between them to give some idea of the balance of trade. The top plot of Figure 10.6 is a redrawing of his plot.

Cleveland suggested plotting the difference between imports and exports [Cleveland, 1994] and this is shown in the middle plot of Figure 10.6.

Figure 10.6


England's trade with the East Indies in the eighteenth century. The top plot is a redrawn version of Playfair's plot showing that imports were always higher than exports. The middle plot shows the balance of trade and highlights a dip in the 1760s, which is hard to see in the top plot. The lowest plot shows the relative balance of trade and suggests that the deficits towards the end of the period were relatively smaller than those in the twenty years around 1720.

You might also consider plotting the relative difference rather than the absolute difference and this has been done in the final plot of Figure 10.6, where the relative difference has been calculated by dividing the difference by the average of Imports and Exports. Depending on the goals of the analysis, either Imports or Exports alone could have been taken as the base.
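The two measures of balance can be written as small R functions. The function names and the numbers in the usage lines are hypothetical illustrations, not Playfair's figures; the base argument covers the alternatives just described (the average of the two series, or imports or exports alone).

```r
# Absolute balance of trade.
balance <- function(exports, imports) exports - imports

# Relative balance: the difference divided by a chosen base.
rel_balance <- function(exports, imports,
                        base = c("average", "imports", "exports")) {
  base <- match.arg(base)
  denom <- switch(base,
                  average = (exports + imports) / 2,
                  imports = imports,
                  exports = exports)
  (exports - imports) / denom
}

# Hypothetical figures: two years with the same absolute deficit.
ex <- c(100, 400)
im <- c(300, 600)
balance(ex, im)       # -200 -200
rel_balance(ex, im)   # -1.0 -0.4: the second deficit is relatively smaller
```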

Modern economic analysts would probably consider adjusting for inflation. There are estimates of eighteenth century inflation available by year (e.g., [James, 2014]), but the cumulative effects are on a much smaller scale than the differences observed here.

The historical context of these data should not be ignored. The period includes, amongst other events, the War of the Spanish Succession (1701-14), the South Sea Bubble (1720), the Irish Famine due to the “Great Frost” (1740-41), the Seven Years' War (1756-63), and the American Revolutionary War (1775-83).

data(EastIndiesTrade,package="GDAdata")
c1 <- ggplot(EastIndiesTrade, aes(x=Year, y=Exports)) + 
    ylim(0,2000) + geom_line(colour="red", linewidth=2) + 
    geom_line(aes(x=Year, y=Imports),
    colour="yellow", linewidth=2) + 
    geom_ribbon(aes(ymin=Exports, ymax=Imports),
    fill="pink",alpha=0.5) +
    ylab("Exports (red) and Imports (yellow)")
c2 <- ggplot(EastIndiesTrade, aes(x=Year,
    y=Exports-Imports)) + geom_line(colour="green")
c3 <- ggplot(EastIndiesTrade, aes(x=Year,
    y=(Exports-Imports)/((Exports + Imports)/2))) + 
    geom_line(colour="blue")
grid.arrange(c1, c2, c3, nrow=3)

10.4 Comparing group effects graphically

The famous barley dataset was considered briefly in §8.5. The aim of the study was to compare ten varieties of barley by looking at yields in two successive years at each of six testing station sites. The yields for the two years are a little different, with those for 1931 looking higher than those for 1932, as can be seen in Figure 10.7.

data(barley, package="lattice")
ggplot(barley, aes(yield)) + geom_histogram(binwidth=5) + 
  ylab("") + facet_wrap(~year, ncol=1)

Figure 10.7


Histograms of yields for the barley dataset in the lattice package for the years 1931 and 1932. The values for the earlier year look higher on average with the distribution appearing to be shifted to the right compared to the following year.

It is the differences in yield by variety that are mainly of interest, and drawing ten histograms would not be practical, even if there were enough data to justify it. Figure 10.8 shows parallel dotplots of the 12 values for the ten varieties in the upper plot and confidence intervals for the variety means in the lower plot. The plots suggest that there is little difference between the varieties. Note the use of the %>% operator from the magrittr package to build a sequence of operations in a readable order; the operator is also made available by loading the dplyr package.

c1 <- ggplot(barley, aes(x=variety, y=yield)) + 
    geom_point() + ylim(10,70)
barl1 <- barley %>% group_by(variety) %>% 
    summarise(N = n(), mean = mean(yield),
    sd = sd(yield), se = sd/sqrt(N))
lims <- aes(ymax = mean + 2*se, ymin=mean - 2*se)
p1 <- ggplot(barl1, aes(x=variety, y=mean)) + 
   geom_point() + ylim(10,70) + 
   geom_errorbar(lims, width=0.2) 
grid.arrange(c1, p1)

The displays in Figure 10.8 treat all values equivalently, ignoring both the sites and the years. Figure 10.9 shows confidence intervals for yields by year for each site. There are clear differences between the sites and between the years within the sites. The odd pattern for the Morris site that Cleveland remarked on, with the data possibly reversed for the two years, can also be seen.

Figure 10.8


Yields for the barley dataset by variety. The individual data points for each variety are shown in the upper plot and confidence intervals for the means in the lower one (the intervals are ±2SE). The vertical axis scales have been chosen to include all the data and to be the same for both plots. The varieties appear to produce similar yields.

Figure 10.9


Confidence intervals for mean barley yields by year for each site. The data for 1931 are higher than for 1932 at all sites barring Morris.

The mutate function is used to put the two years in ascending chronological order, which looks more natural. It appears that the levels for the three variables, variety, year, and site, in the dataset in lattice have been ordered by increasing yields. You can confirm this by drawing boxplots of yield by each of the variables.

barl2 <- barley %>%
  mutate(Year = factor(year,
    levels = c("1931", "1932"))) %>% 
  group_by(site, Year) %>%
  summarise(N = n(), mean = mean(yield), 
  sd = sd(yield), se = sd/sqrt(N))
lims <- aes(ymax = mean + 2*se, ymin=mean - 2*se) 
ggplot(barl2, aes(colour=Year, x=site, y=mean)) +
  geom_point() + geom_errorbar(lims, width=0.2) +
  ylim(10,70) + theme(legend.position = "bottom")
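The site-by-year pattern can also be checked numerically. This base-R sketch tabulates mean yields by site and year for the lattice barley data (lattice ships with R) and takes the 1931 minus 1932 difference at each site; a negative value marks a site where 1932 was the better year.

```r
data(barley, package = "lattice")

# Mean yield for each site (rows) and year (columns).
site_year <- with(barley, tapply(yield, list(site, year), mean))
site_year <- site_year[, c("1931", "1932")]   # chronological column order

diff_31_32 <- site_year[, "1931"] - site_year[, "1932"]
print(round(site_year, 1))
print(round(diff_31_32, 1))
```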

To identify differences between the varieties, both site and year have to be taken into account to make an effective comparison. A linear model shows that variety is indeed significant, and Figure 10.10 shows the interval estimates for the variety coefficients in this model. R's default treatment contrasts set the coefficient of the first level (Svansota) to 0, so the plot shows that the significance is primarily due to the last two varieties (Wisconsin No. 38 and Trebi) having higher yields than the first.

m1 <- lm(yield~site+year+variety, data=barley) 
library(coefplot)
coefplot(m1, predictors="variety", lwdOuter=1) + ggtitle("") + 
  ylab("") + xlab("yield difference from Svansota")
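The intervals in Figure 10.10 are ±2 standard errors drawn by coefplot. A base-R alternative, sketched below, applies confint() to the same model to get t-based 95% intervals; the variety coefficients are named in R's usual way, for instance varietyTrebi.

```r
data(barley, package = "lattice")
m1 <- lm(yield ~ site + year + variety, data = barley)

# 95% t-based intervals; keep only the variety effects, which are
# differences from the first level (Svansota) under treatment contrasts.
ci <- confint(m1)
ci_var <- ci[grepl("^variety", rownames(ci)), ]
print(round(ci_var, 2))
```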

Figure 10.10


Interval estimates (±2s.e.) for the coefficients from a linear model of yield for the barley dataset. The top two varieties have coefficients that are obviously higher than 0, the value for the Svansota variety (not shown). The other varieties have positive coefficients, although not clearly higher than 0.

10.5 Comparing rates visually

As Chapter 7 shows, for example in Figure 7.1, shading bars of the same height but different widths according to subgroup proportions is a good way of comparing rates and the biggest groups get the most attention, being the widest bars. However, there is no information on the possible significance of any differences between the proportions. A statistical alternative would be to plot confidence intervals for the proportions, treating them as estimates of unknown parameters.

Although this approach offers some statistical guidance, it has two disadvantages, one minor and one major, that have to be borne in mind. The minor disadvantage is that the larger groups will have narrower intervals and thus attract less attention. The major disadvantage is that individual confidence intervals do not necessarily help in assessing the significance of the particular comparisons you might wish to make: two individual 95% intervals can overlap even when the difference between the two rates is significant.

Figure 10.11 shows confidence intervals for male survival rates by class for the Titanic, assuming independent binomial distributions for each class. It is an alternative display for the right-hand plot in Figure 7.1, but only for the males. The differences between the survival rates for the male first-class passengers and the other male groups are obviously significant; the significances of the other differences are unclear.

The code for Figure 10.11 constructs the intervals by hand, so to speak. A new dataset for just the males is defined. Then the necessary marginal totals are calculated for finding the rates and from them the limits are calculated.

t1 <- data.frame(Titanic) 
t1m <- t1[t1$Sex=="Male",]
xt1 <- xtabs(Freq ~ Class, data=t1m[t1m$Survived=="Yes",])
xt2 <- xtabs(Freq ~ Class, data=t1m)
surv <- xt1/xt2
survS <- (surv*(1-surv)/xt2)^0.5
su <- data.frame(cbind(surv, survS))
su$Class <- rownames(su)
lims <- aes(ymax = surv + 2*survS, ymin=surv - 2*survS)
ggplot(su, aes(x=Class, y=surv)) + geom_point() +
  geom_errorbar(lims, width=0.1, colour="blue") +
  ylim(0,0.5) + ylab("Male survival rate")

Figure 10.11


Approximate confidence intervals for male survival rates by class for the Titanic dataset. The first-class males had significantly better survival chances than the other males.

Model-based confidence intervals for the rates could be calculated using logistic regression models of the males' data and the results would be roughly the same. The dataset would have to be rearranged first and after the model fitting the results converted to probability scales. Other logistic models could be fit as well, modelling the females as well as the males. A model with the interaction of Class and Sex then gives the same results, while a model with only the additive terms leads to different results. With displays of interval estimates you always have to know how the estimates were calculated.

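library(reshape2)
# one row per class, with counts of survivors (Yes) and non-survivors (No)
t1c <- dcast(t1m, Class+Sex~Survived, sum, value.var="Freq")
m1c <- glm(cbind(Yes, No) ~ Class,
   family = binomial, data=t1c)
# fitted values and standard errors on the logit (link) scale
p1c <- predict(m1c, se.fit=TRUE)
lowc <- p1c$fit - 2*p1c$se.fit
highc <- p1c$fit + 2*p1c$se.fit
# back-transform estimate and limits to the probability scale
estp <- 1/(1+exp(-p1c$fit))
lowp <- 1/(1+exp(-lowc))
highp <- 1/(1+exp(-highc))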
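Since the displayed intervals depend on how they were calculated, it can help to put several versions side by side. This sketch uses only base R and the built-in Titanic table and, for the males (passengers and crew), computes the ±2 SE Wald intervals of Figure 10.11, ±2 SE intervals on the logit scale back-transformed to probabilities (cross-checked against a fitted logistic regression), and exact Clopper-Pearson intervals from binom.test().

```r
# Male passengers and crew: counts by Class (rows) and Survived (columns).
male <- margin.table(Titanic, c(1, 2, 4))[, "Male", ]
y <- male[, "Yes"]   # survivors
n <- rowSums(male)   # totals
p <- y / n           # observed survival rates

# 1. Wald intervals on the probability scale, +/- 2 SE (as in Figure 10.11).
se_p <- sqrt(p * (1 - p) / n)
wald <- cbind(wald_low = p - 2 * se_p, wald_high = p + 2 * se_p)

# 2. +/- 2 SE on the logit scale, back-transformed with plogis().
#    For a saturated binomial model the logit SE is 1 / sqrt(n p (1 - p)).
se_logit <- 1 / sqrt(n * p * (1 - p))
logit <- cbind(logit_low  = plogis(qlogis(p) - 2 * se_logit),
               logit_high = plogis(qlogis(p) + 2 * se_logit))

# Cross-check the logit standard errors against a fitted glm.
cls <- factor(names(y), levels = names(y))
fit <- glm(cbind(y, n - y) ~ cls, family = binomial)
pr  <- predict(fit, newdata = data.frame(cls = cls), se.fit = TRUE)

# 3. Exact (Clopper-Pearson) 95% intervals.
exact <- t(sapply(seq_along(n), function(i) binom.test(y[i], n[i])$conf.int))
colnames(exact) <- c("exact_low", "exact_high")

print(round(cbind(rate = p, wald, logit, exact), 3))
```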

Alternatively, you could plot coefficient interval estimates from a model using coefplot, for instance with the following code. This uses a model form which includes no intercept, so that all the coefficients of interest are included.

t1e <- dcast(t1, Class+Sex~Survived, sum, value.var="Freq")
m1f <- glm(cbind(Yes, No) ~ 0+Class*Sex,
   family = binomial, data=t1e) 
coefplot(m1f) + xlab("") + ggtitle("")

Rates may depend on many factors. How baseball batting averages might depend on situational factors is discussed in [Albert, 1994], where eight factors, including home/away, day/night game, groundball/flyball pitcher, and runners in scoring position or not, are considered, and even more are suggested. Specific hypotheses require appropriate underlying models for drawing intervals.

10.6 Graphics for comparing many subsets

Trellis displays (cf. §8.5) are an effective means of looking at displays of many subsets at the same time. Big differences for particular subsets stand out; smaller differences are more difficult to spot. Rearranging the order of the panels can help. Figure 10.12 is a trellis display for the olive oil dataset showing scatterplots of the variables palmitic and palmitoleic for the nine areas.

data(olives, package="extracat")
ggplot(olives, aes(palmitic, palmitoleic)) + 
  geom_point() + facet_wrap(~Area)

Figure 10.12


A trellis display of scatterplots for two of the variables in the olives dataset. The data are in close clusters for several of the areas in different parts of the plot. In two of the areas, Sicily and South Apulia, there are strong linear associations.

Trellis graphics can be drawn in R using the lattice package or the facetting options in ggplot2. It is interesting to compare the trellis graphic with a single scatterplot of the two variables in which the areas have been allocated different colours, Figure 10.13. Some features are easier to see, others harder.

ggplot(olives, aes(palmitic, palmitoleic)) + 
  geom_point(aes(colour=Area)) + 
  theme(legend.position = "bottom") + 
  guides(col = guide_legend(nrow = 2))
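
Since lattice was mentioned as an alternative to ggplot2 facetting, here is a minimal sketch of a lattice version of the trellis display in Figure 10.12, assuming the olives data from extracat load as in the code above. The 3 × 3 layout is a choice made for this sketch, not something taken from the figure.

```r
library(lattice)
data(olives, package = "extracat")
# One panel per Area, as in Figure 10.12; lattice conditions with y ~ x | g
p_lattice <- xyplot(palmitoleic ~ palmitic | Area, data = olives,
                    layout = c(3, 3))
print(p_lattice)
```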

Figure 10.13

A single scatterplot for the data from Figure 10.12 with the points coloured by the variable Area. Overall there is a linear relationship between the variables palmitic and palmitoleic driven by the South Apulia region. Although the areas are separated to some extent, the density of points, overplotting, and the difficulty of distinguishing some of the colours make detailed interpretation tricky.

With interactive graphics you can select individual groups in a scatterplot and see them highlighted, while leaving the other cases in the background, providing context. A similar effect, but for all groups simultaneously, can be achieved by drawing a trellis plot with each panel displaying both its particular group highlighted in the foreground and also all the other cases in grey in the background. Figure 10.14 demonstrates the idea. It uses the function facetshade from the extracat package, which supplies the necessary additional facetting option for ggplot2 objects. The alpha function from the scales package is used to set the alpha-blending of the points in the background.

Figure 10.14

Another version of the trellis display in Figure 10.12. All cases are drawn in every panel with those for the corresponding subgroup drawn in colour on top and the remaining cases drawn in grey with alpha-blending underneath. This kind of plot shows the subgroups in context and makes it easier to assess the relative shapes of the subgroups. The West-Liguria oils have slightly higher palmitoleic values than the others and the East-Liguria outlier is an outlier for the whole dataset.

It is now easier to judge the relative positions of the groups compared to the rest of the data than it was in Figure 10.12. A similar approach was used with some of the parallel coordinate plots in Chapter 6.

Figures 10.12, 10.13, and 10.14 offer three different ways of comparing scatterplots by groups. Personally, I prefer Figure 10.14 to Figure 10.12, especially when alpha is chosen carefully, which may require a little experimentation. Figures 10.13 and 10.14 are alternatives for different situations. The former is fine for well-separated groups and when little display space is available (for whatever reason). The latter is better when groups overlap. As always when facetting by a variable with no predefined order (the default is alphabetic), it is worth considering whether another ordering might be more informative. For this dataset, ordering the areas by the variable Region would make sense.
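
One way of ordering the panels by Region might look like this. It is a sketch, assuming the olives data contain a Region variable alongside Area (as the text implies). The name area_levels is introduced only for illustration. Within each region the areas keep their alphabetical order.

```r
library(ggplot2)
data(olives, package = "extracat")
# Hypothetical reordering: sort the Area levels by Region first, then Area,
# so that areas from the same region occupy neighbouring panels
area_levels <- unique(as.character(
  olives$Area[order(olives$Region, olives$Area)]))
olives$Area <- factor(olives$Area, levels = area_levels)
p_region <- ggplot(olives, aes(palmitic, palmitoleic)) +
  geom_point() + facet_wrap(~Area)
print(p_region)
```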

10.7 Graphics principles for comparisons

Graph size Each graphic in a comparison must be drawn to the same size and aspect ratio. Comparing graphics of different sizes is possible, just more difficult.

Common scaling The importance of using the same scales for groups being compared, common scaling, has been mentioned several times. It is easy to forget when using default options for displays. In general, both the horizontal and vertical axes should be the same for all graphics to be compared, although, as so often, the exception proves the rule. Histograms of two groups of different sizes may be better drawn with their own vertical axes if it is the form of the distribution that is to be compared rather than the data frequencies.
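
Free vertical scales can be requested in ggplot2 through the scales argument of facet_wrap. The sketch below uses the diamonds data from ggplot2 purely as a convenient example of groups of very different sizes: the cut categories range from 1,610 Fair to 21,551 Ideal diamonds. The binwidth is an arbitrary choice.

```r
library(ggplot2)
# Common scales (the default): frequencies are directly comparable
p_common <- ggplot(diamonds, aes(price)) +
  geom_histogram(binwidth = 500) +
  facet_wrap(~cut, ncol = 1)
# Same horizontal axis, but each panel has its own vertical axis,
# so the shapes of the distributions are compared rather than counts
p_free <- ggplot(diamonds, aes(price)) +
  geom_histogram(binwidth = 500) +
  facet_wrap(~cut, ncol = 1, scales = "free_y")
print(p_common)
print(p_free)
```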

Alignment Graphics to be compared can be aligned vertically, which is good for comparing variability and form, or aligned horizontally, which is good for comparing the levels of peaks and troughs. Probably it is best to look at both. Either is much better than trying to compare without alignment.
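
Both alignments can be produced with facet_grid by putting the grouping variable on the rows or on the columns. A sketch using the banknote data from §10.1, assuming the gclus package is installed and that, as in its documentation, Status codes genuine notes as 0 and forged notes as 1. The binwidth is an arbitrary choice.

```r
library(ggplot2)
data(bank, package = "gclus")
# Label the 0/1 Status codes for display
bank$Status <- factor(bank$Status, levels = c(0, 1),
                      labels = c("genuine", "forged"))
base <- ggplot(bank, aes(Diagonal)) + geom_histogram(binwidth = 0.2)
# Vertical alignment: panels stacked, sharing the horizontal axis
p_vertical <- base + facet_grid(Status ~ .)
# Horizontal alignment: panels side by side, sharing the vertical axis
p_horizontal <- base + facet_grid(. ~ Status)
print(p_vertical)
print(p_horizontal)
```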

Single and multiple windows Plotting all groups in a single window can aid or hinder comparison. In a single graphic well-separated groups can easily be distinguished, whereas overlapping groups can be difficult to detect. Small multiples, with one display for each group, are the better alternative for overlapping groups. It can be helpful to draw the rest of the cases in the background to provide context as shown in Figure 10.14.

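# Code for Figure 10.14
library(ggplot2)
library(extracat)   # facetshade()
library(scales)     # alpha()
data(olives, package="extracat")
fs1 <- facetshade(data = olives, 
 aes(x = palmitic, y = palmitoleic), f = .~Area)
# Background: all cases in heavily alpha-blended black
# Foreground: the cases of each panel's Area in red
fs1 + geom_point(colour = alpha("black", 0.05)) +
 geom_point(data = olives, colour = "red") +
 facet_wrap(~Area, nrow = 3) + theme(legend.position = "none")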

Colour and shape Distinguishing groups is helped considerably by using a distinct colour for each group. Shape and size are sometimes used for points, but colour is much more effective.

What is being compared A feature in a plot may subconsciously be compared with the rest of the data, with another part of the data, with several other parts of the data, with all the data, with the data that are graphically visible (when there are overlapping points, for instance), or even with remembered data that are not shown. If something interesting is found, it is valuable to determine which of these comparisons led to your thinking it was ‘interesting’.
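
For the Colour and shape principle: with nine areas in the olives data, a qualitative palette with nine clearly different hues can be chosen explicitly rather than relying on the default. A sketch using ColorBrewer's Set1 palette through scale_colour_brewer. The choice of palette is an assumption, and Set1's yellow can be hard to see on a light background, so some experimentation may still be needed.

```r
library(ggplot2)
data(olives, package = "extracat")
# Set1 is a qualitative palette with (at most) nine distinct colours,
# enough for one colour per area
p_colour <- ggplot(olives, aes(palmitic, palmitoleic, colour = Area)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "bottom")
print(p_colour)
```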

10.8 Modelling and testing for comparisons

  1. Comparing means

    The default for comparing two means is the t-test. Even if the necessary assumptions are not ideally met, very low p-values are good evidence for real differences. You just have to decide what ‘very low’ means, although presumably everyone would agree that the p-values reported in §10.1 are.

  2. More complex comparisons

    When additional factors are taken into account, comparisons are made using linear models. Any testing should then allow for the large number of tests that may be carried out.

  3. Comparing rates

    Logistic regression is used for comparing rates while including the possible influences of other explanatory variables. Proportional odds models may be used when the response is ordinal.

  4. Non-parametric approaches when standard assumptions do not hold

    There is a range of non-parametric tests which may be applied, including the Wilcoxon rank-sum test for comparing two groups and the Kruskal-Wallis test for non-parametric analysis of variance.
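
For the first item in the list, comparing means, here is a sketch of the banknote t-tests from §10.1, assuming the gclus package is installed. Note that R's t.test uses the Welch version by default, which does not assume equal variances. Setting var.equal = TRUE gives the classical pooled-variance test.

```r
data(bank, package = "gclus")
# Welch two-sample t-test (R's default)
t_welch <- t.test(Diagonal ~ Status, data = bank)
# Classical two-sample t-test assuming equal variances
t_pooled <- t.test(Diagonal ~ Status, data = bank, var.equal = TRUE)
print(c(welch = t_welch$p.value, pooled = t_pooled$p.value))
```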
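
For the second item, one way of comparing all pairs of barley varieties while allowing for the number of comparisons is Tukey's honest significant difference method, applied to an analysis of variance. This is only a sketch: the additive model yield ~ variety + site + year is an assumption and need not be the model fitted in §10.4.

```r
data(barley, package = "lattice")
# Additive model: yield explained by variety, site and year
fit <- aov(yield ~ variety + site + year, data = barley)
# Intervals and adjusted p-values for all 45 pairs of the 10 varieties
tk <- TukeyHSD(fit, which = "variety")
print(head(tk$variety))
```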
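
For ordinal responses in the third item, a proportional odds model can be fitted with polr from the MASS package. The housing data from MASS, in which tenants' satisfaction is recorded as Low, Medium, or High, serve here only as a convenient stand-in; they are not one of this chapter's datasets.

```r
library(MASS)
# Sat is an ordered factor (Low < Medium < High); Freq holds the counts
fit_po <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
               Hess = TRUE)
print(summary(fit_po))
```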
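
For the fourth item, both tests are available in base R as wilcox.test and kruskal.test. Here is a sketch applying them to the banknote lengths (genuine versus forged) and to barley yields across the six sites, assuming gclus is installed. Which variables to test is an arbitrary choice made for illustration.

```r
data(bank, package = "gclus")
data(barley, package = "lattice")
# Wilcoxon rank-sum (Mann-Whitney) test: two groups, no normality assumption
w <- wilcox.test(Length ~ Status, data = bank)
# Kruskal-Wallis test: rank-based analogue of one-way analysis of variance
k <- kruskal.test(yield ~ site, data = barley)
print(w)
print(k)
```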

Multiple testing Looking for several kinds of difference at once amongst many different subgroups means that something of potential interest is bound to appear. This can even happen when looking at differences in rates between only a few groups: given m groups there are already m(m − 1)/2 possible pairwise comparisons, and still more if comparisons of combinations of groups are included. The pragmatic approach is to examine the most extreme differences first and to bear in mind that the significance of any supporting statistical tests carried out will be affected by multiple testing.
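
The growth in the number of pairwise comparisons is easy to tabulate with choose(). The sketch also shows how likely at least one result significant at the 5% level becomes for the nine olive oil areas. This relies on the strong and unrealistic assumption that the tests are independent and that there are no real differences at all.

```r
# Number of pairwise comparisons among m groups: choose(m, 2) = m(m - 1)/2
m <- c(3, 5, 9, 20)
print(data.frame(groups = m, pairs = choose(m, 2)))
# Nine groups give 36 pairs; under independence and no real differences,
# P(at least one p-value below 0.05) = 1 - 0.95^36, roughly 0.84
print(1 - 0.95^choose(9, 2))
```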

There are formal methods for counteracting explicit multiple testing, such as Bonferroni corrections and control of the false discovery rate (FDR). Details can be found in [Bretz et al., 2010] and [Benjamini, 2010]. These approaches do not cover graphical analyses: there are so many features that might be seen that you cannot be sure just how many tests are implicitly being carried out.
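
Both corrections are available in base R through p.adjust. Here is a sketch applying them to t-tests of the six banknote measurements mentioned in §10.1, assuming the gclus package is installed. Setting method = "bonferroni" controls the familywise error rate, and method = "BH" (Benjamini-Hochberg) controls the false discovery rate.

```r
data(bank, package = "gclus")
vars <- c("Length", "Left", "Right", "Bottom", "Top", "Diagonal")
# One t-test per measurement, comparing genuine and forged notes
p_raw <- sapply(vars, function(v) t.test(bank[[v]] ~ bank$Status)$p.value)
# Raw p-values alongside the two kinds of adjusted p-values
p_adj <- cbind(raw = p_raw,
               bonferroni = p.adjust(p_raw, method = "bonferroni"),
               BH = p.adjust(p_raw, method = "BH"))
print(signif(p_adj, 3))
```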

Main points

  1. There is more to comparing groups than comparing their means. Quantiles, variability, and distributional patterns may also be compared—at least graphically (§10.1).
  2. Fair comparisons need comparable data (populations, sources, variables, measurements, groups) and comparable graphics (size, aspect ratio, scaling, alignment) (§10.2).
  3. Different comparisons show different aspects of datasets (Figure 10.6).
  4. Comparisons may be made at different levels. Selecting the right comparison requires careful thought. Choosing appropriate conditioning variables is often important (Figures 10.8 and 10.9).
  5. Appropriate intervals are needed for testing differences. Each comparison of interest may require its own model (§10.5).
  6. Trellis scatterplots in which the rest of the data are plotted in grey in the background are effective for comparing overlapping subgroups (Figure 10.14).

Exercises

  1. Swiss banknotes

    Consider the bank dataset from the gclus package discussed in §10.1.

    (a) How do the distributions of the variable Length differ for the two groups defined by the Status variable? Draw histograms, boxplots, and empirical distribution functions. Which display do you find most informative? What are the advantages and disadvantages of the three displays in this application?
    (b) Are the mean lengths of the two groups significantly different?
  2. Petrol consumption

    In §10.3, the petrol consumption of cars was compared for two datasets using miles per gallon figures. In most European countries, consumption is measured the other way round, as litres needed to drive 100 kilometres.

    (a) Draw comparative plots of petrol consumption, measured in gallons needed to drive 100 miles, for the two datasets. What features, if any, are notable in the plots?
    (b) Carry out a t-test comparing the two means. Discuss your result in conjunction with the result of the t-test carried out in §10.3.
    (c) A major influence on petrol consumption is the weight of a car. Draw scatterplots of MPG.city and 1/MPG.city against Weight for the Cars93 dataset. What conclusion do you draw and which scatterplot do you prefer?
  3. Barley (corrected?)

    Cleveland suggested that some of the data in the barley dataset analysed in §10.4 are possibly wrong. Construct a new version by switching the data for the Morris site for the two years.

    (a) Draw revised versions of the first two figures from the chapter. How different are your plots compared to the original ones?
    (b) Refit the linear model and plot the interval estimates. Do the conclusions about the differences between the varieties change with this revised dataset?
  4. Balance of trade

    England's balance of trade with the East Indies was discussed in §10.3.

    (a) The relative balance of trade was calculated using the average of imports and exports. How would that graph look if you used (i) imports and (ii) exports as the denominator? How would you interpret the graphs and what headlines would you give them in a newspaper article?
    (b) What is the relation of imports to exports between individual countries nowadays? Find data for your own country and draw time series of the balance of trade between it and its three most important partners over the last twenty years. How have you defined “most important”?
  5. Diamonds

    The diamonds dataset from the ggplot2 package includes information on over 50,000 round cut diamonds.

    (a) The variable color has categories ranging from D (best) to J (worst). Draw a plot showing how price varies with color. Are you surprised by the result? What might explain it?
    (b) Draw a plot of the color coefficient estimates with 95% confidence intervals. What conclusions would you draw?
  6. Swiss banknotes (again)

    Consider the variables Right and Left, measurements of the edge widths of the notes.

    (a) What do the distributions of the differences between these measurements for each note look like for the two groups? Are the differences significantly different from zero?
    (b) The measurements Bottom and Top for the margin widths might also be expected to be close to equal for each note. Are they, and does the difference relate to the edge width differences?
    (c) Instead of using absolute differences, proportionate differences could be used. Draw a plot to compare the scales of the proportionate differences for the edges and margins. What denominator would you suggest? Do you think the data are reported precisely enough for these analyses?
  7. Olkin95

    There are data on 70 different studies of thrombolytic therapy after acute myocardial infarction in the Olkin95 dataset in the meta package. (This dataset was also used in Exercise 2 in Chapter 5.)

    (a) Plot the event rates for the experimental groups against the corresponding rates for the control groups. What does your plot show?
    (b) The sizes of the studies should also be taken into account. Draw a scatterplot of the rate differences in each study against the size of the study, using the total number of participants for the size. (This is a kind of funnel plot.) What conclusions would you draw from your plot? How much does it matter, if at all, that the experimental and control groups are not always the same size?
  8. Intermission

    The Judgement of Paris by Rubens hangs in the National Gallery in London. How would you compare it with paintings of the same scene by, amongst others, Cézanne, Renoir, Raphael, and Watteau?

  9. Intermission (extended)

    Manet's painting Le déjeuner sur l'herbe is in the Musée d'Orsay in Paris. Monet painted a version a couple of years later and Picasso was inspired to make over one hundred drawings and some twenty-seven paintings of the scene. How do the versions by Picasso and Monet compare with Manet's?
