CHAPTER 16


Modern Statistical Methods II

In this chapter, we explore more of what might be termed modern statistics, in particular nonparametric tests and bootstrapping. As with many things, it helps to explore in some depth the reasons why such modern methods are useful. We have organized this chapter into three main parts. The first delves into the philosophical reasoning behind the need for these methods, while again avoiding as much mathematics as possible. The second follows our familiar process of methodically introducing the packages and function calls for several useful nonparametric tests. Finally, we investigate bootstrapping, with examples highlighting the historically unprecedented power now available through modern computing.

16.1 Philosophy of Parameters

So far, we have primarily focused on analyses that supposed some parameterization of our data. In Chapter 10, we cared about the distribution of our data: for the t-test(s) we performed, it was a requirement or assumption that our data values followed a normal distribution, that the (scaled) sample variance followed a chi-squared distribution, and that our observations were independently sampled. In Chapter 12, we explored analysis of variance (ANOVA) and introduced the Shapiro-Wilk (SW) normality test via shapiro.test(), along with the bartlett.test() function for homoscedasticity. By Chapter 13, we were looking at Normal Q-Q plots to judge whether the residuals followed a normal distribution. Thus, most of our statistical analyses so far have required several preconditions, assumptions, or parameters. The approach has been to use exploratory data analysis (EDA), find a distribution that matched, and then run a test whose assumptions could be satisfied. Along the way, we sometimes mentioned that some of these analyses were robust enough to cope with minor deviations from their assumptions, provided their results were interpreted accordingly.

This chapter explores methods that do not simply relax the normality (or other distributional) assumptions; we may remove them entirely. This is termed nonparametric statistics. It becomes important when we have no reason to make parametric assumptions about the data, or when the data clearly violate the assumptions of standard models. A case in point would be a Normal Q-Q plot showing that the residuals of our linear model are clearly not normal. Another feature often present in nonparametric tests is resistance to outliers. If an outlier is appended to a dataset, measures such as the arithmetic mean may change substantially; by their nature, many nonparametric tests are relatively immune to such changes.

As a final comment before we delve into our usual examples: if an assumption of a test cannot be satisfied, then we cannot trust the conclusions of that test (see Chapter 11, Section 11.2). That is, a t-test run on non-normal data might give a significant p-value more often than its nominal Type I error rate would suggest, or it might give a nonsignificant p-value when there is in fact a real effect (an inflated Type II error rate). Using a nonparametric test can provide guidance in such cases. Generally, however, the more assumptions that do hold true, the more powerful and nuanced the test results can be. Thus, the traditional methods, when their assumptions hold, in some sense work "better" or are more powerful than (some of) the modern methods.
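As a concrete (if artificial) illustration, consider the following sketch. The data are simulated from a strongly skewed exponential distribution, the object names are our own, and we do not reproduce the output here; running both tests side by side shows how a parametric and a nonparametric comparison of two groups can be contrasted.

> set.seed(10)
> skewedA <- rexp(20, rate = 1)      # strongly skewed (exponential) samples
> skewedB <- rexp(20, rate = 1)
> t.test(skewedA, skewedB)           # parametric: relies on approximate normality
> wilcox.test(skewedA, skewedB)      # nonparametric: compares ranks instead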

16.2 Nonparametric Tests

So far, we have already looked at the function wilcox.test(), also called the Mann-Whitney test (see the section "A Modern Alternative to the Traditional t-Test" in Chapter 11). That test was used much like a two-independent-samples t-test, except it removed the assumption of normality. In this section, we make extensive use of the coin package. We now explore the Wilcoxon signed-rank test (a substitute for a paired-samples t-test), Spearman's test (a ranked version of Pearson's correlation coefficient), the Kruskal-Wallis test, and a permutation test for comparing up to k independent samples.

We start off by installing the coin package, and loading our libraries.

> install.packages("coin")
package 'coin' successfully unpacked and MD5 sums checked

> library(ggplot2)
> library(GGally)
> library(grid)
> library(reshape2)
> library(scales)
> library(coin)
Loading required package: survival

16.2.1 Wilcoxon-Signed-Rank Test

As with a paired t-test, it is assumed that the data are paired and come from the same population and that each pair is randomly selected. This is a nonparametric test, however, so normality is not required. Furthermore, our data need only be ordinal, not truly continuous, which greatly widens the range of data that may be considered. Suppose we have ten pairs of data measured before and after some treatment. Using the R function runif(), we can generate random data, and by calling set.seed() first we can be sure that our pseudorandom data will match yours (and thus we expect to get the same answers). The histogram in Figure 16-1 suggests the differences are not obviously normal (note that the Shapiro-Wilk test below does not reject normality, but with only ten pairs it has little power to do so). Rather than rely on a paired t-test, we can call wilcoxsign_test(), as these data easily satisfy its requirements. The null hypothesis is not rejected, which is expected given that both samples were drawn from the same uniform distribution.

> set.seed(4)
> untreated <- runif(10, 20, 75)
> treated <- runif(10,20,75)
> differences = treated - untreated
> xydat <- data.frame(treated, untreated)
> shapiro.test(differences)

        Shapiro-Wilk normality test

data:  differences
W = 0.9352, p-value = 0.5009

> hist(differences)
> wilcoxsign_test(treated ~ untreated,data = xydat)

        Asymptotic Wilcoxon-Signed-Rank Test

data:  y by x (neg, pos)
         stratified by block
Z = 0.96833, p-value = 0.3329
alternative hypothesis: true mu is not equal to 0


Figure 16-1. Histogram of the differences between two sets of random data; the shape is not obviously normal

16.2.2 Spearman’s Rho

The nonparametric analog of Pearson's r is Spearman's rho. You'll recall that Pearson's r, also called the Pearson product-moment correlation coefficient, is a measure of linear correlation between two variables; it requires interval- or ratio-level data and is sensitive to outliers. Spearman's rho instead asks whether, as one variable increases, the other variable consistently increases or consistently decreases (that is, whether the relationship is monotonic). Spearman's does not require normality and can work on ranked or ordinal data. Suppose we want to relate wolf pack hierarchy to the number of elk successfully hunted in a summer. We can readily send our field researchers out into various national parks to collect data. And, imagining a fully funded operation, suppose the actual number of kill-bites delivered (if such a metric exists) per wolf was collectible. Our response variable, elk hunted, is a genuine count and poses no problem. Pack hierarchy, however, is less precise: while our field researchers may well have a strong sense of which animals are dominant and which are not, there really isn't a reason to say that the alpha is ten times more dominant than the beta wolf! The rank order is all we can trust. We see the results of the test for our invented data, along with a scatterplot, in Figure 16-2.

> wolfpackH <- c(1:10)
> wolfkills <- c(23, 20, 19, 19, 19, 15, 13, 8, 2, 2)
> spearman_test(wolfkills~wolfpackH)

        Asymptotic Spearman Correlation Test

data:  wolfkills by wolfpackH
Z = -2.9542, p-value = 0.003135
alternative hypothesis: true mu is not equal to 0

> plot(wolfpackH, wolfkills, type="p")


Figure 16-2. Wolf scatterplot of elk kills vs. pack hierarchy rank
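As a quick cross-check (a sketch of our own; output omitted), base R's cor.test() with method = "spearman" reports the rho estimate directly, and its p-value should be comparable to the coin result above. Because wolfkills contains ties, R will likely warn that an exact p-value cannot be computed.

> cor.test(wolfkills, wolfpackH, method = "spearman")   # base R equivalent, reports rho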

16.2.3 Kruskal-Wallis Test

The Kruskal-Wallis one-way analysis of variance is another rank-based test; it is a nonparametric counterpart to ANOVA. To explore this variety of nonparametric statistics, we'll be examining the mtcars dataset. The total volume swept by an engine's cylinders is called its displacement. To get started, let's look at the distribution of displacement (disp) in Figure 16-3. Looking at the histogram, our data do not appear to be particularly normal.

> hist(mtcars$disp)


Figure 16-3. mtcars’ displacement vs. frequency histogram

Next, ANOVA is used to examine whether the number of carburetors predicts displacement; in this case, the results were not statistically significant. However, displacement does not seem to have exactly a normal distribution. In particular, we note that the Normal Q-Q plot has some issues, as seen in Figure 16-4 on the lower left. We could instead use a Kruskal-Wallis rank sum test (Hollander & Wolfe, 1973) via the function kruskal.test(). The Kruskal-Wallis test operates on ranks rather than on means, so the distribution of the original variable does not matter once everything is converted to ranks.

> summary(aov(disp ~ factor(carb), data = mtcars))
             Df Sum Sq Mean Sq F value Pr(>F)
factor(carb)  5 149586   29917   2.382 0.0662 .
Residuals    26 326599   12562
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Data4<-aov(disp ~ factor(carb), data = mtcars)
> plot(Data4)


Figure 16-4. Normal Q-Q plot of the residuals that do not look fully normal

> kruskal.test(disp ~ factor(carb), data = mtcars)

        Kruskal-Wallis rank sum test

data:  disp by factor(carb)
Kruskal-Wallis chi-squared = 11.868, df = 5, p-value = 0.03664

The kruskal.test() form of this function has few options; more options are available in the coin package version. We use the function kruskal_test(), which gives us the same results as the call from base R. The authors recommend performing such double-checks occasionally with packages you are unfamiliar with, both to confirm that you understand how a function works and to gauge the quality of the package. It is, of course, not a completely foolproof error-checking methodology, but it can certainly help us spot any differences in notation.

> kruskal_test(disp ~ factor(carb), data = mtcars)

        Asymptotic Kruskal-Wallis Test

data:  disp by factor(carb) (1, 2, 3, 4, 6, 8)
chi-squared = 11.868, df = 5, p-value = 0.03664
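Before moving on, to make the idea of "operating on ranks" concrete, here is a rough sketch of our own (not what kruskal.test() literally computes internally, but closely related): rank-transform the response and then run the familiar ANOVA on the ranks. The conclusion should be similar to the Kruskal-Wallis result, though the statistics are not identical.

> ## ANOVA on the rank-transformed response (illustrative only)
> summary(aov(rank(disp) ~ factor(carb), data = mtcars))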

16.2.4 One-Way Test

Carrying on with our Kruskal-Wallis data, we explore the oneway_test() function. This test is a k-sample permutation test that, unlike Kruskal-Wallis, computes the test statistic on the untransformed response variable. As our displacement data are not inherently ranks, this may yield additional insight. The results of the permutation test are closer to the initial ANOVA and suggest that, overall, carburetor does not have a statistically significant effect, which, if your authors knew more about cars, might not be surprising.

> oneway_test(disp ~ factor(carb), data = mtcars)

        Asymptotic K-Sample Fisher-Pitman Permutation Test

data:  disp by factor(carb) (1, 2, 3, 4, 6, 8)
chi-squared = 9.7381, df = 5, p-value = 0.083

One way we can look at the data is by plotting the means and medians. Some of the levels only have one point because the mean and median overlap when there is a single observation. This is not a perfect example since this is a small dataset with only 32 observations. Often, in data analysis, we may consider dropping some levels or collapsing levels. Nevertheless, we close out this section with code to observe this and the plot in Figure 16-5.

> p <- ggplot(mtcars, aes(carb, disp)) +
+   stat_summary(fun.y = mean, geom = "point", colour = "black", size = 3) +
+   stat_summary(fun.y = median, geom = "point", colour = "blue", size = 3) +
+   theme_bw()
> p


Figure 16-5. Means and medians of displacement by number of carburetors
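Note that in more recent versions of ggplot2 (3.3.0 and later, newer than the version used when this chapter was written), the fun.y argument of stat_summary() has been renamed fun. If the preceding code produces a deprecation warning on your installation, the following equivalent sketch should work:

> p <- ggplot(mtcars, aes(carb, disp)) +
+   stat_summary(fun = mean, geom = "point", colour = "black", size = 3) +
+   stat_summary(fun = median, geom = "point", colour = "blue", size = 3) +
+   theme_bw()
> p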

16.3 Bootstrapping

One of the primary forces behind many aspects of statistics is the realization that it is not possible to deal with entire populations of data. So, instead, a fair amount of effort is spent projecting sample data as estimators of the population measures. However, as we have seen, not all populations are normal or readily matchable to some other useful mold. Thus, the nonparametric tests can allow for the same process to occur, but without as many preconditions or assumptions.

However, what if one could turn a sample into a population? What if there were some way to randomly draw again and again from a sample until one's dataset was as large as a population? Such a constructed population would not be as precise as drawing randomly from the actual population. Still, we could presumably use more direct methods rather than the less precise statistical estimates. Methods that attempt to exploit such ideas are generally termed resampling methods. The variant we discuss in this section is the bootstrap.

The name bootstrap comes from a story about a baron who was trapped in a swamp. Left with few resources, he ended up pulling himself out of the swamp by his own bootstraps (an impressive feat to be sure) and was able to be on his way. Slightly less mythically, we perform our sampling from our sample with replacement, and apply such a process to many different scenarios. We do not exhaustively treat those scenarios in this section; we simply observe several examples that are perhaps very common.
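The core mechanic is nothing more than R's sample() function with replace = TRUE. As a tiny sketch (a toy vector of our own), each resample has the same size as the original, but some values repeat while others are omitted:

> x <- c(2, 5, 7, 9, 11)
> sample(x, size = length(x), replace = TRUE)   # one bootstrap resample
> sample(x, size = length(x), replace = TRUE)   # another; different values repeat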

In our R code for this section, we will use the boot package, loaded via library(boot). The boot package has a function, boot(), which is what we will primarily use for bootstrapping. This function takes as input both the data and another function that produces the value(s) to be bootstrapped. The function passed to boot() should accept two arguments: the data and a vector of indices, which boot() supplies for each bootstrap (re)sample.

> library(boot)

Attaching package: 'boot'

The following object is masked from 'package:survival':

    aml

>
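As a minimal sketch of that interface (a toy example of our own; output omitted), bootstrapping a single mean looks like the following, where d receives the data and i the resampled indices:

> boot_mean <- function(d, i) mean(d[i])               # statistic function: data, indices
> boot(data = mtcars$mpg, statistic = boot_mean, R = 1000)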

16.3.1 Examples from mtcars

To start, we can bootstrap the difference in two means, much as an independent-samples t-test would compare them. Because the indices are randomly generated, we need to set the random number seed to make the bootstrap reproducible. We return to our mtcars dataset and again inspect our displacement data, this time grouped by the vs variable (V-shaped vs. straight engines). We create an object we call bootres in which we've drawn 5,000 bootstrap samples to get the distribution of mean differences. We also plot the distribution and observe the quantiles in Figure 16-6.

> set.seed(1234)
> ## now we can draw 5,000 bootstrap samples
> ## to get the distribution of mean differences
>
> bootres <- boot(
+   data = mtcars,
+   statistic = function(d, i) {
+     as.vector(diff(tapply(d$disp[i], d$vs[i], mean)))
+   },
+   R = 5000)
> plot(bootres)


Figure 16-6. Histogram of 5,000 bootstrapped samples and Normal Q-Q plot

We can also plot the distribution with an added vertical line at the estimate from the raw data (stored in bootres$t0). In this case the distribution of mean differences appears approximately normal, as we see in Figure 16-7, plotted from the following code.

> hist(bootres$t[,1])
> abline(v = bootres$t0, col = "blue", lwd = 3)


Figure 16-7. Histogram of bootres with an added line for the mean in the raw data

The real power of bootstrapping is that it works even for statistics whose sampling distribution is unknown or awkward to derive, such as the median. The distribution of median differences appears less normal, as seen in Figure 16-8.

> set.seed(1234)
> bootres2 <- boot(
+   data = mtcars,
+   statistic = function(d, i) {
+     as.vector(diff(tapply(d$disp[i], d$vs[i], median)))
+   },
+   R = 5000)
>
> hist(bootres2$t[,1], breaks = 50)
> abline(v = bootres2$t0, col = "blue", lwd = 3)


Figure 16-8. Histogram of bootres2 for medians on displacement from mtcars

Another simple example is bootstrapping the log of the variance, as shown in the following code and in Figure 16-9. We take the log of the variance because variances are bounded on (0, Inf); once log transformed, the quantity ranges over (-Inf, Inf). It is generally preferable, where possible, to bootstrap an unbounded quantity, whose distribution tends to be more symmetric.

> set.seed(1234)
> bootres3 <- boot(
+   data = mtcars,
+   statistic = function(d, i) {
+     as.vector(diff(tapply(d$disp[i], d$vs[i], function(x) log(var(x)))))
+   },
+   R = 5000)
>
> hist(bootres3$t[,1], breaks = 50)
> abline(v = bootres3$t0, col = "blue", lwd = 3)


Figure 16-9. Histogram for log of the variance on displacement from mtcars

16.3.2 Bootstrapping Confidence Intervals

While looking at the various distributions from the bootstrap samples is helpful, typically we want to summarize the distribution somehow. Most commonly, people report on the estimate and bootstrapped confidence intervals. There are actually several different ways to calculate confidence intervals from bootstrapping.

Typically, if we want 95% confidence intervals, we have to work out what the 2.5th and 97.5th percentiles of a distribution would be, given the estimated parameters. For example, for a normal distribution we use the estimated mean and variance, and from those we can find the percentiles. In bootstrapping, we can use the "percentile" method to get confidence intervals very easily. Because we have (typically) thousands of bootstrap resamples, we already have a pretty good sample of the distribution we are interested in. Rather than working out assumptions and the math for a particular percentile, we can just find the percentiles we desire (typically the 2.5th and 97.5th) empirically. To do this in R, we can use the quantile() function, which calculates quantiles or percentiles. For example (rounded to three decimals):

> round(quantile(bootres3$t[,1], probs = c(.025, .975)), 3)
  2.5%  97.5%
-2.586 -0.313

We can also do this directly in R using the boot.ci() function with type = "perc" as shown next:

> boot.ci(bootres3, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL :
boot.ci(boot.out = bootres3, type = "perc")

Intervals :
Level     Percentile
95%   (-2.588, -0.312 )
Calculations and Intervals on Original Scale

The reason these estimates are slightly different is that, for non-integer order statistics, boot.ci() interpolates on the normal quantile scale, whereas quantile() interpolates linearly by default. That is, the 2.5th percentile may not fall exactly on a bootstrap value, so each function interpolates between the two nearest points in its own way, leading to slightly different results. These differences shrink as the number of bootstrap resamples increases (if you like, as an exercise, try changing from 5,000 to 50,000, which should still take only a few minutes for this simple example, and then rerun and compare the results). Putting these together with our graph, we can visualize it as shown in Figure 16-10.

> hist(bootres3$t[,1], breaks = 50)
> abline(v = bootres3$t0, col = "blue", lwd = 3)
> abline(v = quantile(bootres3$t[,1], probs = c(.025)), col = "yellow", lwd = 3)
> abline(v = quantile(bootres3$t[,1], probs = c(.975)), col = "yellow", lwd = 3)


Figure 16-10. Histogram with confidence intervals for log of the variance

Another type of confidence interval is called the "basic" bootstrap. It is very similar to the percentile method, but rather than reporting the percentiles directly, they are subtracted from twice the actual estimate; that is, the interval runs from 2*t0 minus the 97.5th percentile up to 2*t0 minus the 2.5th percentile (results again rounded to three decimals). We may also do this directly in R using the boot.ci() function with type = "basic". We show both sets of code next; again, small differences are due to interpolation.

> round((2 * bootres3$t0) - quantile(bootres3$t[,1], probs = c(.975, .025)), 3)
 97.5%   2.5%
-2.205  0.068

## directly
> boot.ci(bootres3, type = "basic")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL :
boot.ci(boot.out = bootres3, type = "basic")

Intervals :
Level      Basic
95%   (-2.206,  0.070 )
Calculations and Intervals on Original Scale

Another type of confidence interval is called the “normal” interval, which assumes the parameter distribution is normal but, instead of using the variance estimate from the model, uses the variance of the bootstrap distribution. The normal interval also adjusts for bias in the bootstrap distribution, which is calculated as the difference between the mean of the bootstrap samples and the actual statistic on the original raw data. Again, we can also do this directly in R using the boot.ci() function with type = "norm" as shown in the following code:

> bias <- mean(bootres3$t) - bootres3$t0
> sigma <- sd(bootres3$t[,1])
> ## manually
> round(bootres3$t0 - bias - qnorm(c(.975, .025), sd = sigma), 3)
[1] -2.296 -0.106
> ## directly
> boot.ci(bootres3, type = "norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL :
boot.ci(boot.out = bootres3, type = "norm")

Intervals :
Level      Normal
95%   (-2.296, -0.106 )
Calculations and Intervals on Original Scale

A final approach to confidence intervals in bootstrapping is the bias-corrected and accelerated (BCa) confidence interval. This attempts to adjust not only for bias but also for differences in the shape of the bootstrap distribution. The details behind the exact calculation are more complicated and are not discussed here (but see Carpenter & Bithell, 2000). It is easy to obtain the BCa bootstrap confidence intervals in R, and at least in theory these should provide less bias and better coverage. (Coverage refers to how often a nominal 95% confidence interval includes the true value: if it does so about 95% of the time, that is a good sign; if a nominally 95% interval includes the true value only about 80% of the time, we would conclude that the coverage was poor.)

> boot.ci(bootres3, type = "bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL :
boot.ci(boot.out = bootres3, type = "bca")

Intervals :
Level       BCa
95%   (-2.384, -0.194 )
Calculations and Intervals on Original Scale
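To make the idea of coverage more concrete, here is a small simulation sketch of our own (not part of the original analysis, and kept deliberately small so it runs quickly): we repeatedly draw samples from a skewed population with a known mean, compute a percentile bootstrap interval for the mean each time, and record how often the interval captures the true value.

> set.seed(42)
> true.mean <- 5
> hits <- replicate(200, {                     # 200 simulated studies
+   x <- rexp(30, rate = 1 / true.mean)        # skewed population, true mean = 5
+   b <- boot(x, statistic = function(d, i) mean(d[i]), R = 500)
+   ci <- boot.ci(b, type = "perc")$percent[4:5]   # lower and upper endpoints
+   ci[1] <= true.mean & true.mean <= ci[2]    # did the interval capture 5?
+ })
> mean(hits)                                   # observed coverage, ideally near 0.95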

So far, we have examined fairly simple examples of statistics we might want to bootstrap. Next, we will examine a more complex case where we could use bootstrapping.

16.3.3 Examples from GSS

In the previous chapter, we built some models for categorical outcomes using the GSS data. We will use those data again, fit an ordered logistic model as a preliminary step, and then examine how we might incorporate bootstrapping. We show the code to get the basic model up and running below, with some cleanup for legibility:

> library(foreign)
> library(VGAM)

> gss2012 <- read.spss("GSS2012merged_R5.sav", to.data.frame = TRUE)
> gssr <- gss2012[, c("age", "sex", "marital", "educ", "income06", "satfin", "happy", "health")]
> gssr <- na.omit(gssr)

> gssr <- within(gssr, {
+   age <- as.numeric(age)
+   Agec <- (gssr$age - 18) / 10
+   educ <- as.numeric(educ)
+   # recode income categories to numeric
+   cincome <- as.numeric(income06)
+   satfin <- factor(satfin,
+                    levels = c("NOT AT ALL SAT", "MORE OR LESS", "SATISFIED"),
+                    ordered = TRUE)
+ })

> m <- vglm(satfin ~ Agec + cincome * educ,
+           family = cumulative(link = "logit", parallel = TRUE, reverse = TRUE), data = gssr)
> summary(m)

Call:
vglm(formula = satfin ~ Agec + cincome * educ, family = cumulative(link = "logit",
    parallel = TRUE, reverse = TRUE), data = gssr)

Pearson residuals:
                  Min      1Q  Median     3Q   Max
logit(P[Y>=2]) -3.651 -0.9955  0.3191 0.7164 2.323
logit(P[Y>=3]) -1.562 -0.6679 -0.3018 0.7138 6.240

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept):1  0.964729   0.480847   2.006   0.0448 *
(Intercept):2 -1.188741   0.481270  -2.470   0.0135 *
Agec           0.169257   0.021790   7.768 8.00e-15 ***
cincome       -0.061124   0.027614  -2.213   0.0269 *
educ          -0.182900   0.036930  -4.953 7.32e-07 ***
cincome:educ   0.012398   0.002033   6.098 1.07e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of linear predictors:  2
Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3])
Dispersion Parameter for cumulative family:   1
Residual deviance: 5675.892 on 5664 degrees of freedom
Log-likelihood: -2837.946 on 5664 degrees of freedom
Number of iterations: 5

Now that we have the basic model, what if we wanted to get bootstrapped confidence intervals for coefficients or predictions from the model? To do this within R, we need to write a function that can be passed to the boot() function that will get all the statistics we are interested in exploring.

> model_coef_predictions <- function(d, i) {
+
+   m.tmp <- vglm(satfin ~ Agec + cincome * educ,
+                 family = cumulative(link = "logit", parallel = TRUE, reverse = TRUE),
+                 data = d[i, ])
+   newdat <- expand.grid(
+     Agec = seq(from = 0, to = (89 - 18)/10, length.out = 50),
+     cincome = mean(d$cincome),
+     educ = c(12, 16, 20))
+
+   bs <- coef(m.tmp)
+   predicted.probs <- predict(m.tmp, newdata = newdat,
+                              type = "response")
+
+   out <- c(bs, predicted.probs[, 1], predicted.probs[, 2], predicted.probs[, 3])
+
+   return(out)
+ }

Now, this next bit of code may take some time to run on your machine. In fact, in one of the remaining chapters we will discuss ways to wring more performance out of modern computers; at its default settings, R is not the most efficient user of available compute power. To put this into context, we ran this on fairly powerful machines with large amounts of RAM (not so relevant in this case), multiple cores (potentially relevant, but not with this code), and high processing power (more relevant). It took somewhere between three and ten minutes to run (the author made a brief caffeinated beverage run and offers apologies for the less than precise measurement).

> set.seed(1234)
> boot.res <- boot(
+   data = gssr,
+   statistic = model_coef_predictions,
+   R = 5000)

Again, this code took several minutes to run, but once complete, it gives us quite a bit of information as it has the bootstrapped distributions for each of the model coefficients as well as a variety of predicted probabilities. Of course, we could have just saved the model coefficients, as those are enough to calculate the predicted probabilities.

To start with, we can loop through the results, calculate the 95% BCa confidence intervals for each statistic, and store the results along with the estimate from the original data in a new data frame. Perhaps surprisingly, calculating the confidence intervals for each statistic actually took longer than running the initial bootstrap! To see how this works without taking too much time, you can run just the first few parameters by replacing 1:length(boot.res$t0) with 1:6 in the code that follows. On a high-end desktop, this code took over an hour, and it may take much longer depending on the specific machine being used. This is because the BCa confidence intervals are more computationally demanding, and the calculation is repeated for each statistic: 450 predicted probabilities (three levels of the outcome by 150 rows in our made-up dataset for prediction) plus a handful of model coefficients.

> boot.res2 <- lapply(1:length(boot.res$t0), function(i) {
+   cis <- boot.ci(boot.res, index = i, type = "bca")
+   data.frame(Estimate = boot.res$t0[i],
+              LL = cis$bca[1, 4],
+              UL = cis$bca[1, 5])
+ })

Next we can take the results, which currently are a list of data frames where each data frame has a single row, and combine them row-wise.

> boot.res2 <- do.call(rbind, boot.res2)
> head(round(boot.res2, 3), 10)
              Estimate     LL     UL
(Intercept):1    0.965  0.066  1.888
(Intercept):2   -1.189 -2.097 -0.263
Agec             0.169  0.124  0.213
cincome         -0.061 -0.113 -0.007
educ            -0.183 -0.254 -0.112
cincome:educ     0.012  0.009  0.016
1                0.434  0.395  0.474
2                0.428  0.391  0.467
3                0.422  0.386  0.459
4                0.417  0.381  0.452

We can see that the first six rows give parameter estimates along with confidence intervals, and then the predicted probabilities start. We can use the rep() function to help us label each row of the dataset.

> boot.res2$Type <- rep(c("coef", "Not Satisfied", "More/Less Satisfied", "Satisified"),
+                       c(6, 150, 150, 150))

Copying the data used for prediction within our function, we can merge the predicted results and confidence intervals from bootstrapping with the values used for prediction to generate a final dataset for graphing or presentation. Note that since we have three levels of the outcome, we need to repeat our newdat object three times. We could have just typed it three times, but that does not scale well (what if you needed to do it 300 times?). Instead, we use the rep() function again, this time to create a list in which each element is a copy of the data frame; we then combine the copies row-wise using do.call() and rbind() before finally combining the result column-wise with the bootstrapping results to create a final dataset for presentation.

> newdat <- expand.grid(
+   Agec = seq(from = 0, to = (89 - 18)/10, length.out = 50),
+   cincome = mean(gssr$cincome),
+   educ = c(12, 16, 20))
>
> finaldat <- cbind(boot.res2[-(1:6), ], do.call(rbind, rep(list(newdat), 3)))

We do this, and graph the results in Figure 16-11 using fairly familiar code as follows:

> p<- ggplot(finaldat, aes(Agec, Estimate, colour = Type, linetype = Type)) +
+   geom_ribbon(aes(ymin = LL, ymax = UL, colour = NULL, fill = Type), alpha = .25) +
+   geom_line(size = 1.5) +
+   scale_x_continuous("Age", breaks = (c(20, 40, 60, 80) - 18)/10,
+                      labels = c(20, 40, 60, 80)) +
+   scale_y_continuous("Probability", labels = percent) +
+   theme_bw() +
+   theme(legend.key.width = unit(1.5, "cm"),
+         legend.position = "bottom",
+         legend.title = element_blank()) +
+   facet_wrap(~educ) +
+   ggtitle("Financial Satisfaction")
> p


Figure 16-11. Financial satisfaction plot(s) sorted by years of education

These results show us that, for adults in the average income bin, there is not much difference in financial satisfaction by education level. In contrast, there are large changes across the lifespan.

Finally, let’s examine whether there are any differences in the confidence intervals from normal theory vs. bootstrapping. First, we get the estimates and standard errors as a matrix.

> (coef.tab <- coef(summary(m)))
                 Estimate  Std. Error   z value     Pr(>|z|)
(Intercept):1  0.96472914 0.480847174  2.006311 4.482304e-02
(Intercept):2 -1.18874068 0.481270087 -2.470007 1.351103e-02
Agec           0.16925731 0.021790024  7.767651 7.995485e-15
cincome       -0.06112373 0.027614421 -2.213471 2.686517e-02
educ          -0.18290044 0.036929590 -4.952680 7.319831e-07
cincome:educ   0.01239790 0.002033034  6.098224 1.072537e-09

Then we can calculate the 95% confidence intervals and combine with our bootstrapped results (dropping the fourth column labeling the type of the result).

> coef.res <- cbind(boot.res2[1:6, -4],
+                   NormalLL = coef.tab[, 1] + qnorm(.025) * coef.tab[, 2],
+                   NormalUL = coef.tab[, 1] + qnorm(.975) * coef.tab[, 2])
> coef.res
                 Estimate           LL           UL     NormalLL     NormalUL
(Intercept):1  0.96472914  0.065709812  1.888282581  0.022285994  1.907172279
(Intercept):2 -1.18874068 -2.096506820 -0.262608836 -2.132012714 -0.245468639
Agec           0.16925731  0.124358398  0.212613375  0.126549646  0.211964972
cincome       -0.06112373 -0.112977921 -0.007421777 -0.115246998 -0.007000457
educ          -0.18290044 -0.254102950 -0.111955065 -0.255281108 -0.110519774
cincome:educ   0.01239790  0.008504279  0.016193052  0.008413223  0.016382571

While there are some differences, in this particular case, they appear quite small. These results might encourage us to believe that these data are “well-behaved” and the parametric assumptions we made are fairly reasonable. In other cases, results may differ more and we would have to try to understand why and decide which results we trusted.

So far we have seen how bootstrapping can be applied in many situations; in some of those situations closed-form confidence intervals are not readily available (as for the median), and in other cases we may use bootstrapping as a sort of sensitivity analysis or a check that our results are robust. Before we conclude our discussion of bootstrapping, we will look at one more example that shows how flexible bootstrapping can be.

Suppose we wanted to test whether the change in the probability of being financially satisfied between an 18-year-old and an 89-year-old, both with 16 years of education and an average income bin, exactly cancels the corresponding change in the probability of being financially unsatisfied (i.e., the two differences have the same magnitude but opposite signs):

Δ Satisfied + Δ Not Satisfied = 0,

where Δ denotes the change in predicted probability from age 18 to age 89.

> subset(finaldat, Agec %in% c(0, 7.1) & educ == 16 & Type != "More/Less Satisfied")
      Estimate        LL        UL          Type Agec  cincome educ
51   0.4068473 0.3678249 0.4481580 Not Satisfied  0.0 17.04056   16
100  0.1709739 0.1440701 0.2042896 Not Satisfied  7.1 17.04056   16
513  0.1447413 0.1241949 0.1686146    Satisified  0.0 17.04056   16
1002 0.3601464 0.3127528 0.4066795    Satisified  7.1 17.04056   16

From here, we can see that we want the 51st and 100th predicted probabilities within the first and third outcome levels; because each outcome level occupies 150 rows (50 ages by three education levels), the values for the third level sit 300 rows further down. We can make an index variable to grab these and check that it works.

> index <- c(51, 100, 51 + 300, 100 + 300)
> finaldat[index, ]
      Estimate        LL        UL          Type Agec  cincome educ
51   0.4068473 0.3678249 0.4481580 Not Satisfied  0.0 17.04056   16
100  0.1709739 0.1440701 0.2042896 Not Satisfied  7.1 17.04056   16
513  0.1447413 0.1241949 0.1686146    Satisified  0.0 17.04056   16
1002 0.3601464 0.3127528 0.4066795    Satisified  7.1 17.04056   16

Now we can go to the bootstrapping results, noting that since the first six bootstrap statistics are for coefficients, we need to add 6 to our index variable, recalling that t0 has the actual estimates in the real data and t has the bootstrapped distribution.

> tmp.bootres <- boot.res$t0[index + 6]
> btmp.bootres <- boot.res$t[, index + 6]

Now we can calculate the differences and test them. We show the resulting histogram in Figure 16-12.

> deltaSatisfied <- tmp.bootres[4] - tmp.bootres[3]
> deltaNotSatisfied <- tmp.bootres[2] - tmp.bootres[1]

> bdeltaSatisfied <- btmp.bootres[, 4] - btmp.bootres[, 3]
> bdeltaNotSatisfied <- btmp.bootres[, 2] - btmp.bootres[, 1]

> test <- deltaSatisfied + deltaNotSatisfied
> btest <- bdeltaSatisfied + bdeltaNotSatisfied

> hist(btest, breaks = 50)
> abline(v = test, col = "blue", lwd = 5)
> abline(v = quantile(btest, probs = .025), col = "yellow", lwd = 5)
> abline(v = quantile(btest, probs = .975), col = "yellow", lwd = 5)


Figure 16-12. Histogram of btest

The 95% confidence interval just includes zero, suggesting that we cannot reject the hypothesis that

Δ Satisfied + Δ Not Satisfied = 0

and, practically speaking, suggesting that the difference between young and old in the probability of being financially satisfied is indeed about the same magnitude (with opposite sign) as the difference in the probability of being not financially satisfied. This same approach would work even if we had a more complex hypothesis involving various nonlinear transformations. While those are challenges when deriving standard errors analytically, they are comparatively straightforward with bootstrapping.
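For instance, as a quick sketch reusing the objects already created above, suppose we instead cared about the ratio of the two changes, a nonlinear function of the predicted probabilities. The bootstrap distribution of the ratio is obtained simply by transforming the draws:

> ratio <- deltaSatisfied / (-deltaNotSatisfied)        # point estimate in the raw data
> bratio <- bdeltaSatisfied / (-bdeltaNotSatisfied)     # bootstrap distribution of the ratio
> quantile(bratio, probs = c(.025, .975))               # percentile interval for the ratio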

16.4 Final Thought

For centuries, statistics has been concerned with taking sample data and using them, along with clever models, to determine, within some level of confidence, how the population behaves. Generally, this has required making various assumptions about how the population data might be expected to behave. Through nonparametric methods, we are able to relax or even remove some assumptions or preconditions on our data, while still having actionable models. With bootstrapping, we are even able to cope with data that may have very few known characteristics. However, these benefits come with a price. A nonparametric method may not yield as "tight" a confidence interval as a comparable parametric model. That can, of course, be addressed by collecting more data, yet even that carries a computational cost, and, as we saw for bootstrapping, the computational load can be comparatively high. Nevertheless, computers are very helpful, and it would perhaps be foolish to try to explore and understand data without them. We turn now to a look at data visualization with the rich graphics of the medium, before we spend some time discovering just how to exploit as much computational efficiency from a machine as possible.

References

Canty, A., & Ripley, B. boot: Bootstrap R (S-Plus) Functions. R package version 1.3-17, 2015.

Carpenter, J., & Bithell, J. “Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians.” Statistics in Medicine, 19(9), 1141–1164 (2000).

Hollander, M., & Wolfe, D. A. Nonparametric Statistical Methods. New York: John Wiley & Sons, 1973.

Hothorn, T., Hornik, K., van de Wiel, M. A., & Zeileis, A. "A Lego system for conditional inference." The American Statistician, 60(3), 257–263 (2006).

R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2015. www.R-project.org/.

Schloerke, B., Crowley, J., Cook, D., Hofmann, H., Wickham, H., Briatte, F., Marbach, M., & Thoen, E. GGally: Extension to ggplot2. R package version 0.5.0, 2014. http://CRAN.R-project.org/package=GGally.

Wickham, H. "Reshaping data with the reshape package." Journal of Statistical Software, 21(12), 1–20 (2007). www.jstatsoft.org/v21/i12/.

Wickham, H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer, 2009.

Wickham, H. scales: Scale Functions for Visualization. R package version 0.2.5, 2015. http://CRAN.R-project.org/package=scales.
