CHAPTER 11


Modern Statistical Methods

Statistics benefited greatly from the introduction of the modern digital computer in the middle of the 20th century. Simulations and other analyses that once required laborious and error-prone hand calculations could be programmed into the computer, saving time and increasing accuracy. We have already used simulations for some demonstrations. In this chapter, we will discuss modern robust alternatives to the standard statistical techniques we discussed in Chapter 10.

As you will recall from our previous discussions, some estimators of population parameters, such as the median, are relatively robust, while others, such as the mean, are less robust, because they are influenced by outliers or the shape of the distribution. Modern robust statistics include an amalgam of procedures that are less sensitive to violations of the standard statistical assumptions than are the traditional techniques we discussed in Chapter 10. Modern techniques include the processes of trimming or Winsorizing estimates such as those of the mean or the variance, bootstrapping, permutation tests, and a variety of rank-based or nonparametric tests (Erceg-Hurn & Mirosevich, 2008). The books and articles by Professor Rand Wilcox of USC are also quite helpful in their presentation of many modern robust alternatives to traditional statistical procedures (for an introduction, see Wilcox, 2010).

We will separate Chapter 11 into the following sections: a discussion of the need for modern statistical methods and some robust alternatives to the traditional t-tests, and then an introduction to both bootstrapping and permutation tests. We will not be looking for a robust alternative to every specific hypothesis test we have discussed thus far but simply for a representation of this more modern approach in contrast to the traditional one.

11.1 The Need for Modern Statistical Methods

The development of traditional statistical procedures and hypothesis tests began at the end of the 19th century and continued into the first half of the 20th century. Even then it was clear that many real datasets did not meet the familiar distributional assumptions of normality, independence, and equality of variance. Many observed data were noted to be skewed in distribution, and nonparametric alternatives were developed, among the first being the Spearman rank correlation as an alternative to the Pearson product-moment correlation we discussed in Chapter 10. Nonparametric tests are so called because they make fewer (if any) assumptions about population parameters than parametric tests, and because in many cases no estimate of a population parameter is being considered.

Many modern statistical procedures are nonparametric in the second sense, as they often do not rely on population parameters but in fact treat a sample as a “pseudo-population” and repeatedly sample with replacement from the original sample in order to generate a distribution of a particular statistic of interest. Resampling techniques can also be used for calculating confidence intervals, not just for familiar statistics but for ones we might create on our own.

The methods of statistics continue to evolve. Data today are far more likely to become “big” than in any previous era. One of the powerful advantages R provides is the ability to run many procedures cheaply, even on comparatively large datasets. The exploratory data analysis promoted by Tukey can now readily go far beyond “just” box-and-whisker plots: provided the preconditions of the particular tests are met, a dataset can be subjected to a whole battery of procedures whose results we can compare and contrast.

11.2 A Modern Alternative to the Traditional t Test

The independent-samples t-test is one of the most popular of all statistical procedures. As we have discussed, the t-test assuming unequal variances is available when the data analyst is not willing to assume homoscedasticity. A nonparametric alternative to the independent-samples t-test is the Mann-Whitney U test, for which the data for both groups are converted to ranks. The question arises as to how well these alternatives perform, particularly in terms of statistical power, when the assumptions of equality of variance and normality of distribution are violated.

Yuen (1974) developed a robust alternative to the independent-samples t-test. Yuen’s test makes use of trimmed means and Winsorized variances for both groups. When the trimming amount is zero, the Yuen test produces the same confidence interval for the difference between means as the Welch t-test in base R. Although we can trim means by any amount, it is common to use a 20% trimmed mean as a robust estimator of the population mean. This amounts to discarding the top 20% and the bottom 20% of the data and then calculating the mean of the remaining values. (The median is, by this definition, the 50% trimmed mean.) Winsorizing is a slightly different process from trimming: whereas trimming discards data values, Winsorizing replaces a certain percentage of the top and bottom values with the scores at given quantiles (e.g., the 5th and 95th percentiles). The unequal-variances t-test has been shown to perform reasonably well when both samples are drawn from normal populations, but less well when the distributions are not normal and the sample sizes differ. Rand Wilcox’s WRS (Wilcox’s Robust Statistics) package is available on GitHub. The package contains many functions (more than 1,100, in fact) for various robust statistical methods, including the Yuen t-test. See the following URL for instructions on how to install WRS, or run the commands in the source code distributed with this text (available on apress.com).

https://github.com/nicebread/WRS
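To make trimming and Winsorizing concrete before we turn to WRS, here is a minimal base R sketch. The trim argument to mean() is built into base R, but the small data vector and the winsorize() helper are hypothetical illustrations created for this example; the quantile-based replacement shown here is only an approximation of the order-statistic approach Wilcox’s functions use.

> x <- c(2, 3, 5, 8, 13, 21, 34, 55, 89, 144)
> mean(x)                # ordinary mean, pulled upward by the long right tail
[1] 37.4
> mean(x, trim = 0.20)   # 20% trimmed mean via base R's trim argument
[1] 22.66667
> winsorize <- function(x, trim = 0.20) {
+   q <- quantile(x, c(trim, 1 - trim))
+   x[x < q[1]] <- q[1]  # pull the low tail up to the lower quantile
+   x[x > q[2]] <- q[2]  # pull the high tail down to the upper quantile
+   x
+ }
> mean(winsorize(x))     # 20% Winsorized mean
[1] 26.88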

We leave an in-depth treatment of the exact mathematical formulae to other authors and texts. Instead, we take two groups that differ in sample size, mean, and variance. Figure 11-1 shows the kernel density plots for the two groups for comparison purposes.


Figure 11-1. Density plots for two groups of data

Let us explore these two datasets a bit more, and then compare and contrast the t-test with two more robust tests. Of course, from the plots in Figure 11-1, one would not expect either the means or the variances to be equal.

> group1 <- c(151, 78, 169, 88, 194, 196, 109, 143, 150, 85, 168)
> group2 <- c(128, 122, 95, 97, 81)
> summary(group1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   78.0    98.5   150.0   139.2   168.5   196.0
> var(group1)
[1] 1843.364
>
> summary(group2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   81.0    95.0    97.0   104.6   122.0   128.0
> var(group2)
[1] 389.3

For the sake of reference, let us perform both the t-test assuming equal variances and the Welch t-test. Notice that the Welch t-test gives a p value of less than 0.05.

> t.test(group2, group1, var.equal = TRUE)

        Two Sample t-test

data:  group2 and group1
t = -1.6967, df = 14, p-value = 0.1119
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -78.295181   9.131545
sample estimates:
mean of x mean of y
 104.6000  139.1818

> t.test(group2, group1)

        Welch Two Sample t-test

data:  group2 and group1
t = -2.2074, df = 13.932, p-value = 0.04457
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -68.1984170  -0.9652194
sample estimates:
mean of x mean of y
 104.6000  139.1818

Now let us use the Yuen test with a trimming amount of 0.20 for the means, which is the default. Please see the instructions at the aforementioned GitHub URL (https://github.com/nicebread/WRS) to install WRS, or run the downloaded source code.

> library(WRS)
> yuenTest <- yuen(group2, group1)
> yuenTest$p.value
[1] 0.1321475
> yuenTest$ci
[1] -83.33050  13.23527
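As a quick check of the earlier claim that Yuen’s test with zero trimming reduces to the Welch t-test, we can set the tr argument to 0; the resulting interval should match the Welch confidence interval reported above (the number of printed digits may vary):

> yuen(group2, group1, tr = 0)$ci
[1] -68.1984170  -0.9652194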

Inspection of the results shows that the Yuen test, based on trimmed means and Winsorized variances, produces a wider, more conservative confidence interval, and thus it reduces the chance of a Type I error relative to either of the two standard t-tests for independent groups. As a final consideration in this section, observe the results of the Mann-Whitney U test, which is the test the wilcox.test function in R performs for independent groups.

> wilcox.test(group2, group1)

        Wilcoxon rank sum test

data:  group2 and group1
W = 15, p-value = 0.1804
alternative hypothesis: true location shift is not equal to 0

There are many other modern statistical tests available for a variety of situations, including robust alternatives to one-sample tests, analysis of variance, and regression. It is always important to examine any necessary preconditions or assumptions to using a particular test (and to examine any specific disclaimers about how R programmers have implemented those tests).

11.3 Bootstrapping

Bootstrapping is simple in logic. Instead of assuming anything about the population, we repeatedly resample, with replacement, from the dataset we already have. Sampling with replacement allows us to build up a distribution for any particular statistic of interest. We are essentially treating our sample as a “pseudo-population” when we take multiple resamples with replacement from it.

The “plug-in” principle of bootstrapping means that to estimate a parameter, which is some measurable characteristic of a population, we use the statistic that is the corresponding quantity for the sample. This principle allows us to model sampling distributions when we have little or no information about the population, when the sample data do not meet the traditional assumptions required for parametric tests, and when we create new statistics and want to study their distributions.
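As a small illustration of that last point, the following sketch bootstraps the standard error of a 20% trimmed mean, a statistic for which we need not assume any textbook formula. The seed, the simulated data, and the number of resamples are arbitrary choices made for this example.

> set.seed(7)   # arbitrary seed so the sketch is reproducible
> x <- rnorm(100, 500, 100)
> boot.tm <- replicate(2000, mean(sample(x, replace = TRUE), trim = 0.20))
> se.tm <- sd(boot.tm)   # bootstrap estimate of the trimmed mean's standard error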

To illustrate more fully, let us generate a random sample from a normal distribution with a mean of 500 and a standard deviation of 100. We will take 1,000 observations and then resample from our data 1,000 times with replacement. First, let’s bootstrap the mean and then the median to see how this works. We create the data and find that the mean and standard deviation are close to 500 and 100, respectively. We can then apply the quantile function to our 1,000 bootstrapped means to calculate a confidence interval. Note that the population mean of 500 is inside the confidence interval for our bootstrapped means. I used base R graphics and the abline function to add vertical lines at the two confidence limits (see Figure 11-2).


Figure 11-2. Histograms for bootstrapped means with confidence limits added

> myData <- rnorm(1000, 500, 100)
> resamples <- lapply(1:1000, function(i) sample(myData, replace = TRUE))
> r.mean <- sapply(resamples, mean)
> ci.mean <- c(quantile(r.mean, 0.025), quantile(r.mean, 0.975))
> ci.mean
    2.5%    97.5%
491.0863 503.1850

> hist(r.mean)
> abline(v = quantile(r.mean, 0.025))
> abline(v = quantile(r.mean, 0.975))

> t.test(myData)

        One Sample t-test

data:  myData
t = 155.2015, df = 999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 490.4900 503.0522
sample estimates:
mean of x
 496.7711

Notice that the t-based confidence interval for the mean of the original data is virtually the same as the confidence interval for the bootstrapped means.

Finding confidence intervals and standard error estimates for medians is less commonly done than finding these for means. Let us continue with our example and bootstrap the median and the mean for the 1,000 samples from a normal population with a mean of 500 and a standard deviation of 100. We will use the same technique as previously, but this time, we will make a function to combine our procedures. We will pass the dataset and the number of resamples as arguments to the function, and then write the results of the function to an object named boot1. This will allow us to query the object for the output of interest. Let us calculate standard errors for both the mean and the median.

> boot.fun <- function(data, num) {
+   resamples <- lapply(1:num, function(i) sample(data, replace = TRUE))
+   r.median <- sapply(resamples, median)
+   r.mean <- sapply(resamples, mean)
+   std.err.median <- sqrt(var(r.median))
+   std.err.mean <- sqrt(var(r.mean))
+   # return a list so that the scalar standard errors and the vectors of
+   # resampled statistics can coexist in a single object
+   list(std.err.median = std.err.median, std.err.mean = std.err.mean,
+        resamples = resamples, medians = r.median, means = r.mean)
+ }
> boot1 <- boot.fun(myData, 1000)
> boot1$std.err.mean
[1] 3.191525
> boot1$std.err.median
[1] 4.309543

We can see that the medians have a larger standard error than the means. In general, when the data are drawn from a normal distribution with a large sample size, the median will produce a confidence interval about 25% wider than that for the mean.
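As a rough check on that claim using the output above, we can compare the ratio of the two bootstrap standard errors with the theoretical large-sample ratio for normal data, which is the square root of pi/2:

> 4.309543 / 3.191525   # ratio of the bootstrap standard errors
[1] 1.350308
> sqrt(pi / 2)          # theoretical SE ratio for large normal samples
[1] 1.253314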

Figure 11-3 shows the bootstrapped means and medians. The means are clearly the more normally distributed of the two. To produce the histograms, I combined the medians and means into a single data frame and used the ggplot2 and the gridExtra packages to create the side-by-side histograms.

> install.packages("gridExtra")
> library(gridExtra)
> library(ggplot2)
> bootDF <- data.frame(means = boot1$means, medians = boot1$medians)
> plot1 <- ggplot(bootDF, aes(means)) +
+   geom_histogram(binwidth = 1, fill = "white", color = "black")
> plot2 <- ggplot(bootDF, aes(medians)) +
+   geom_histogram(binwidth = 1, fill = "white", color = "black")
> grid.arrange(plot1, plot2, nrow = 1)


Figure 11-3. Histograms for bootstrapped means and medians

11.4 Permutation Tests

Bootstrapping produces a distribution by sampling with replacement. Because the sampling is random, no two bootstraps will produce exactly the same results unless they start with the same seed value. You may recall that permutations count the number of ways a set of objects can be ordered or sequenced. Thus, when we are comparing means for groups of size n1 and n2, it is instructive to determine the number of ways a total of N = n1 + n2 objects can be divided into two groups of size n1 and n2. We can then determine, among the possible divisions of the data into groups of size n1 and n2, the proportion in which the mean difference is larger in absolute value than the mean difference we actually observed. The number of possible divisions increases very quickly as the number of objects increases, so with large samples it is neither necessary nor practical to enumerate them all. Say we want to compare the means of two different groups, as we have done with the t-test and the Yuen test. To run this as a permutation test, we record the mean difference between the two groups, pool the data into a single group, and then repeatedly redivide the pooled data, as sketched below.
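Here is a minimal Monte Carlo sketch of that logic in base R. The perm.test.mc helper is a hypothetical function written for illustration; it is not the implementation in the perm package, which we use next. Applied to group1 and group2 from earlier, there are choose(16, 5) = 4,368 possible divisions, so complete enumeration would also be feasible in that small case; the sketch instead samples divisions at random.

> perm.test.mc <- function(x, y, nperm = 9999) {
+   obs <- mean(x) - mean(y)             # observed mean difference
+   pooled <- c(x, y)                    # pool the data into one group
+   n1 <- length(x)
+   diffs <- replicate(nperm, {
+     idx <- sample(length(pooled), n1)  # a random division into two groups
+     mean(pooled[idx]) - mean(pooled[-idx])
+   })
+   mean(abs(diffs) >= abs(obs))         # two-sided p value
+ }
> set.seed(42)   # arbitrary seed for reproducibility
> p.mc <- perm.test.mc(group2, group1)  # proportion beating the observed difference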

Permutation tests may be asymptotically valid using the permutational central limit theorem, or they may produce exact p values using Monte Carlo simulation, a network algorithm, or complete enumeration. These features are implemented in the perm package available on CRAN. Let us use a hypothetical dataset from a memory study. The data contain memory recall scores for 20 subjects in each of two conditions. Assume that the subjects were randomly assigned to the conditions and that the memory test was administered after they had taken the drug or a placebo for 30 days.

> memory <- read.table("memory_ch11.txt", sep = "\t", header = TRUE)
> head(memory)
  cond recall
1 drug    2.0
2 drug    2.0
3 drug    4.5
4 drug    5.5
5 drug    6.5
6 drug    6.5
> tail(memory)
      cond recall
35 placebo     16
36 placebo     17
37 placebo     20
38 placebo     25
39 placebo     29
40 placebo     30

Now, let us perform the Welch t-test and the t-test, assuming equal variances, and compare the results with those of permutation tests using asymptotic approximation and exact results. Note in the following code listing that the permTS function in the perm package compares the two samples, but that the permutation tests use the standard normal distribution instead of the t distribution for calculating the p values.

> install.packages("perm")
> library(perm)
> t.test(recall ~ cond, data = memory)

        Welch Two Sample t-test

data:  recall by cond
t = -2.2552, df = 28.862, p-value = 0.03188
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.591498 -0.418502
sample estimates:
   mean in group drug mean in group placebo
                8.930                13.435

> t.test(recall ~ cond, data = memory, var.equal = TRUE)

        Two Sample t-test

data:  recall by cond
t = -2.2552, df = 38, p-value = 0.02997
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.5490275 -0.4609725
sample estimates:
   mean in group drug mean in group placebo
                8.930                13.435

> permTS(recall ~ cond, data = memory)

        Permutation Test using Asymptotic Approximation

data:  recall by cond
Z = -2.1456, p-value = 0.03191
alternative hypothesis: true mean cond=drug - mean cond=placebo is not equal to 0
sample estimates:
mean cond=drug - mean cond=placebo
                            -4.505

> permTS(recall ~ cond, data = memory, exact = TRUE)

        Exact Permutation Test Estimated by Monte Carlo

data:  recall by cond
p-value = 0.03
alternative hypothesis: true mean cond=drug - mean cond=placebo is not equal to 0
sample estimates:
mean cond=drug - mean cond=placebo
                            -4.505

p-value estimated from 999 Monte Carlo replications
99 percent confidence interval on p-value:
 0.01251632 0.05338086

Note that the p values for all four tests are relatively similar. Interestingly, recent research indicates that the permutation test may not perform well when the data for the groups being compared are not identically distributed. The robust Yuen test we used earlier shows that when the means are trimmed and Winsorized variances are used, the results are not significant. This suggests that the probability of a Type I error may have been inflated for all four of our earlier tests.

> recall1 <- memory[memory[, "cond"] == "drug", "recall"]
> recall2 <- memory[memory[, "cond"] == "placebo", "recall"]
> yuenTest <- yuen(recall1, recall2)
> yuenTest$p.value
[1] 0.05191975
> yuenTest$ci
[1] -7.56797807  0.03464473

References

Chambers, J. M. Software for Data Analysis: Programming with R. New York: Springer, 2008.

Erceg-Hurn, D. M., & Mirosevich, V. M. “Modern robust statistical methods: An easy way to maximize the accuracy and power of your research.” American Psychologist, 63 (7), 591-601 (2008).

Hunt, A., & Thomas, D. The Pragmatic Programmer: From Journeyman to Master. Reading, MA: Addison Wesley, 1999.

Micceri, T. “The unicorn, the normal curve, and other improbable creatures.” Psychological Bulletin, 105, 156-166 (1989).

Ohri, A. R for Cloud Computing: An Approach for Data Scientists. New York: Springer, 2014.

Pace, L. A. Beginning R: An Introduction to Statistical Programming. New York: Apress, 2012.

Roscoe, J. T. Fundamental Research Statistics for the Behavioural Sciences (2nd ed.). New York: Holt, Rinehart and Winston, 1975.

Triola, M. F. Elementary Statistics (11th ed.). Boston, MA: Addison-Wesley, 2010.

Tufte, E. R. The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press, 2001.

Tukey, J. W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.

University of California, Los Angeles. “Resources to help you learn and use R.” Retrieved from www.ats.ucla.edu/stat/r/ (2015).

Wilcox, R. R. Fundamentals of Modern Statistical Methods (2nd ed.). New York: Springer, 2010.

Wilkinson, L. The Grammar of Graphics (2nd ed.). New York: Springer, 2005.

Yuen, K. K. “The two-sample trimmed t for unequal population variances.” Biometrika, 61, 165-170 (1974).
