Chapter 7 introduced the idea of thinking through a business question and building a hypothesis test around it, and it laid out the five steps for running a statistical inquiry. This chapter shows how to implement those five steps using the statistical test known as the independent t-test, or Student's t-test. Again, a small dataset is used for demonstration purposes only, to show how the independent t-test is used and what it is for. You'll want to use a larger dataset in real-life situations.
Independent T-Test at a Glance
An independent t-test looks at the difference between two independent sample means divided by the standard error of that difference. Independent means that the two samples are not dependent or related: the samples come from different groups and do not overlap. Again, the data needs to come from a random sample. An independent t-test is used when there is a dichotomous variable and a continuous variable (refer to Appendix A). The test determines whether the means of the two groups defined by the dichotomous variable are significantly different (supporting the alternative hypothesis) or whether there is not enough evidence to say the two means differ (the null hypothesis is not rejected). A few concepts to know about the t-distribution: its mean is zero, just like the mean of the standard normal distribution, and it is symmetric about the mean, as is the normal distribution. Furthermore, as the sample size increases, the standard deviation of the t-distribution gets closer and closer to 1, which is the standard deviation of the standard normal distribution. This is because, as the sample size increases, the t-distribution looks more and more like the standard normal curve.
The t-distribution, like the standard normal, extends from negative infinity to positive infinity and never touches the x-axis. Lastly, the t-distribution looks like the normal curve, with each side a mirror image of the other, except that it is less peaked in the center (at the mean) and has fatter tails than the standard normal distribution. As the sample size increases, the curve for the t-distribution changes and looks more and more like the normal curve.
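The shape comparison above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names `t_pdf` and `normal_pdf` are illustrative choices, not something the chapter defines. It prints the height of each curve at the center (x = 0) and in the tail (x = 3) for several degrees of freedom.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work with log-gamma so large df values do not overflow.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the peak (x = 0) and the tail (x = 3) as df grows.
for df in (3, 6, 30, 1000):
    print(f"df={df:>4}: peak={t_pdf(0, df):.4f}  tail at 3={t_pdf(3, df):.4f}")
print(f"normal : peak={normal_pdf(0):.4f}  tail at 3={normal_pdf(3):.4f}")
```

For small degrees of freedom the peak is lower and the tail is higher than for the normal curve, and both values move toward the normal ones as the degrees of freedom increase.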
The independent t-test is used when the sample sizes of the two groups are each 30 or more, when the data is known to come from a normal distribution, or both. As with Pearson's r correlation, there are more rigorous techniques for determining the sample size that are not reviewed in this book. You can also "eyeball" a histogram to see if the data looks approximately normally distributed. A statistician will have other methods that are not discussed in this book.
Hypothesis Test
This is an example of a hypothesis test of the difference between two independent means: the revenue of the store channel and the revenue of the online sales channel. It consists of the five steps previously introduced in Chapter 7, all of which are covered in the following sections.
Step 1: The Hypothesis, or the Reason for the Business Question
The test leads to one of two conclusions about the independent means of the two channels: H0 (there is not enough evidence to say the two independent group means are different) or H1 (the two independent group means are significantly different from each other). Here again a two-tailed test is used because the null hypothesis uses an equal sign and the alternative uses a not-equal sign.
H0: μ1=μ2
H1: μ1≠μ2
Step 2: Confidence Level
Just as the Pearson correlation has degrees of freedom and a confidence level, so does the independent t-test. For the t-test, the standard alpha of 0.05, which is a 5 percent significance level, is used; it is the usual default in statistical packages. Since there are two groups with four subjects in each group, the two group sizes are added together and two degrees of freedom are subtracted (one for each group). Degrees of freedom are needed because a sample is used, which puts constraints on the sample statistics.
df = n1 + n2 -2 = 4 + 4 -2 = 6
Some computer programs use a more involved formula to calculate the degrees of freedom, and the result is a bit different. The simpler method shown here is used when no program is available to calculate the exact degrees of freedom. The standard two-tailed test is used in this example.
Step 3: Mathematical Operations and Statistical Formula
The main objective for the independent t-test is to determine the difference between the two independent group means over the standard error, which is the noise in the formula. The first step is to calculate the independent mean for each group. The next step will be to calculate the standard error.
Here, the variable called "shop" identifies the mode of shopping, which is binary: Internet versus store shoppers. The objective is to determine from the data whether there is a significant difference between Internet and store shoppers in how much they spend, that is, their revenue.
1. Table 9-1 shows the data.
2. The formula follows the table.
Table 9-1. Data for Type of Shop and Total Revenue
Shop | Total revenue
---|---
1 | 90
1 | 100
1 | 110
1 | 95
2 | 70
2 | 55
2 | 45
2 | 65

Shop 1 = X1 = Internet; shop 2 = X2 = store.
X-bar is the mean of each sample, S² is the variance of each sample, and S is the standard deviation.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
$$\bar{X}_1 = \frac{90 + 100 + 110 + 95}{4} = 98.75$$

$$\bar{X}_2 = \frac{55 + 70 + 65 + 45}{4} = 58.75$$
S is the standard deviation:

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

S² is the variance:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Independent T-Test Calculated Using Excel Spreadsheet
Internet | X1 | X1bar | X1-X1bar | (X1-X1bar)² | Store | X2 | X2bar | X2-X2bar | (X2-X2bar)²
---|---|---|---|---|---|---|---|---|---
1 | 90 | 98.75 | -8.75 | 76.5625 | 2 | 55 | 58.75 | -3.75 | 14.0625
1 | 100 | 98.75 | 1.25 | 1.5625 | 2 | 70 | 58.75 | 11.25 | 126.5625
1 | 110 | 98.75 | 11.25 | 126.5625 | 2 | 65 | 58.75 | 6.25 | 39.0625
1 | 95 | 98.75 | -3.75 | 14.0625 | 2 | 45 | 58.75 | -13.75 | 189.0625
Sum | | | 0 | 218.75 | Sum | | | 0 | 368.75
3. Here the variance and standard deviation are calculated for each group:
Group 1 variance: S² = 218.75/3 = 72.92, S = sqrt(218.75/3) = 8.54
Group 2 variance: S² = 368.75/3 = 122.92, S = sqrt(368.75/3) = 11.09
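The spreadsheet columns and the variance calculation can also be reproduced outside Excel. This is a minimal sketch using Python's standard `statistics` module; the helper name `sum_of_squares` and the variable names are illustrative assumptions, and the store values are the ones used in the mean calculation (55, 70, 65, 45).

```python
from statistics import mean, variance

internet = [90, 100, 110, 95]  # shop 1 (X1)
store = [55, 70, 65, 45]       # shop 2 (X2), as used in the mean calculation

def sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values)

for name, values in (("Internet", internet), ("Store", store)):
    ss = sum_of_squares(values)
    s2 = ss / (len(values) - 1)  # sample variance, same as variance(values)
    print(f"{name}: mean={mean(values):.2f}  SS={ss:.2f}  "
          f"S^2={s2:.2f}  S={s2 ** 0.5:.2f}")
```

Dividing by n - 1 rather than n is what makes this the sample variance; `statistics.variance` applies the same divisor, so it can serve as a cross-check.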
4. When the variances are not considered different, the pooled standard error formula is used. However, when the variances are considered significantly different, a more complicated formula needs to be computed. The book does not go over the math for that case, but at times the independent t-test for unequal variances must be used because the variances are statistically different. (Note: When the variances are considered significantly different, the Data Analysis ToolPak can do that computation as well.)
The following are the calculations for the pooled standard error:
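$$\begin{aligned}
\text{Standard error} &= \sqrt{\left[\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
&= \sqrt{\left[\frac{3(218.75/3) + 3(368.75/3)}{4 + 4 - 2}\right]\left(\frac{1}{4} + \frac{1}{4}\right)} \\
&= \sqrt{\left[\frac{218.75 + 368.75}{6}\right](0.50)} \\
&= \sqrt{\frac{587.5}{6}(0.50)} \\
&= 6.9970
\end{aligned}$$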
Independent t-test:

$$t = \frac{98.75 - 58.75}{6.997} = \frac{40}{6.997} = 5.72$$
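The hand calculation can be verified with a short script. This is a sketch under the chapter's equal-variance assumption, using only the standard library; the function name `pooled_t` is a hypothetical helper, not part of any package.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t statistic, pooled standard error, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its n - 1.
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_err, std_err, df

internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]
t_stat, std_err, df = pooled_t(internet, store)
print(f"standard error = {std_err:.4f}, t = {t_stat:.2f}, df = {df}")
# prints: standard error = 6.9970, t = 5.72, df = 6
```

Because the function takes the two samples as plain lists, the same call works for groups of unequal size, as long as the equal-variance assumption is reasonable.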
Output of the Independent T-Test (Using ToolPak)
t-Test: Two-Sample Assuming Equal Variances

 | X | Y
---|---|---
Mean | 98.75 | 58.75
Variance | 72.91667 | 122.9166667
Observations | 4 | 4
Pooled Variance | 97.91667 |
Hypothesized Mean Difference | 0 |
Df | 6 |
t Stat | 5.716717 |
P(T<=t) one-tail | 0.000621 |
t Critical one-tail | 1.94318 |
P(T<=t) two-tail | 0.001241 |
t Critical two-tail | 2.446912 |
The independent t-test result (the t statistic) = 5.72.
The two-tailed critical value for the t-test = 2.447.
The two-tailed p-value = 0.001241.
Step 4: Results
The results are significantly different at the 0.05 alpha level because the p-value is less than alpha: the p-value is 0.0012, which is smaller than 0.05. Equivalently, the t statistic of 5.72 is larger than the two-tailed critical value of 2.447. Therefore, the two groups are significantly different.
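The p-value comparison can be reproduced without a statistics package. The sketch below is one possible approach, not the method the ToolPak uses internally: it integrates the t-distribution density numerically with Simpson's rule to get the two-tailed p-value for the reported t statistic with 6 degrees of freedom. The function names `t_pdf` and `two_tailed_p` are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) via Simpson's rule on [-|t|, |t|]; steps must be even."""
    a = abs(t_stat)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... on the interior points.
        total += (4 if i % 2 else 2) * t_pdf(-a + i * h, df)
    return 1 - total * h / 3

alpha = 0.05
p_value = two_tailed_p(5.716717, df=6)
print(f"p = {p_value:.6f}; reject H0: {p_value < alpha}")
```

The decision rule is the last comparison: the null hypothesis is rejected only when the p-value falls below the chosen alpha.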
Step 5: Descriptive Analysis
This result shows that shoppers spend significantly more on the Internet (X-bar1 = 98.75) than in the stores (X-bar2 = 58.75). A bar graph could have been used here to show the difference in the means. These results are straightforward and expected. Therefore, the company may want to stock more merchandise for its online shop than for its retail stores. This may help the company cut costs because a retail store is more expensive to maintain than an online store.
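As a rough stand-in for the bar graph mentioned above, the two means can be drawn as text bars. This is only a sketch using the standard library; the helper `text_bar` and its `scale` parameter are hypothetical, and a charting tool such as Excel would produce a proper graph.

```python
from statistics import mean

revenue = {
    "Internet": [90, 100, 110, 95],
    "Store": [55, 70, 65, 45],
}

def text_bar(label, value, scale=2):
    """Render one bar; each '#' stands for `scale` revenue units."""
    return f"{label:<9}{'#' * round(value / scale)} {value:.2f}"

# One bar per channel, with length proportional to the mean revenue.
bars = [text_bar(name, mean(values)) for name, values in revenue.items()]
print("\n".join(bars))
```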
Summary
In the next chapter, there will be a more involved example that puts all these statistical tests to use and shows how to use them in a real-life scenario. The next chapter presents the business situation of a retail email campaign.