9. Independent T-Test

Rhoda Okunev¹

(1)

Tamarac, FL, USA

Chapter 7 introduced the idea of thinking through a business question and building a hypothesis test around it, and it explained the five steps for running a statistical inquiry. This chapter shows how to implement those five steps using the independent t-test, also called Student’s t-test. As before, a small dataset is used for demonstration purposes only, to show how the independent t-test works and what it is for. You’ll want to use a larger dataset in real-life situations.

Independent T-Test at a Glance

An independent t-test divides the difference between two independent sample means by the standard error of that difference. Independent means that the two samples are not related: they come from different groups that do not overlap. As before, the data need to come from a random sample. An independent t-test is used when there is a dichotomous variable and a continuous variable (refer to Appendix A). The dichotomous variable defines the two groups, and the test determines whether the two group means are significantly different (the alternative hypothesis is supported) or whether there is not enough evidence to say that they differ (the null hypothesis is not rejected). A few concepts to know about the t-distribution: its mean is zero, just like the mean of the standard normal distribution, and it is symmetric about the mean, as the normal distribution is. Furthermore, as the sample size increases, the standard deviation of the t-distribution gets closer and closer to 1, which is the standard deviation of the standard normal distribution. This is because the t-distribution looks more and more like the standard normal curve as the sample grows.

The t-distribution, like the standard normal, extends from negative infinity to positive infinity, and its curve never touches the x-axis. Lastly, the t-distribution resembles the normal curve, with each side a mirror image of the other, except that it is less peaked in the center (at the mean) and has fatter tails than the standard normal distribution. As the sample size increases, the t-distribution curve looks more and more like the normal curve.
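The statement that the standard deviation approaches 1 can be checked numerically. The sketch below relies on a standard result that the chapter does not state: the variance of the t-distribution is df/(df − 2) for df greater than 2, where df is the degrees of freedom (which grow with the sample size). The function name `t_std_dev` is made up for this illustration.

```python
import math


def t_std_dev(df: int) -> float:
    """Standard deviation of the t-distribution with `df` degrees of freedom.

    Uses variance = df / (df - 2), which is only finite for df > 2.
    """
    if df <= 2:
        raise ValueError("the variance of the t-distribution is finite only for df > 2")
    return math.sqrt(df / (df - 2))


# The standard deviation shrinks toward 1 as the degrees of freedom increase.
for df in (3, 6, 30, 100, 1000):
    print(f"df = {df:>4}: standard deviation = {t_std_dev(df):.4f}")
```

With very few degrees of freedom the spread is noticeably larger than 1, which is another way of seeing the fatter tails described above.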

The independent t-test is used when the sample sizes of the two groups are each 30 or more, when the data are known to come from a normal distribution, or both. As with Pearson’s r correlation, there are more rigorous techniques for determining the sample size that are not reviewed in this book. You can also “eyeball” a histogram to see whether the data look approximately normally distributed. A statistician has other methods for checking this that are not discussed in this book.

Hypothesis Test

This is an example of a hypothesis test of the difference between two independent means of revenue of stores and online sales channels. It consists of the five steps previously introduced in Chapter 7, all of which are covered in the following section.

Step 1: The Hypothesis, or the Reason for the Business Question

The test decides between two statements about the mean revenue of the two channels: H₀ (there is not enough evidence to say the two independent group means are different) and H₁ (the two independent group means are significantly different from each other). Here again a two-tailed test is used because the null hypothesis has an equal sign and the alternative has a not-equal sign.

H₀: μ₁=μ₂

H₁: μ₁≠μ₂

Step 2: Confidence Level

Just as the Pearson correlation has degrees of freedom and a confidence level, so does the independent t-test. For the t-test, an alpha of 0.05 (a 5 percent significance level) is used, which is the usual default in statistical packages. Since there are two groups with four subjects in each, the degrees of freedom are the two group sizes added together minus two (one for each group). Degrees of freedom are needed because the calculation uses a sample, which places constraints on the sample statistics.

df = n₁ + n₂ − 2 = 4 + 4 − 2 = 6

Some computer programs use a more complex formula to calculate the degrees of freedom, and the result is a bit different. The simpler formula above is used when no program is available to calculate the exact degrees of freedom. The standard two-tailed test is used in this example.
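The chapter does not name the alternative degrees-of-freedom formula. One common candidate, used by unequal-variance t-tests, is the Welch-Satterthwaite approximation, and the sketch below assumes that is what is meant. It compares that approximation with the simple formula, using the chapter's revenue data (Internet: 90, 100, 110, 95; store: 55, 70, 65, 45). The function names `pooled_df` and `welch_df` are illustrative only.

```python
from statistics import variance


def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom used in this chapter: n1 + n2 - 2."""
    return n1 + n2 - 2


def welch_df(sample1, sample2) -> float:
    """Welch-Satterthwaite degrees of freedom (an assumption, not named in the text)."""
    n1, n2 = len(sample1), len(sample2)
    a = variance(sample1) / n1  # sample variance (n - 1 denominator) over group size
    b = variance(sample2) / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]

print("pooled df:", pooled_df(len(internet), len(store)))  # 6
print("Welch-Satterthwaite df:", round(welch_df(internet, store), 2))
```

The approximation gives a non-integer value a little below 6 for these data, which is consistent with the remark that the result is "a bit different."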

Step 3: Mathematical Operations and Statistical Formula

The main objective for the independent t-test is to determine the difference between the two independent group means over the standard error, which is the noise in the formula. The first step is to calculate the independent mean for each group. The next step will be to calculate the standard error.

Here, the variable called “shop” is a binary variable that records the mode of shopping: Internet versus store. The objective is to determine from the data whether Internet and store shoppers differ significantly in how much they spend, that is, in revenue.

The first part shows the data. The second part introduces the formula for the independent t-test. Then the means are determined, and the variance and standard deviation of each group are worked out step by step so that the pooled standard error can be calculated. Once that is done, the results can be analyzed in step 4.

1. Table 9-1 shows the data.
2. The formula is shown here.

Table 9-1

Data for Type of Shop and Total Revenue

Shop	Total revenue
1	90
1	100
1	110
1	95
2	70
2	55
2	45
2	65

Shop 1 = X₁ = Internet; shop 2 = X₂ = store.

The X-bar is the mean of each sample, while S² is the variance for each sample, and S is the standard deviation.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$\bar{X}_1 = \frac{90 + 100 + 110 + 95}{4} = 98.75$$

$$\bar{X}_2 = \frac{55 + 70 + 65 + 45}{4} = 58.75$$

S is the standard deviation:

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

S² is the variance:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Table 9-2 shows how the sum of squared deviations, which is needed for the variance and standard deviation, is calculated for the Internet group and the store group.

Table 9-2

Independent T-Test Calculated Using Excel Spreadsheet

Internet	X1	X1bar	X1-X1bar	(X1-X1bar)²	Store	X2	X2bar	X2-X2bar	(X2-X2bar)²
1	90	98.75	-8.75	76.5625	2	55	58.75	-3.75	14.0625
1	100	98.75	1.25	1.5625	2	70	58.75	11.25	126.5625
1	110	98.75	11.25	126.5625	2	65	58.75	6.25	39.0625
1	95	98.75	-3.75	14.0625	2	45	58.75	-13.75	189.0625
		Sum	0	218.75			Sum	0	368.75
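The columns of Table 9-2 can be reproduced outside a spreadsheet. The following is a minimal sketch in Python, assuming the group values shown in the table; the helper name `squared_deviations` is made up for this illustration.

```python
from statistics import mean


def squared_deviations(values):
    """Return (mean, deviations from the mean, squared deviations) for one group."""
    m = mean(values)
    deviations = [x - m for x in values]
    squares = [d ** 2 for d in deviations]
    return m, deviations, squares


groups = {
    "Internet": [90, 100, 110, 95],
    "Store": [55, 70, 65, 45],
}

sums_of_squares = {}
for name, values in groups.items():
    m, devs, squares = squared_deviations(values)
    sums_of_squares[name] = sum(squares)
    print(f"{name}: mean = {m}")
    for x, d, s in zip(values, devs, squares):
        print(f"  x = {x:>3}  deviation = {d:>7.2f}  squared = {s:>9.4f}")
    print(f"  sum of deviations = {sum(devs):.2f}, sum of squares = {sum(squares):.2f}")
```

The deviations in each group sum to zero, which is a quick check that the mean was computed correctly.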

3. Here the variance and standard deviation are calculated for each group:

Group 1 (Internet): S₁² = 218.75/3 = 72.92, S₁ = sqrt(218.75/3) = 8.54

Group 2 (store): S₂² = 368.75/3 = 122.92, S₂ = sqrt(368.75/3) = 11.09

4. When the variances of the two groups are not considered different, the pooled standard error formula is used. When the variances are considered significantly different, a more complicated formula is needed. This book does not go over the math for unequal variances, but that version of the independent t-test is sometimes required. (Note: The Data Analysis ToolPak can also run the t-test for unequal variances.)

The following are the calculations for the pooled standard error:

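$$\begin{aligned}
\text{SE} &= \sqrt{\left[\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
&= \sqrt{\left[\frac{3(218.75/3) + 3(368.75/3)}{4 + 4 - 2}\right]\left(\frac{1}{4} + \frac{1}{4}\right)} \\
&= \sqrt{\frac{218.75 + 368.75}{6} \times 0.50} \\
&= \sqrt{\frac{587.5}{6} \times 0.50} \\
&= 6.9970
\end{aligned}$$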

Independent t-test:

$$t = \frac{98.75 - 58.75}{6.997} = \frac{40}{6.997} = 5.72$$
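The hand calculation above can be double-checked in a few lines of code. This is a minimal sketch using only Python's standard library; the function name `pooled_t_test` is made up for this illustration, and it assumes equal variances, as the pooled formula does.

```python
import math
from statistics import mean, variance


def pooled_t_test(sample1, sample2):
    """Equal-variance independent t-test.

    Returns (t statistic, pooled standard error, pooled variance, df).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 denominator, matching S squared in the text.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean(sample1) - mean(sample2)) / std_error
    return t_stat, std_error, pooled_var, df


internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]

t_stat, std_error, pooled_var, df = pooled_t_test(internet, store)
print(f"pooled variance = {pooled_var:.4f}")   # 97.9167
print(f"standard error  = {std_error:.4f}")    # 6.9970
print(f"t statistic     = {t_stat:.4f}")       # 5.7167
print(f"df              = {df}")               # 6
```

If SciPy is installed, `scipy.stats.ttest_ind(internet, store, equal_var=True)` performs the same equal-variance test and also reports a p-value.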

Table 9-3 shows the output when the t-test is run with the Data Analysis ToolPak, found on Excel’s Data tab.

Table 9-3

Output of the Independent T-Test (Using ToolPak)

t-Test: Two-Sample Assuming Equal Variances
	X	Y
Mean	98.75	58.75
Variance	72.91667	122.9166667
Observations	4	4
Pooled Variance	97.91667
Hypothesized Mean Difference	0
Df	6
t Stat	5.716717
P(T<=t) one-tail	0.000621
t Critical one-tail	1.94318
P(T<=t) two-tail	0.001241
t Critical two-tail	2.446912

The independent t-test statistic = 5.72.

The two-tailed critical value for the t-test = 2.447.

The two-tailed p-value = 0.001241.

Step 4: Results

The two group means are significantly different at the 0.05 alpha level because the p-value (0.0012) is smaller than alpha (0.05). Equivalently, the t statistic (5.72) is larger than the two-tailed critical value (2.447).
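The p-value and critical value reported by the ToolPak can be approximated without a statistics package. The sketch below is one way to do it, not the method Excel uses: it integrates the t-distribution density numerically with Simpson's rule and finds the critical value by bisection. The function names and the step counts are choices made for this illustration.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_stat|), using Simpson's rule on [0, |t_stat|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1 - 2 * central_area


def critical_value(alpha: float, df: int) -> float:
    """Two-tailed critical value: the t at which two_tailed_p equals alpha."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large, so the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2


alpha, df, t_stat = 0.05, 6, 5.716717  # t statistic and df from Table 9-3
p_value = two_tailed_p(t_stat, df)
t_crit = critical_value(alpha, df)

print(f"two-tailed p-value = {p_value:.6f}")
print(f"two-tailed critical value = {t_crit:.4f}")
print("reject H0" if p_value < alpha else "not enough evidence to reject H0")
```

Both decision rules agree for these data: the p-value falls below alpha, and the t statistic falls beyond the critical value.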

Step 5: Descriptive Analysis

This result shows that shoppers spend significantly more on the Internet (X-bar₁ = 98.75) than in the stores (X-bar₂ = 58.75). A bar graph could have been used here to show the difference in the means. These results are straightforward and expected. Therefore, the company may want to stock more merchandise for its online shop than for its retail stores. This may also help the company cut costs because a retail store is more expensive to maintain than an online store.

Summary

The next chapter presents a more involved example that puts all these statistical tests to use in a real-life scenario: a business situation involving a retail email campaign.
