© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
R. Okunev, Analytics for Retail, https://doi.org/10.1007/978-1-4842-7830-7_9

9. Independent T-Test

Rhoda Okunev, Tamarac, FL, USA

Chapter 7 introduced the idea of thinking through a business question and building a hypothesis test around it, and it explained the five steps for running a statistical inquiry. This chapter shows how to implement those five steps using the statistical test known as the independent t-test, or Student's t-test. As before, a small dataset is used for demonstration purposes only, to show how the independent t-test works and what it is for. You'll want to use a larger dataset in real-life situations.

Independent T-Test at a Glance

An independent t-test looks at the difference between two independent sample means divided by the standard error of that difference. Independent means that the two samples are not related: they come from different groups that do not overlap. Again, the data needs to be from a random sample. An independent t-test is used when there is a dichotomous variable and a continuous variable (refer to Appendix A). The means of the continuous variable for the two groups defined by the dichotomous variable are either determined to be significantly different (the null hypothesis is rejected in favor of the alternative) or there is not enough evidence to say that the two means are different (the null hypothesis is not rejected).

A few concepts to know about the t-distribution: its mean is zero, just like the mean of the standard normal distribution, and it is symmetric about the mean, as is the normal distribution. Furthermore, as the sample size increases, the standard deviation of the t-distribution gets closer and closer to 1, which is the standard deviation of the standard normal distribution. This is because, as the sample size increases, the t-distribution looks more and more like the standard normal curve.

The t-distribution, like the standard normal, extends from negative infinity to positive infinity and never touches the x-axis. Lastly, the t-distribution looks like the normal curve, with each side a mirror image of the other, except that it is lower and less peaked in the center (at the mean) and has fatter tails than the standard normal distribution. As the sample size increases, the curve for the t-distribution changes and starts looking more and more like the normal curve.
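The convergence described above can be checked numerically. The following is a minimal sketch using only Python's standard library. The helper names (`t_pdf`, `normal_pdf`, `t_sd`) are illustrative and not from the book, and the standard deviation relies on the standard result sqrt(df / (df - 2)) for df > 2, where df is the degrees of freedom, which grow with the sample size.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom at x."""
    # lgamma avoids overflow for large df.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_sd(df):
    """Standard deviation of the t-distribution (defined for df > 2)."""
    return math.sqrt(df / (df - 2))


# Compare the peak (x = 0), a tail point (x = 3), and the standard deviation.
for df in (3, 6, 30, 1000):
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4), round(t_sd(df), 4))
print("normal", round(normal_pdf(0), 4), round(normal_pdf(3), 4), 1.0)
```

For small df the peak is lower and the tail value is higher than for the normal curve, and both gaps shrink as df grows.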

The independent t-test is used when the sample sizes of the two groups are each 30 or more, when the data is known to come from a normal distribution, or both. Again, as with Pearson's r correlation, there are more rigorous techniques for determining the sample size that are not reviewed in this book. You can also "eyeball" a histogram to see whether the data looks approximately normally distributed. A statistician has other methods for checking this that are not discussed in this book.

Hypothesis Test

This is an example of a hypothesis test of the difference between the mean revenue of two independent sales channels: stores and online. It consists of the five steps previously introduced in Chapter 7, all of which are covered in the following sections.

Step 1: The Hypothesis, or the Reason for the Business Question

The test decides between two hypotheses about the independent means of the two channels: H0 (there is not enough evidence to say the two independent group means are different) or H1 (the two independent group means are significantly different from each other). Here again a two-tailed test is used because the null hypothesis has an equal sign and the alternative has a not-equal sign.

H0: μ1 = μ2

H1: μ1≠μ2

Step 2: Confidence Level

Just as the Pearson correlation has degrees of freedom and a confidence level, so does the independent t-test. For the t-test, an alpha of 0.05, which is a 5 percent significance level, is used; it is the usual default in statistical packages. Since there are two groups with four subjects in each group, the two group sizes are added together and two degrees of freedom are subtracted (one for each group). Degrees of freedom are used because the data is a sample, and, therefore, constraints are placed on the sample statistics.

df = n1 + n2 - 2 = 4 + 4 - 2 = 6

Some computer programs use a more involved formula to calculate the degrees of freedom, and the result is a bit different. The simpler formula above is used when no such program is available to calculate the exact degrees of freedom. The standard two-tailed test is used in this example.
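To make the two ways of counting degrees of freedom concrete, here is a hedged sketch. The chapter does not name the "more involved" formula; the code assumes it is the Welch-Satterthwaite approximation, which is a common choice when the variances are not assumed equal. The function names are illustrative, and the variances passed in (218.75/3 and 368.75/3) are the group variances worked out later in step 3.

```python
def pooled_df(n1, n2):
    """Degrees of freedom used in this chapter: n1 + n2 - 2."""
    return n1 + n2 - 2


def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed alternative formula)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


print(pooled_df(4, 4))
# Group variances taken from step 3 of this chapter.
print(round(welch_df(218.75 / 3, 4, 368.75 / 3, 4), 2))
```

When both groups have the same size and the same variance, the two formulas agree; otherwise the Welch value is smaller than n1 + n2 - 2.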

Step 3: Mathematical Operations and Statistical Formula

The main objective for the independent t-test is to determine the difference between the two independent group means over the standard error, which is the noise in the formula. The first step is to calculate the independent mean for each group. The next step will be to calculate the standard error.

Here, the variable called "shop" records the mode of shopping, which is binary: Internet versus store. The objective is to determine from the data whether there is a significant difference between Internet and store shoppers in how much they spend, that is, in revenue.

The first part shows the data. The second part introduces the formula for the independent t-test. Then the means are determined, and the step-by-step procedure to figure out the variance and standard deviation for each group is followed so that the pooled standard error can be calculated. Once that is done, the results can be analyzed in step 4.
1. Table 9-1 shows the data.

Table 9-1. Data for Type of Shop and Total Revenue

| Shop | Total revenue |
|------|---------------|
| 1 | 90 |
| 1 | 100 |
| 1 | 110 |
| 1 | 95 |
| 2 | 70 |
| 2 | 55 |
| 2 | 45 |
| 2 | 65 |

Shop codes: 1 = X1 = Internet; 2 = X2 = store.

2. The formula is shown here.

X-bar (written $\bar{X}$ in the formulas) is the mean of each sample, S² is the variance of each sample, S is the standard deviation, and n is the number of observations in each sample.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$\bar{X}_1 = \frac{90 + 100 + 110 + 95}{4} = 98.75$$

$$\bar{X}_2 = \frac{55 + 70 + 65 + 45}{4} = 58.75$$
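The two group means can be reproduced in a few lines of Python. This is a small sketch using the standard library; the variable names are illustrative, and the store values are the ones used in the mean calculation above.

```python
from statistics import mean

# Revenue per shopper for each channel.
internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]

mean_internet = mean(internet)
mean_store = mean(store)
difference = mean_internet - mean_store

print(mean_internet, mean_store, difference)
```

The difference between the means is the numerator of the t formula.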

S is the standard deviation:

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

S² is the variance:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

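Table 9-2 shows how the squared deviations needed for the standard deviation are calculated for the Internet group and the store group.

Table 9-2. Independent T-Test Calculated Using Excel Spreadsheet

| Internet | X1 | X1bar | X1 - X1bar | (X1 - X1bar)² | Store | X2 | X2bar | X2 - X2bar | (X2 - X2bar)² |
|----------|-----|-------|------------|---------------|-------|-----|-------|------------|---------------|
| 1 | 90 | 98.75 | -8.75 | 76.5625 | 2 | 55 | 58.75 | -3.75 | 14.0625 |
| 1 | 100 | 98.75 | 1.25 | 1.5625 | 2 | 70 | 58.75 | 11.25 | 126.5625 |
| 1 | 110 | 98.75 | 11.25 | 126.5625 | 2 | 65 | 58.75 | 6.25 | 39.0625 |
| 1 | 95 | 98.75 | -3.75 | 14.0625 | 2 | 45 | 58.75 | -13.75 | 189.0625 |
| | | Sum | 0 | 218.75 | | | Sum | 0 | 368.75 |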

3. Here the variance and standard deviation are calculated for each group:

Group 1 (Internet): $S_1^2 = 218.75/3 = 72.92$, $S_1 = \sqrt{218.75/3} = 8.54$

Group 2 (store): $S_2^2 = 368.75/3 = 122.92$, $S_2 = \sqrt{368.75/3} = 11.087$
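The spreadsheet steps (deviations, squared deviations, their sum, then division by n - 1) can be sketched in Python as follows. The helper `sum_sq_dev` is an illustrative name; `statistics.variance` and `statistics.stdev` are used only as a cross-check, since they also divide by n - 1.

```python
from statistics import mean, stdev, variance

internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]


def sum_sq_dev(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


for name, values in (("internet", internet), ("store", store)):
    ss = sum_sq_dev(values)
    var = ss / (len(values) - 1)   # sample variance
    sd = var ** 0.5                # sample standard deviation
    # Cross-check the manual result against the standard library.
    assert abs(var - variance(values)) < 1e-9
    print(name, ss, round(var, 2), round(sd, 3))
```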
4. When the variances of the two groups are not considered significantly different, the pooled standard error formula is used. When the variances are considered significantly different, a more complicated formula is needed. This book does not go over the math for unequal variances, but at times that version of the independent t-test has to be used because the variances are statistically different. (Note: The Data Analysis ToolPak can do that computation as well.)

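The following are the calculations for the pooled standard error (SE):

$$\begin{aligned}
SE &= \sqrt{\left[\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
&= \sqrt{\left[\frac{3(218.75/3) + 3(368.75/3)}{4 + 4 - 2}\right]\left(\frac{1}{4} + \frac{1}{4}\right)} \\
&= \sqrt{\frac{218.75 + 368.75}{6} \times 0.50} \\
&= \sqrt{\frac{587.5}{6} \times 0.50} \\
&= 6.9970
\end{aligned}$$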

Independent t-test:

$$t = \frac{98.75 - 58.75}{6.997} = \frac{40}{6.997} = 5.72$$
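The whole hand calculation, from raw data to the t statistic, can be sketched in Python using only the standard library. Variable names are illustrative; the pooled variance divides by n1 + n2 - 2, matching the degrees of freedom from step 2.

```python
import math
from statistics import mean, variance

internet = [90, 100, 110, 95]
store = [55, 70, 65, 45]
n1, n2 = len(internet), len(store)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * variance(internet)
              + (n2 - 1) * variance(store)) / (n1 + n2 - 2)

# Pooled standard error of the difference between the two means.
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic: difference of the means over the standard error.
t_stat = (mean(internet) - mean(store)) / std_err

print(round(pooled_var, 5), round(std_err, 4), round(t_stat, 4))
```

If SciPy happens to be installed, `scipy.stats.ttest_ind(internet, store)` with its default equal-variance setting should report the same statistic, which makes a convenient cross-check.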

Table 9-3 shows the output of a t-test run with the Data Analysis ToolPak, which is found on the Data tab.
Table 9-3. Output of the Independent T-Test (Using ToolPak)

t-Test: Two-Sample Assuming Equal Variances

| | X | Y |
|---|---|---|
| Mean | 98.75 | 58.75 |
| Variance | 72.91667 | 122.9166667 |
| Observations | 4 | 4 |
| Pooled Variance | 97.91667 | |
| Hypothesized Mean Difference | 0 | |
| df | 6 | |
| t Stat | 5.716717 | |
| P(T<=t) one-tail | 0.000621 | |
| t Critical one-tail | 1.94318 | |
| P(T<=t) two-tail | 0.001241 | |
| t Critical two-tail | 2.446912 | |

The independent t-test result (t Stat) = 5.72.

The two-tailed critical value for the t-test at the 0.05 significance level = 2.447.

The two-tailed p-value = 0.001241.

Step 4: Results

The two group means are significantly different at the 0.05 alpha level because the p-value (0.0012) is smaller than alpha (0.05). Equivalently, the t statistic (5.72) is larger than the two-tailed critical value (2.447).
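The p-value and critical value reported by the ToolPak can be approximated without a statistics package. The sketch below is one possible approach, not the ToolPak's own method: it integrates the t-distribution density with Simpson's rule and finds the critical value by bisection. All function names and the step counts are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom at x."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # P(0 <= T <= |t|)
    return 2 * (0.5 - area)


def critical_value(alpha, df):
    """Two-tailed critical value found by bisection on the p-value."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_stat, df, alpha = 5.716717, 6, 0.05
p_value = two_tailed_p(t_stat, df)
t_crit = critical_value(alpha, df)
reject_null = p_value < alpha

print(round(p_value, 6), round(t_crit, 4), reject_null)
```

Both decision rules in step 4 (p-value below alpha, t statistic above the critical value) lead to the same conclusion here.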

Step 5: Descriptive Analysis

This result shows that shoppers spend significantly more on the Internet (X-bar1 = 98.75) than in the stores (X-bar2 = 58.75). A bar graph could have been used here to show the difference in the means. These results are straightforward. Therefore, the company may want to stock more merchandise for its online shop than for its retail stores. This may also help the company cut costs, because a retail store is more expensive to maintain than an online store.
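As a rough stand-in for the bar graph mentioned above, the following sketch prints a text bar for each group mean. It is only an illustration: the `bar` helper and the scale of 2.5 revenue units per character are arbitrary choices, and a charting tool would normally be used instead.

```python
# Group means from step 3.
means = {"Internet": 98.75, "Store": 58.75}


def bar(value, scale=2.5):
    """Return a text bar whose length is proportional to value."""
    return "#" * round(value / scale)


for label, value in means.items():
    print(f"{label:<9}{bar(value)} {value}")
```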

Summary

The next chapter presents a more involved example that puts all of these statistical tests to use in a real-life scenario: a business situation involving a retail email campaign.
