Point and Interval Estimates From a Regression Model
Chapter 5 Preview
When you have completed reading this chapter you will be able to:
Introduction
In Chapter 3, you saw simple bivariate regression equations for women’s clothing sales and for college basketball teams’ winning percentage (WP). In this chapter, you will learn to apply these two models. You will use both to make a point estimate and an approximate 95 percent confidence interval, centered on the point estimate, for the dependent variable.
The Concept of a Point Estimate
A point estimate is a single value, or point. It is your best estimate (or prediction) of the value of a dependent variable, given some value(s) for the independent variable(s). The point estimate is generated directly from the regression equation. To illustrate, suppose that you have a very simple regression model as follows:
SALES = f(PER CAPITA INCOME)
SALES = –10 + 0.002(PER CAPITA INCOME)
If per capita income is $30,000 then your point estimate (or prediction) of sales would be:
SALES = −10 + (0.002 × 30,000) = −10 + 60 = 50
While this would be your best estimate, it is likely to be wrong. This may sound disturbing to you. However, statistical theory supports the notion that this best estimate will be close to correct most of the time. It just may not be exactly right. Suppose that, in this example, sales are in thousands of units. Your estimate of sales is then 50,000 units: not 50,001 or 49,999, but exactly 50,000. The business world is not so precise. If sales really turn out to be 50,001 and you estimated sales to be 50,000, you would look pretty good. But technically you would be wrong. Later in this chapter, you will see how to calculate point estimates for some of the models you have seen in previous chapters.
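The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name point_estimate is hypothetical, and the coefficients are the ones from the simple SALES model in the text.

```python
def point_estimate(intercept: float, slope: float, x: float) -> float:
    """Return the point estimate from a bivariate regression equation."""
    return intercept + slope * x


# Coefficients from the illustrative model SALES = -10 + 0.002(PER CAPITA INCOME).
sales = point_estimate(intercept=-10.0, slope=0.002, x=30_000)
print(round(sales, 3))  # 50.0 (thousands of units, as assumed in the text)
```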
The Concept of Approximate 95 Percent Confidence Interval Estimates
Because you would almost always be wrong (but close) with a point estimate you may want to provide a set of lower and upper bounds for an estimate such that you would have some level of confidence that the true value would be in that interval. To do this you need to know another statistic called the “standard error” of the estimate (SEE). The SEE is just called standard error in Excel.1 This measure is included in the output of almost every regression software program.2
The approximate 95 percent confidence interval is found by taking your point estimate plus and minus two times the standard error of the estimate (SEE) as follows:
Point Estimate ± 2(Standard Error of the Estimate)
The number 2 in this calculation is an approximation for 1.96 from the t-table with a large number of degrees of freedom. This confidence interval is also an approximation because the true confidence interval bows away from the regression line as the dependent variable and the independent variable(s) move away from their respective mean values.
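As a rough sketch of this rule, the Python below wraps the plus-and-minus calculation in a small function and shows where the 1.96 comes from. The function name approx_interval and the SEE of 4.0 are hypothetical values chosen only for illustration. The standard normal distribution stands in for a t-table with a very large number of degrees of freedom.

```python
from statistics import NormalDist


def approx_interval(point: float, see: float, multiplier: float = 2.0) -> tuple[float, float]:
    """Return (lower, upper) as point estimate minus/plus multiplier times SEE."""
    margin = multiplier * see
    return point - margin, point + margin


# Two-tailed 95 percent multiplier for a very large sample (about 1.96).
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96

# Hypothetical example: point estimate of 50 with an assumed SEE of 4.0.
print(approx_interval(50.0, 4.0))  # (42.0, 58.0)
```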
You will see this concept applied later in this chapter for regression models you have already seen. After seeing some examples of the concept you should find it easy to apply in your own situation.
There is a way to get an exact 95 percent confidence interval. The arithmetic is a bit more complex but it can be done.3 However, relying on the approximate 95 percent confidence interval is sufficient for most purposes.4
Point Estimates in Practice: Two Examples
Women’s Clothing Sales
Consider the following women’s clothing sales (WCS) model first introduced in Chapter 2. In this model WCS was a function of personal income (PI). The model was:
WCS = 1,187.123 + 0.165(PI)
In Chapter 2, you put a value for personal income into this equation to get an estimate of WCS for that level of personal income. This was a point estimate because you obtained a single number for your answer. In that example, you estimated the dollar amount of WCS if personal income is 9,000 (billion dollars). You obtained:
WCS = 1,187.123 + (0.165 × 9,000)
WCS = 1,187.123 + 1,485 = 2,672.123
Thus, your estimate of WCS if personal income was 9,000 (billion dollars) is 2,672.123 (million dollars) or $2,672,123,000. Do you think this would be exactly right for that level of PI? Probably not!
Winning Percentage for College Basketball Teams
In the second example, consider the model for the WP of college basketball teams, also shown in Chapter 2. The model was:
WP = –198.9 + 5.707(FG)
In Chapter 2, you learned that if a team’s FG percentage is 45 percent your best estimate of the team’s WP would be:
WP = –198.9 + 5.707(FG)
WP = –198.9 + (5.707 × 45) = 57.9
This probably seems awfully precise to you. Would you be sure the WP might not be 57.6 percent or maybe 58.1 percent? Again, probably not. But 57.9 percent would be your best point estimate (prediction) of a team’s conference winning percentage (WP) when a team’s FG percentage is 45 percent. If you need to provide one number as your estimate, 57.9 percent would be the best number to give.
Approximate 95 Percent Confidence Interval Estimates: Two Examples
While point estimates are useful, it is very unlikely that they will be exactly correct. Thus, it is often preferable to make an interval estimate in such a way that you can be 95 percent confident that the true value will be somewhere within the interval. You have seen that an approximation for a 95 percent confidence interval can be given as:
Point Estimate ± 2(SEE)
where SEE is the standard error of the estimate.5
Before looking at examples, you should know exactly where to find the SEE in Excel’s regression output. Look at the partial output from an Excel regression in Table 5.1. You see that the SEE is in the top section of Excel’s output under the heading “Regression Statistics.” In this example, the SEE is 1.676. To calculate an approximate 95 percent confidence interval for this model you would have:
Point Estimate ± 2(SEE)
Point Estimate ± 2(1.676)
You would need the entire regression equation to get your point estimate. For now the important thing is for you to know where to find the SEE in your regression results from Excel. In the examples that follow you will see how to do specific calculations.
Table 5.1 Partial regression results from Excel for a market share model. The standard error (SE) is in bold and a slightly larger type. This model will be discussed fully in Chapter 7
Summary output
Regression Statistics
Multiple R | 0.929
R-square | 0.863
Adjusted R-square | 0.812
Standard error | 1.676
Observations | 12
Women’s Clothing Sales: Example One
As you have seen, the relationship between women’s clothing sales (WCS) and personal income (PI) is:
WCS = 1,187.123 + 0.165(PI)
Next, assume that PI = 9,000. Then:
WCS = 1,187.123 + (0.165 × 9,000)
WCS = 1,187.123 + 1,485 = 2,672.123
This is the point estimate (prediction). Table 5.2 provides the Excel output needed to get the approximate 95 percent confidence interval. Here you see that the SEE is 525.160. You now have enough information to calculate the approximate 95 percent confidence interval. The point estimate of 2,672.123 is your starting point and the SEE is 525.160. The approximate 95 percent confidence interval is:
Point Estimate ± 2(SEE)
2,672.123 ± 2(SEE)
2,672.123 ± 2(525.160)
2,672.123 ± 1,050.320
1,621.803 to 3,722.443
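The same steps can be checked with a short Python sketch. The variable names are illustrative; the coefficients and the SEE are the values quoted in the text for the WCS model.

```python
# Values from the WCS model: WCS = 1,187.123 + 0.165(PI), with SEE = 525.160.
intercept, slope = 1_187.123, 0.165
see = 525.160
pi = 9_000  # personal income, in billions of dollars

point = intercept + slope * pi   # point estimate
margin = 2 * see                 # approximate 95 percent plus-or-minus range
lower, upper = point - margin, point + margin

print(f"{point:,.3f} ± {margin:,.3f}")   # 2,672.123 ± 1,050.320
print(f"{lower:,.3f} to {upper:,.3f}")   # 1,621.803 to 3,722.443
```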
Table 5.2 Partial Excel output for women’s clothing sales as a function of personal income. The ANOVA part of the output is omitted here
Summary output
Regression Statistics
Multiple R | 0.416
R-square | 0.173
Adjusted R-square | 0.167
Standard error | 525.160
Observations | 135

 | Coefficients | Standard error | t Stat | p-Value
Intercept | 1,187.123 | 335.390 | 3.540 | 0.001
PI | 0.165 | 0.031 | 5.277 | 0.000
This interval is based on a model in which WCS is influenced by only a single independent variable, personal income. As shown in Table 5.2, the coefficient of determination (R²) is 0.173. Thus, this simple model explains only about 17.3 percent of the variation in WCS.
As you would expect, WCS is influenced by more than just PI. As you learn more about regression, you will see how you can include many other measures that may influence WCS in a regression model. If, in addition to personal income, you include the unemployment rate among women, an index of consumer sentiment, and measures to account for seasonality, you can develop a much better regression model. With these variables included, the coefficient of determination increases to 0.966 for the more complete model.6 This means that the more complete model explains 96.6 percent of the variation in WCS. Partial regression statistics for this model are shown in Table 5.3.
With the more complete model the approximate 95 percent confidence interval becomes narrower as follows:
Point Estimate ± 2(SEE)
Point Estimate ± 2(105.403)
Point Estimate ± 210.806
Table 5.3 Partial regression statistics for a more complete model of women’s clothing sales. The SEE is much smaller than the SEE in Table 5.2
Summary output
Regression Statistics
Multiple R | 0.985
R-square | 0.970
Adjusted R-square | 0.966
Standard error | 105.403
Observations | 135
Table 5.4 Comparison of two models for basketball winning percentage
Regression Statistics | Small bivariate regression model | Larger regression model
Multiple R | 0.632 | 0.924
R-square | 0.399 | 0.853
Adjusted R-square | 0.391 | 0.828
Standard error | 16.472 | 8.760
Observations | 82 | 82
Compare this plus or minus range (210.806) with the one for the simpler model (1,050.320).
You see that by including more causal variables two important things change. The adjusted coefficient of determination increases from 0.167 (16.7 percent) to 0.966 (96.6 percent), and the standard error decreases from 525.160 to 105.403. As a result, the plus or minus range of the approximate 95 percent confidence interval falls from 1,050.320 to 210.806. This allows you to be more precise in your estimates based on a regression model.
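A short Python sketch makes the comparison concrete. The dictionary keys are illustrative labels; the standard errors are the ones reported in Tables 5.2 and 5.3.

```python
# Standard errors of the estimate reported for the two WCS models.
see_by_model = {
    "PI only (Table 5.2)": 525.160,
    "More complete model (Table 5.3)": 105.403,
}

# Approximate 95 percent plus-or-minus range: two times the SEE.
margins = {name: 2 * see for name, see in see_by_model.items()}
for name, margin in margins.items():
    print(f"{name}: plus or minus {margin:,.3f}")

ratio = margins["PI only (Table 5.2)"] / margins["More complete model (Table 5.3)"]
print(f"The simpler model's range is about {ratio:.1f} times as wide.")
```

Because the multiplier of 2 is the same for both models, the ratio of the two ranges is simply the ratio of the two standard errors.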
Basketball Winning Percentage Example
Statistics play an important role in sports, as illustrated by the book and the movie Moneyball, which was based on a true story. You have already seen one model of collegiate basketball WP in which only the percentage of successful FG attempts was used as a causal variable. That model explained about 40 percent of the variation in WP. If you were to include other offensive and some defensive measures, you could develop a model that would explain roughly 83 percent of the variation in WP.
The regression statistics for both of these models are shown in Table 5.4. You see that in the larger regression model the explanatory power is about twice as high as with the smaller model and the SEE is much smaller.
For these two models, the widths of the approximate 95 percent confidence bands would be:
Small Model: Point Estimate ± 2(16.472) = Point Estimate ± 32.944
Larger Model: Point Estimate ± 2(8.760) = Point Estimate ± 17.520
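To see what the small model's range means for an actual estimate, the sketch below combines the FG-only equation with the small model's SEE from Table 5.4. It assumes, as the text indicates, that the small model in Table 5.4 is the same FG-only model used earlier for the point estimate. The variable names are illustrative.

```python
# FG-only model: WP = -198.9 + 5.707(FG); SEE of the small model in Table 5.4.
intercept, slope = -198.9, 5.707
see_small = 16.472
fg = 45  # field goal percentage

point = intercept + slope * fg
margin = 2 * see_small
lower, upper = point - margin, point + margin

print(f"WP point estimate: {point:.1f}")                    # 57.9
print(f"Approximate interval: {lower:.1f} to {upper:.1f}")  # 25.0 to 90.9
```

An interval running from about 25 percent to about 91 percent is very wide, which illustrates why the single-variable model is of limited practical use.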
Again you see that the width of the approximate 95 percent confidence band depends on the model. Rarely is a model with only one independent variable sufficient in business applications. Beginning in Chapter 6 you will build on what you have learned about the basics of regression analysis to build larger, more complex, and more useful regression models.
What You Have Learned in Chapter 5
1 The standard error of the estimate may also be represented by SER or SEE (standard error of the regression or standard error of the estimate, respectively). In Excel, it is simply called the standard error and is found near the top of regression results in the section headed “Regression Statistics.”
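2 However, it can be calculated as follows:
$$\mathrm{SEE} = \left[ \frac{\sum (Y_i - Y_{iE})^2}{n - 2} \right]^{0.5}$$
where $Y_i$ is the actual value of the dependent variable for observation $i$, $Y_{iE}$ is the value estimated by the regression equation for that observation, and $n$ is the number of observations used in the estimation of the regression equation (the 0.5 power is the same as the square root).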
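A minimal Python sketch of this calculation follows. The function name and the four data points are hypothetical; the fitted values are made up for illustration and do not come from any model in this chapter.

```python
import math


def standard_error_of_estimate(actual: list[float], fitted: list[float]) -> float:
    """SEE for a bivariate regression: sqrt(sum of squared errors / (n - 2))."""
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    n = len(actual)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum((y - y_e) ** 2 for y, y_e in zip(actual, fitted))
    return math.sqrt(sse / (n - 2))


# Hypothetical actual values and regression estimates (illustration only).
actual = [3.0, 5.0, 7.0, 10.0]
fitted = [3.5, 5.0, 7.5, 9.0]
see = standard_error_of_estimate(actual, fitted)
print(round(see, 3))  # 0.866
```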
3 While an approximation for a 95 percent confidence interval is often used because of its simplicity, a more precise interval estimation procedure is available:
$$Y = Y_E \pm t\,(SE)\left[1 + \frac{1}{n} + \frac{(X_o - X_M)^2}{\sum (X_i - X_M)^2}\right]^{0.5}$$
where
$Y_E$ = point estimate from the regression equation
$SE$ = standard error or standard error of the estimate (SEE) or standard error of the regression (SER)
$t$ = t-table value at the desired two-tailed significance level and $n - 2$ degrees of freedom
$n$ = number of observations used in estimating the regression model
$X_o$ = value of the independent variable for which the estimate of Y is desired
$X_M$ = mean value of the independent variable
$\sum (X_i - X_M)^2$ = the sum of the squared deviations of the independent variable from its mean value for all $n$ observations
In this formulation for the confidence interval, you can see that the width of the interval depends on the value of the independent variable for which the estimation is made (i.e., on the value of $X_o$). Also, note that the interval becomes wider the farther $X_o$ is from $X_M$.
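The widening of the interval can be sketched in Python as follows. This uses the standard textbook form of the interval, in which the bracketed term is placed under a square root. The function name, the data, the point estimate of 10.0, and the standard error of 1.0 are all hypothetical. The t value is passed in by hand; 3.182 is the two-tailed 95 percent t-table value for 3 degrees of freedom.

```python
import math


def exact_interval(x0: float, xs: list[float], y_e: float, se: float,
                   t_value: float) -> tuple[float, float]:
    """Return (lower, upper) for the more precise interval around y_e at x0."""
    n = len(xs)
    x_mean = sum(xs) / n
    ssx = sum((x - x_mean) ** 2 for x in xs)  # sum of squared deviations of X
    factor = math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / ssx)
    margin = t_value * se * factor
    return y_e - margin, y_e + margin


# Hypothetical data: five observations of X, so n - 2 = 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
t_value = 3.182

at_mean = exact_interval(3.0, xs, y_e=10.0, se=1.0, t_value=t_value)
far_from_mean = exact_interval(5.0, xs, y_e=10.0, se=1.0, t_value=t_value)

# The interval is narrowest at the mean of X and widens away from it.
print(at_mean[1] - at_mean[0] < far_from_mean[1] - far_from_mean[0])  # True
```

The same point estimate is used at both X values only to isolate the effect of the distance from the mean; in practice the point estimate would also change with X.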
4 Statisticians will sometimes make a distinction between a confidence interval and a prediction interval. For practical understanding of regression this distinction is not important to you. For example, in Applied Statistical Methods (Carlson and Thorne, Prentice-Hall, 1997, p. 664) the terms are used interchangeably.
5 Remember that the standard error (SE) can also be referred to as the standard error of the estimate (SEE) or the standard error of the regression (SER) depending on the software you use. In Excel it is “standard error.”
6 When you use multiple independent variables you use the adjusted R² for the coefficient of determination.