CHAPTER 2

Simple Linear Regression

Touro currently works part-time for the travel agency Tourista. His boss wanted him to estimate how demand for nondurable goods and services, from the population in general and from tourists in particular, would be affected by an increase in minimum wage in the state. Thus, he can’t wait to start this chapter, knowing that once he finishes with it, he will be able to:

1. Develop econometric models for simple linear regression;

2. Distinguish between the regression estimators and the estimates;

3. Analyze basic concepts for t-tests and goodness-of-fit measurements;

4. Perform data analyses and interpret the results using Excel.

Econometric Models

Prof. Metric reminds us that an econometric model is used to estimate the possible effect of an explanatory variable on a dependent variable. Chapter 1 has the following econometric model:

SPEND = a1 + a2 WAGE + e, (2.1)

where SPEND is the spending on nondurable goods and services of a representative consumer, and WAGE his or her average weekly wage. The parameter a1 is the intercept and a2 the slope of the regression line. The random error e accounts for unobserved factors other than WAGE, as well as any purely random component, that might affect SPEND.

A general model for any variables is written as:

y = a1 + a2 x + e. (2.2)

Since e captures the random component of y, we have the following equation for the regression line:

E(y|x) = a1 + a2 x. (2.3)

Booka raises her hand and asks, “Can anyone explain the notation E(y|x)?” Invo offers an explanation as follows:

E(y|x) is called the expectation of y given x. In Chapter 1, we learned to find E(X), which is the expectation of X computed from several values of X such as x1, x2,…, xn. In this chapter, I think we are learning a new concept, E(y|x), which implies that the expectation of y depends on x instead of only on several values of itself such as y1, y2,…, yn. For this reason, I believe that E(y|x), which is also written as E(y|x = xi), is classified as a conditional expectation and the whole function is called the conditional expectation function.

Prof. Metric commends Invo on his correct observation and points out that the error term e is the difference between actual y and its mean, as deduced from equations (2.2) and (2.3):

e = y − E(y|x) = y − (a1 + a2 x).

This error term also captures any estimation error that arises and any random behavior that might be present in each individual entity.

Prof. Metric says that we will consider two types of data in this section: cross sectional and time series. A cross-sectional dataset presents many entities, which can be individuals, cities, states, and so on, in a single period. A time-series dataset tracks a single entity over many periods, which can be days, weeks, months, years, and so forth. Regarding cross-sectional data, the subscript i refers to the entity being observed, and the six classic assumptions are:

  (i) The model is given by a linear function

yi = a1 + a2 xi + ei.

 (ii) E(ei) = 0, so that E(yi) = a1 + a2 xi.

(iii) Var(ei) = Var(yi) = σ2.

(iv) Cov(ei, ej) = Cov(yi, yj) = 0 for i ≠ j.

 (v) xi is not random and must take at least two different values.

(vi) ei ~ N(0, σ2); yi ~ N(a1 + a2 xi, σ2).

For time-series data, assumption (v) changes to:

 (v) yt and xt are stationary random variables and must take at least two different values, and et is independent of current, past, and future values of xt.

The remaining assumptions for cross-sectional data hold for time-series data, except that in this textbook the subscript i is changed to t and the subscript j is changed to z.

Regarding the stationarity in assumption (v) for time-series data, Prof. Metric says that we can roughly think of a stationary series as one that is neither explosive nor wandering aimlessly, and that we will discuss this concept in detail in later chapters. We also learn that data with a constant variance across all observations are said to be homoskedastic, and data with different variances across observations are said to be heteroskedastic, which will be discussed in Chapter 4.

Simple linear regression often uses the least squares technique, also called ordinary least squares (OLS), because this technique minimizes the sum of the squared differences between the observed values of y and their fitted values, the estimates of E(y|x). If assumptions (i) through (v) hold, then the Gauss-Markov theorem states that the OLS estimators are the best linear unbiased estimators (BLUE). If assumption (vi) also holds, in addition to the other five assumptions, then the test results are valid.

The Central Limit Theorem (CLT) is very convenient for assumption (vi). The theorem states that, given a sufficiently large sample drawn from a population with finite variance, the distribution of the sample mean is approximately normal, with mean equal to the population mean and variance equal to the population variance divided by the sample size. In this case, the test results are valid.

Taila then asks, “What do they mean by sufficiently large?” Prof. Metric commends her on the question and says that the question of “how large is large enough” is a matter of interpretation, but a cross-sectional sample with 30 data points or a time-series sample with 20 data points is usually considered large enough to cite CLT for valid test results.
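
For readers who would like to see the CLT at work, the following minimal Python sketch (the exponential population, the sample size of 30, and the 10,000 repetitions are illustrative choices, not from the text; it assumes numpy is installed) draws repeated samples and compares the mean and variance of the sample means with the theoretical values.

import numpy as np

rng = np.random.default_rng(42)
n, reps = 30, 10_000                      # sample size and number of repeated samples

# Exponential(1) population: mean = 1, variance = 1
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print("mean of the sample means:", sample_means.mean())      # close to the population mean, 1
print("variance of the sample means:", sample_means.var())   # close to 1/30 ≈ 0.033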

Estimators and Estimates

Interpreting Coefficient Estimates

We learn that we need to collect data for estimations and that the estimated version of equation (2.2) is:

yi = â1 + â2 xi + êi. (2.4)

Prof. Metric says that the derivation of the OLS estimators in a simple linear regression needs knowledge of calculus and can be found in Verbeek (2012). We are only required to know that the estimators â1 and â2 of the parameters a1 and a2 are written as:

â2 = Σ(xi − x̄)(yi − ȳ)/Σ(xi − x̄)² and â1 = ȳ − â2 x̄, (2.5)

where x̄ is the sample mean of x, and ȳ is that of y.
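
A minimal Python sketch of equation (2.5), using numpy and a small made-up dataset (the numbers are purely illustrative, not from the text), might look like this:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values of the explanatory variable
y = np.array([2.1, 2.4, 3.1, 3.4, 4.0])   # illustrative values of the dependent variable

x_bar, y_bar = x.mean(), y.mean()          # sample means
a2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # slope estimate
a1_hat = y_bar - a2_hat * x_bar                                         # intercept estimate
print(a1_hat, a2_hat)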

Specific values for â1 and â2 are called coefficient estimates (or estimates for short). Some econometricians also call them estimated coefficients. They are in fact point estimates, which provide a single value for each parameter of the OLS regression.

Once each parameter is estimated, the OLS estimates are interpreted according to the econometric model we develop. In general, the intercept estimator â1 estimates the parameter a1, which measures the value of y when x is zero, whereas the slope estimator â2 estimates a2, which measures the change in y due to a one-unit change in x:

a2 = ΔE(y|x)/Δx = Δy/Δx. (2.6)

For example, the intercept in equation (2.1) represents a person’s spending on nondurable goods when his or her wage is zero, whereas the slope represents the change in spending due to a one-unit change in weekly wage. If wage is the only source of income for this representative consumer, then the slope measures the marginal propensity to spend on nondurable goods.

The slope of an OLS regression can be used to measure elasticity as well. Theoretically, the equation for elasticity is:

ε = (Δy/y)/(Δx/x) = (Δy/Δx)·(x/y).

In econometrics, we can use the expressions in equation (2.6) to write the formula for calculating elasticity:

ε = a2·(x/y).

Hence, the estimation of elasticity is:

ε̂ = â2·(x̄/ȳ), (2.7)

where the definitions of the variables are the same as those in the previous sections.

Prof. Metric says that there is a special case in which both sides of equation (2.2) are in natural logarithmic form, so that changes in y and x are measured as percentage changes. In this case, we do not have to follow equation (2.7), because â2 itself measures the percentage change in y due to a one percent change in x, which is the elasticity.
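
As a quick illustration of equation (2.7), suppose â2 = 0.5, x̄ = 6, and ȳ = 4.5 (illustrative numbers that echo the example in the next subsections); then ε̂ = 0.5 × (6/4.5) ≈ 0.67, so a one percent increase in x is associated with roughly a 0.67 percent increase in y.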

Point Estimates

We learn that equation (2.5) can be used to calculate the point estimates of the OLS regression. Since all econometric software provide point estimates, Prof. Metric refers us to Table 2.1 at the end of the chapter, so that we can follow the steps provided in this table to practice calculating those point estimates of â1 and â2.

Suppose that substituting all variables into equation (2.5) yields â2 = 0.5 and â1 = 1.5, then the equation for the regression line becomes:

ŷi = 1.5 + 0.5 xi,

where 1.5 is the intercept and 0.5 is the slope of the line.

Invo exclaims, “Oh, if we let y be weekly spending and x weekly wage, both in hundreds of dollars, then the results imply that

  (i) Weekly spending of a person without wage is $150 (= 1.5*$100), and

 (ii) A $100 increase in weekly wage raises spending by $50 (= 0.5*$100).”

Prof. Metric commends Invo for his correct answers and moves to the next topic.

Interval Estimates

Prof. Metric says that in the previous subsection we only learned how to calculate point estimates. These point estimates do not account for any uncertainty in everyday life. Hence, we need to learn how to calculate an interval estimate, which provides a range of values instead of one single value for each parameter. This will allow us to face any uncertainty and still be able to state with a certain level of confidence that the actual value will likely fall between the upper and lower bounds (also called the endpoints) of this range.

To calculate interval estimates, a t-distribution for a sample of N observations is given as:

t = (âk − ak)/se(âk) ~ t(N−2),

where N−2 = the degrees of freedom (df ) for the simple linear regression,

ak = the parameters to be estimated,

âk = the coefficient estimate from the OLS regression, and

se(âk) = the standard error of the coefficient estimate.

We learned earlier that if the classic assumptions (i) through (vi) hold, then the OLS estimators â1 and â2 are normally distributed. For the slope estimator in particular,

â2 ~ N(a2, var(â2)),

where var(â2) = σ²/Σ(xi − x̄)².

Prof. Metric reminds us of a statistics concept, in which a standardized normal random variable Z is obtained as follows:

Z = (â2 − a2)/√var(â2) ~ N(0, 1).

A similar formula can be written for â1. Given a critical value Zc—where α is the probability that the value falls in the tails of the distribution—the interval estimator is:

â2 ± Zc·√var(â2).

Because σ2 must be estimated, the t-distribution is used as a substitute for Z, and the CLT assures us that this substitution works when the sample is large enough. In that case, a t-critical value tC from the t-distribution is given so that

âk ± tC·se(âk). (2.9)

Equation (2.9) provides an interval estimator of ak. The interval is expressed as a 100(1 − α)% confidence interval. For example, if we choose α = 0.01, then the confidence interval is 99 percent—that is, we are 99 percent confident that the actual value falls somewhere between the lower bound and the upper bound of the interval estimate.

Prof. Metric says that we can choose a 90 percent confidence interval (α = 0.10) or a 95 percent confidence interval (α = 0.05) or a 99 percent confidence interval (α = 0.01). Most of the time, we choose the middle value (α = 0.05). Note that the interval has an upper bound and a lower bound. Hence, we will have to divide α into two tails, α/2 = 0.025, so that the total value of α is 0.05 (α = 0.025 + 0.025 = 0.05) and the confidence interval is 95 percent.

He then gives us an example: Suppose the sample size is N = 32 (df = 30), â2 = 0.5, and se (â2) = 0.1. Choosing a 95 percent confidence interval so that α = 0.05, we calculate the interval as follows.

On each of the two tails, α/2 = 0.025. We then look at a t-table for a critical value and find that tC = t(0.975, 30) = 2.042. Taila tells us that we can also type =TINV(0.05, 30) into any Excel cell to obtain tC = t(0.975, 30) = 2.042. We are very impressed with her intelligence. We find that the interval estimate for a2 is:

0.5 ± 2.042*0.1 = 0.5 ± 0.2042 = (0.2958; 0.7042).

Touro exclaims, “Oh, then we are 95% confident that a $100 increase in weekly wage will raise nondurable spending anywhere from $29.58 to $70.42, depending on, I guess, the individual characteristics.” Prof. Metric praises Touro for the correct interpretation and guides us to the next topic.
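
A quick Python check of this interval, assuming scipy is installed (the numbers are those of the example above), is:

from scipy import stats

a2_hat, se_a2, df, alpha = 0.5, 0.1, 30, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df)                      # two-tailed critical value, ≈ 2.042
lower, upper = a2_hat - t_c * se_a2, a2_hat + t_c * se_a2
print(round(t_c, 3), round(lower, 4), round(upper, 4))    # 2.042 0.2958 0.7042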

Estimating Var(ei)

Invo recalls equation (1.6) for variance and volunteers to write:

Var(ei) = E(ei²) − [E(ei)]² = E(ei²) = σ².

We are wondering why the second term in the formula disappears. Invo explains, “Assumption (ii) tells us that E(ei) = 0.” We now realize that what he says is true and guess that we can take the average of the squared errors as an estimator of σ2, which is written as:

σ̂² = (1/N) Σ ei².

It turns out that this formula does not help, because the errors ei are unknown. Prof. Metric tells us to recall the error term in equation (2.3) and the analog of it—namely, the OLS residual in equation (2.4). We are able to derive the following expressions from these two equations:

ei = yi − (a1 + a2 xi),
êi = yi − (â1 + â2 xi).

Since the residuals are the sample analogs of the errors, we guess that we can use the regression residuals êi in place of the errors ei:

σ̂² = (1/N) Σ êi².

Touro says that this formula seems to have the same problem as the one in equation (1.7). Prof. Metric says that Touro is correct and refers us to equation (1.8) so that we can write an unbiased estimator of σ2 as:

s² = Σ êi²/(N − 2). (2.11)
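
Continuing the illustrative numpy sketch from the previous section (again, the data are made up), equation (2.11) can be computed from the residuals as follows:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # illustrative data
y = np.array([2.1, 2.4, 3.1, 3.4, 4.0])
x_bar, y_bar = x.mean(), y.mean()
a2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a1_hat = y_bar - a2_hat * x_bar

residuals = y - (a1_hat + a2_hat * x)       # ê_i = y_i − (â1 + â2 x_i)
s2 = np.sum(residuals ** 2) / (len(y) - 2)  # s² = Σ ê_i² / (N − 2)
print(s2)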

Taila exclaims, “Wow, we must have learned to estimate all parameters in this simple linear regression model.” Prof. Metric smiles and says, “No, we have one more quantity to estimate: the predicted value of y, and equation (2.11) will be helpful for the interval prediction of y.”

Predicted Value

We learn that once coefficient estimates are obtained, the value of y, called the predicted value, can be calculated by substituting these estimates into the model. Using the point estimates from the earlier example, the prediction for y when x = 6 (that is, $600) can be calculated using equation (2.4) as follows:

ŷ = 1.5 + 0.5*6 = 1.5 + 3 = 4.5 ($ hundreds) = $450.

Thus, a person with a weekly wage of $600 will spend $450 on nondurable goods and services.

Prof. Metric says that interval prediction can also be calculated in a similar manner using the standard error of the prediction se(p) for the model. Let ŷ1 = â1 + â2 x1, then a formula in Kmenta (1997) can be used for calculating an approximation of the se(p):

image

where s = the standard error of the regression, which is the square root of s2 in equation (2.11) and

SSE = the sum of the squared errors, which are often called the residuals in regression.

The interval prediction is calculated by replacing se(âk) in equation (2.9) with se(p):

ŷ ± tC·se(p).

Prof. Metric then tells us to use the aforementioned point estimate of $450 for weekly spending and calculate an interval prediction. He gives us se(p) = 0.5, N = 32, and α = 0.05. We are able to look at a t-table for a critical value and find that tC = t(0.975, 30) = 2.042, so the two endpoints of the interval prediction for weekly spending are:

4.5 ± 2.042*0.5 = (3.479; 5.521).

Hence, we predict with 95 percent confidence that a person with a weekly wage of $600 will spend anywhere from $347.90 (the lower bound) to $552.10 (the upper bound) every week.
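
The same calculation can be sketched in Python, assuming scipy is installed and taking se(p) = 0.5 as given, as in the example:

from scipy import stats

y_hat, se_p, df, alpha = 4.5, 0.5, 30, 0.05
t_c = stats.t.ppf(1 - alpha / 2, df)                 # ≈ 2.042
print(y_hat - t_c * se_p, y_hat + t_c * se_p)        # ≈ 3.479 and 5.521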

Hypothesis Testing

Prof. Metric says that t-tests are used to test the statistical significance of the regression coefficients or specific hypothesized values for them. Since simple linear regression has only one independent variable, the t-test for the significance of a2 also serves as the test for model significance.

Four Standard Steps

A t-test is usually carried out in four standard steps, shown as follows:

  (i) State the hypotheses.

Define a constant c as a specific value for a parameter that we want to test.

If the null H0 is ak ≤ c, then the alternative Ha is ak > c.

If the null H0 is ak ≥ c, then the alternative Ha is ak < c.

If the null H0 is ak = c, then the alternative Ha could be ak > c, ak < c, or ak ≠ c.

 (ii) The test statistic:

tSTAT = (âk − c)/se(âk) ~ t(N−2), (2.14)

where k = 1 or 2 in a simple regression.

(iii) The critical t-value, tC, which indicates the border point of the rejection region, depends on the significance level of the test, which is usually 1%, 5%, or 10%.

(iv) Decision: If |tSTAT| ≥ tC, we reject the null and follow the alternative hypothesis. Otherwise, we do not reject the null. We then draw the meaning and the implication of our decision concerning the parameters of a regression.

We learn that there are two basic types of t-tests: If c is any constant other than zero, we have a test of a general hypothesis; and if c is zero, we have a test of significance, because a parameter only has a significant impact on a model if it is not zero. Prof. Metric emphasizes to us that the tails of the tests always follow the alternative hypotheses. There are only three cases: in the alternative hypothesis (Ha), if ak > c, we have a right-tailed test; if ak < c, we have a left-tailed test; and if ak ≠ c, we have a two-tailed test.
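
A compact Python helper, sketched under the assumption that scipy is available (the function name t_test and its arguments are our own illustrative choices, not from the text), wraps these four steps for any of the three alternatives; the example call reproduces the right-tailed test in the next subsection.

from scipy import stats

def t_test(ak_hat, se_ak, c, df, alpha=0.05, tail="two"):
    """Four-step t-test: returns the test statistic, the critical value, and the decision."""
    t_stat = (ak_hat - c) / se_ak                      # step (ii): the test statistic
    if tail == "two":
        t_crit = stats.t.ppf(1 - alpha / 2, df)        # step (iii): two-tailed critical value
        reject = abs(t_stat) >= t_crit
    elif tail == "right":
        t_crit = stats.t.ppf(1 - alpha, df)            # one-tailed critical value
        reject = t_stat >= t_crit
    else:                                              # left-tailed test
        t_crit = stats.t.ppf(1 - alpha, df)
        reject = t_stat <= -t_crit
    return t_stat, t_crit, ("reject H0" if reject else "do not reject H0")   # step (iv)

# Booka's right-tailed test: â2 = 0.09, se(â2) = 0.015, c = 0.06, df = 32
print(t_test(0.09, 0.015, 0.06, 32, alpha=0.05, tail="right"))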

Tests of the General Hypothesis

Booka offers an example. Last week she wanted to find out the demand for books in relation to income. The dependent variable is spending on books (BOOK), and the independent variable is per capita income (PERCA). She found the following relationship between the two variables: BOOK = 0.09*PERCA, and se(â2) = 0.015. Booka says that she conducted a survey of 34 customers, so N = 34. We proceed to perform several tests as follows:

A right-tailed test

Invo says, “Let’s test the alternative hypothesis a2 > 0.06 against the null hypothesis a2 ≤ 0.06.” We agree with him and perform the test as follows:

  (i) H0: a2 ≤ 0.06; Ha: a2 > 0.06.

 (ii) tSTAT = t(N−2) = (0.09 − 0.06)/0.015 = 2.

(iii) We decide to choose α = 0.05, so tC = t(0.95, 32) = 1.694.

Prof. Metric says that Excel always reports a two-tailed critical value, so to find t-critical for a one-tailed test, type =TINV(2α, df) into any cell, then press Enter.

For example, typing =TINV(0.10, 32) into any empty cell and then pressing the Enter key yields the result of 1.6939 ≈ 1.694.

(iv) Decision: Since |tSTAT| > tC, we reject the null (H0), meaning a2 > 0.06, thus implying that the customers of the bookshop tend to spend more than 6 percent of the increase in their income on books.

A left-tailed test

Touro wants us to test the alternative hypothesis a2 < 0.15 against the null hypothesis a2 ≥ 0.15. We proceed with the test as follows:

  (i) H0: a2 ≥ 0.15; Ha: a2 < 0.15.

 (ii) tSTAT = t(N−2) = (0.09 − 0.15)/0.015 = −4.

(iii) We try α = 0.01 this time, so tC = t(0.99, 32) = 2.449 (or (−tC) = −2.449). We also type =TINV(0.02, 32) into Excel, which gives us 2.44868 ≈ 2.449.

(iv) Decision: Since |tSTAT| > tC (or tSTAT < (−tC)), we reject the null (H0), meaning a2 < 0.15 and implying that the customers tend to spend less than 15 percent of the increase in their income on books.

Tests of Significance

A one-tailed test

Prof. Metric wants us to test the significance of the slope, so the null hypothesis states that the slope is zero. Since the right-tailed test and the left-tailed test are very similar, we choose to test the alternative hypothesis for a2 > 0:

  (i) H0: a2 = 0; Ha: a2 > 0.

 (ii) tSTAT = t(N−2) = (0.09 − 0)/0.015 = 0.09/0.015 = 6.

(iii) We decide to choose α = 0.01 again, so tC = t(0.99, 32) = 2.449.

(iv) Decision: Since |t(N−2)| > tC = t(0.99, 32), we reject the null, meaning a2 > 0 and implying that income has a positive effect on book spending.

A two-tailed test

This time, Booka wants to test the alternative a2 ≠ 0. Prof. Metric agrees and says that the left-tailed test of significance is similar to the right-tailed test, so we do not have to try it in the class. We now perform the two-tailed test as follows.

  (i) H0: a2 = 0; Ha: a2 ≠ 0.

 (ii) tSTAT = t(N−2) = (0.09 − 0)/0.015 = 0.09/0.015 = 6.

(iii) Since this is a two-tailed test, Prof. Metric reminds us to use α/2 = 0.025, and so

tC = t(0.975, 32) = 2.037.

For a two-tailed test, we learn to type =TINV(α, df) into any cell in Excel.

So, we type =TINV(0.05, 32), then press Enter. Excel gives us 2.0369 ≈ 2.037.

(iv) Decision: Since |tSTAT| > tC, we again reject the null, meaning a2 ≠ 0 and implying that income does have a significant impact on book spending. We also recall the earlier discussion and are able to state that the model is statistically significant as well.

Prof. Metric reminds us that Excel reports the tSTAT value for the two-tailed test of significance—that is, the null hypothesis (H0) is ak = 0 and the alternative hypothesis (Ha) is ak ≠ 0. This tSTAT is also called the t-ratio because we only need to divide âk by se(âk) when c = 0. You will hear about this t-ratio from a lot of researchers because it is one of the most important statistics in econometric study.

Goodness-of-Fit and P-Value

Taila asks, “How can I compare two models and find out exactly which one predicts better?” Prof. Metric says enthusiastically that it is quite possible to do this by using a measure called “goodness-of-fit.” We will learn about goodness-of-fit in this section.

R-squared (R2)

We learn that an R2 value can measure how much of the variation in y can be explained by the variation in x. In the first section, we have the estimated equation:

yi = ŷi + êi = â1 + â2 xi + êi.

Subtracting the sample mean from both sides of this equation gives us:

yi − ȳ = (ŷi − ȳ) + êi.

Square both sides of the previous equation and take the sum of these expressions to obtain:

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σêi² + 2Σ(ŷi − ȳ)êi.

We know that the cross term 2Σ(ŷi − ȳ)êi is zero because the OLS residuals sum to zero and are uncorrelated with x, so

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σêi². (2.15)

The following definitions are commonly used for the squared terms in equation (2.15).

Σ(yi − ȳ)² = the total sum of squares (SST)

Σ(ŷi − ȳ)² = the sum of squares of the regression (SSR)

Σêi² = the sum of squared errors (SSE).

Given these definitions, the R2 value is the coefficient of determination and is defined as

R2 = SSR/SST = 1 − SSE/SST.

If R2 = 1, the model is said to have a perfect fit. In practice, we usually find that 0 < R2 < 1. R2 is reported by all econometric packages, including Excel. If R2 is high, then the model is a good fit; for example, R2 = 0.92 implies that 92 percent of the variation in the dependent variable can be explained by the independent variable. If R2 is low, then the model is not a good fit.

Prof. Metric points out that in Figure 2.2 of the Data Analyses section, SSR, SSE, and SST are reported in cells H12, H13, and H14, respectively.
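
A small numpy sketch (reusing the illustrative data from earlier sections) computes SST, SSR, SSE, and R2 directly from these definitions:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # illustrative data
y = np.array([2.1, 2.4, 3.1, 3.4, 4.0])
x_bar, y_bar = x.mean(), y.mean()
a2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a1_hat = y_bar - a2_hat * x_bar
y_hat = a1_hat + a2_hat * x                 # fitted values

SST = np.sum((y - y_bar) ** 2)              # total sum of squares
SSR = np.sum((y_hat - y_bar) ** 2)          # sum of squares of the regression
SSE = np.sum((y - y_hat) ** 2)              # sum of squared errors (residuals)
R2 = SSR / SST                              # equivalently, 1 − SSE/SST
print(SST, SSR, SSE, R2)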

P-Value

We learn that p-values can be used to measure the exact significance level of the aforementioned estimates. A p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed; equivalently, it is the smallest significance level at which the null can be rejected. Invo exclaims, “Sounds too abstract for me to understand. Can anyone explain more clearly what p-values really measure?”

Taila offers an explanation,

When we run a regression, the null hypothesis typically states that there is no effect. For example, consider an experiment on a medical treatment that we know is totally ineffective. The null hypothesis is true: there is no difference between the experimental groups at the population level. Despite the null being true, it is possible that there will be an apparent effect in the sample data due to random sampling error. P-values measure how consistent the sample data are with the claim that this medical treatment has no effect. A high p-value implies that our data are likely under a true null, and a low p-value implies that our data are unlikely under a true null. In this example, a low p-value suggests that your sample provides enough evidence for you to reject the null of no effect of the medical treatment.

Prof. Metric praises Taila and provides us with a numerical example: A p-value of 0.002 for a model implies that we can reject the null at the 0.2 percent significance level, which is very strong evidence, because we usually require only the 5 percent significance level. A formula for calculating p-values is introduced in Hill et al. (2011), but most econometric packages, including Excel, report p-values, so we do not have to learn this skill.

Thanks to this practice, we can look up p-values instead of going through the steps of calculating tSTAT and depending on the t-table for t-critical values. We can always reject the null hypothesis if the p-value ≤ α, where α could be 0.01, 0.05, or 0.10. For example, if we choose α = 0.05, then we reject the null if the p-value ≤ 0.05. The following values are generally used for the tests of coefficient significances:

If p-value ≤ 0.01: the coefficient estimate is highly significant.

If 0.01 < p-value ≤ 0.05: the coefficient estimate is significant.

If 0.05 < p-value ≤ 0.10: the coefficient estimate is weakly significant.

If p-value > 0.10: the coefficient estimate is not statistically significant.
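
The p-value itself can be recovered from the test statistic and the degrees of freedom; a minimal scipy sketch (the numbers reuse the two-tailed test of significance above) is:

from scipy import stats

t_stat, df = 6.0, 32
p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)     # two-tailed p-value
p_right_tailed = stats.t.sf(t_stat, df)            # right-tailed p-value
print(p_two_tailed, p_right_tailed)                # both far below 0.01: highly significant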

Prof. Metric asks us to look at Figure 2.2 in the Data Analyses section in order to practice how to interpret p-values: From this figure, the coefficient estimate of the intercept is reported in cell J17, and it is only weakly significant (with p-value = 0.065); whereas the slope estimate is reported in cell J18, and it is highly significant (with p-value = 1.398×10⁻²⁹). Invo exclaims, “I see another value of 1.398×10⁻²⁹ reported in cell K12. Is that the p-value for the significance of the whole model?” Prof. Metric commends him on the remark and says that this is true.

Data Analyses

Performing a Regression

Prof. Empirie says that we usually have three types of data for performing regression: cross sectional, time series, and longitudinal/panel. We discussed cross-sectional and time-series data at the beginning of this chapter. A longitudinal/panel dataset follows many entities over many periods.

Prof. Empirie reminds us again that all data are available in the folder Data Analyses.

Invo has collected data on expenditure on durables (DUR) and personal income (INCOME) for 51 cities (ID) in 2015. He tells us that the dataset is too large to display here but is available in the file Ch02.xls (Fig. 2.1–2.2). We want to see if DUR depends on INCOME—that is, if DUR is the dependent variable and INCOME the independent variable. We open the data file and follow these steps to perform a regression of DUR on INCOME:

Select Data and then Data Analysis on the ribbon.

Click Regression in the list instead of Descriptive Statistics, then click OK.

A dialog box appears, as shown in Figure 2.1.

Type B1:B52 into the Input Y Range box.

Type C1:C52 into the Input X Range box.

Select the Labels and Residuals boxes.

Select the Output Range button and enter F1.

Click OK; you will see another dialog box stating that the data will be overwritten.

Click OK again to overwrite the data.

image

Figure 2.1 Performing regression: Commands in dialog box

image

Figure 2.2 Simple linear regression results

The regression results are shown in Figure 2.2. Prof. Empirie then guides us to study Figure 2.2 and writes the estimated results (also called estimated equation) as follows:

DURi = 3453 + 0.1205 INCOMEi.

(se) (1829) (0.0048) R2 = 0.9275

To obtain the predicted value for DUR in 2016, we need to substitute any value of INCOME for a city into this equation. It turns out that when we click Residuals, Excel automatically calculates predicted values and reports them next to the residuals. For example, you can find the predicted DUR for the first city in cell G25 of the data file for Figure 2.2, which is 20,489.25.

Taila points out that she also found the upper and lower 95 percent bounds for the coefficient estimates in cells K17 through L18, which are repeated in cells M17 through N18. Prof. Empirie praises her for her keen observation and says that Excel does not report the interval estimates for the predicted values, so if we wish to know these values, we will have to calculate them using equations (2.7) and (2.8).
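
For readers who would like to replicate this kind of output outside Excel, here is a hedged Python sketch of the same calculations; the small arrays below are made-up stand-ins for the DUR and INCOME columns in Ch02.xls, which is not reproduced here, so the printed numbers will not match Figure 2.2.

import numpy as np

# Hypothetical stand-ins for the DUR and INCOME columns in Ch02.xls
income = np.array([120_000.0, 150_000.0, 180_000.0, 210_000.0, 260_000.0])
dur = np.array([18_000.0, 21_500.0, 25_000.0, 28_500.0, 35_000.0])

x_bar, y_bar = income.mean(), dur.mean()
a2_hat = np.sum((income - x_bar) * (dur - y_bar)) / np.sum((income - x_bar) ** 2)  # slope
a1_hat = y_bar - a2_hat * x_bar                                                    # intercept

fitted = a1_hat + a2_hat * income            # predicted values of DUR
residuals = dur - fitted
N = len(dur)
s2 = np.sum(residuals ** 2) / (N - 2)                          # s², as in equation (2.11)
se_a2 = np.sqrt(s2 / np.sum((income - x_bar) ** 2))            # standard error of the slope
R2 = 1 - np.sum(residuals ** 2) / np.sum((dur - y_bar) ** 2)   # coefficient of determination

print(a1_hat, a2_hat, se_a2, R2)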

She then tells us that we will have opportunities to get hands-on experiences with time-series data and panel data in the later chapters.

Exercises

1. Given the information in Table 2.1, perform the following procedures:

(a) Fill in the blank spaces and then use the information in this table to calculate â1 and â2.

(b) What is the interpretation of â1 and â2 if the dependent variable is yearly salary in ten thousands of dollars and the independent variable is college education in years?

2. Given the following estimation results:

DEMAND = 4.198 − 3.229 PRICE

(se) (1.012) (0.5017) R2 = 0.633 N = 26,

provide comments on the significance of a1 and the implication of the R2.

3. Use the results in Exercise 2 to test the following hypotheses at a 1 percent significance level:

(a) Test that the slope is −3 against the alternative hypothesis that the slope is smaller than −3.

(b) Test that the slope is zero against the alternative hypothesis that the slope is not zero.

Write the testing procedure in four standard steps similar to those in the text. The calculations of the t-statistics may be performed using a handheld calculator or using Excel.

Table 2.1 Information for calculating coefficient estimates

Variable | x | x − x̄ | y | y − ȳ | (x − x̄)(y − ȳ) | (x − x̄)²
 | 3 |  | 4 |  |  | 
 | 2 |  | 2 |  |  | 
 | 1 |  | 3 |  |  | 
 | x̄ = |  | ȳ = |  | Σ(x − x̄)(y − ȳ) = | Σ(x − x̄)² = 

Table 2.2 Information for calculating R2

y | ȳ | y − ȳ | (y − ȳ)²
2 |  |  | 
−1 |  |  | 
2 |  |  | 
Σy = |  | Σ(y − ȳ) = | Σ(y − ȳ)² = 

4. Given the information in Table 2.2, fill in the blank spaces and then use the information in this table to calculate R2 if SSE = 0.60. Provide comments on the result.

5. Data on education expenditures (EDU) and per capita income (PERCA) for 50 states and Washington, DC, in 2014 are in the file Education.xls.

(a) Perform a regression of PERCA on EDU (dependent variable = y = PERCA; independent variable = x = EDU), write the estimated equation, and find the point predictions for PERCA.

(b) Provide comments on the coefficient estimates and R2.
