Testing homoscedasticity 

The ordinary least squares (OLS) algorithm generates estimates that are unbiased (their expected values equal the true values), consistent (they converge in probability to the true values as the sample grows), and of minimal variance among linear unbiased estimators (across repeated samples, they fluctuate less around the true values than those of any other linear unbiased technique). Also, the estimates follow a Gaussian distribution (exactly when the errors are Gaussian, approximately in large samples). But all of this holds only when certain conditions are met, in particular the following ones:
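To make the first properties concrete, here is a minimal Monte Carlo sketch in Python (NumPy assumed available). It uses a hypothetical data-generating process that satisfies all the conditions listed below, fits OLS on many simulated samples, and checks that the estimates average out to the true coefficients and that their spread shrinks as the sample size grows. The coefficients, sample sizes, and the helper names `ols` and `simulate` are illustrative choices, not part of the original text.

```python
import numpy as np

TRUE_BETA = np.array([2.0, 0.5])  # hypothetical intercept and slope


def ols(X, y):
    """OLS coefficients: the beta that minimizes ||y - X @ beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n, n_sims=2000, seed=0):
    """Fit OLS on n_sims simulated samples of size n; return all estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((n_sims, 2))
    for s in range(n_sims):
        x = rng.uniform(0, 10, size=n)
        X = np.column_stack([np.ones(n), x])  # constant + one regressor
        eps = rng.normal(0, 1, size=n)        # homoscedastic, independent, exogenous errors
        y = X @ TRUE_BETA + eps
        estimates[s] = ols(X, y)
    return estimates


small, large = simulate(50), simulate(800)
print("n=50  mean:", small.mean(axis=0), "std:", small.std(axis=0))
print("n=800 mean:", large.mean(axis=0), "std:", large.std(axis=0))
```

Both means should sit close to `TRUE_BETA` (unbiasedness), while the standard deviations for n=800 are roughly a quarter of those for n=50 (consistency); a histogram of either column would look bell-shaped.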

  • The residuals should be homoscedastic (same variance).
  • The residuals should not be correlated with each other (such correlation generally occurs with temporal data).
  • There is no perfect correlation between regressors (or between linear combinations of regressors).
  • Exogeneity—the regressors are not correlated with the error term.
  • The model is linear and is correctly specified.
  • There should not be outliers: abnormal values, typically caused by errors in the data collection step.
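Since this section is about the first condition, here is a minimal sketch of one common homoscedasticity check, chosen here for illustration: Koenker's studentized version of the Breusch-Pagan test. It regresses the squared OLS residuals on the regressors and uses LM = n·R² of that auxiliary regression, which under homoscedasticity is approximately chi-square distributed with as many degrees of freedom as there are non-constant regressors. The simulated data and the helper names `ols_residuals` and `breusch_pagan_lm` are hypothetical, and NumPy is assumed.

```python
import math
import numpy as np


def ols_residuals(X, y):
    """Residuals y - X @ beta_hat from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan statistic LM = n * R^2.

    X must contain a constant column. Returns (LM, degrees of freedom).
    """
    n = X.shape[0]
    e2 = ols_residuals(X, y) ** 2  # squared residuals of the main model
    u = ols_residuals(X, e2)       # residuals of the auxiliary regression
    r2 = 1.0 - np.sum(u**2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, X.shape[1] - 1  # df = regressors excluding the constant


rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
noise = rng.normal(0, 1, size=n)

samples = {
    "homoscedastic": 1.0 + 2.0 * x + noise,        # constant error variance
    "heteroscedastic": 1.0 + 2.0 * x + noise * x,  # error sd grows with x
}

p_values = {}
for label, y in samples.items():
    lm, df = breusch_pagan_lm(X, y)
    # Here df == 1, and the chi-square(1) survival function is erfc(sqrt(LM / 2)).
    p_values[label] = math.erfc(math.sqrt(lm / 2))
    print(f"{label}: LM = {lm:.2f}, df = {df}, p = {p_values[label]:.4g}")
```

A small p-value (as for the heteroscedastic sample) rejects constant variance. With more than one non-constant regressor, the p-value would come from a chi-square distribution with `df` degrees of freedom instead of the `erfc` shortcut.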

If the first or the second assumption is violated, the estimates still converge to their true values, but the usual formulas for their variance (and hence their standard errors) are no longer valid. This means that we can't do any inference on the model (for example, we can't interpret either the t-values or the F-tests). Nevertheless, we can still use the point estimates. If the third assumption is violated, we can't even compute the estimates. Violations of assumptions 4, 5, and 6 are worse, as they break the consistency of the estimates: the estimates are biased. In practice, assumption 5 is not formally tested; the modeler simply formulates the model in a way they are satisfied with. There are statistical tests for assumptions 1 and 2, and if those assumptions are violated, they can be fixed using special techniques. Violations of assumption 4 are complicated, and they usually happen when we exclude a relevant variable that is correlated with one of the regressors. For example, suppose we model house prices in terms of the size of the property and the number of bathrooms. If we exclude the number of bathrooms, our property size estimate will be biased, because the residual will now contain the omitted variable (the number of bathrooms). Consequently, the property size variable will be correlated with the residual, since larger properties tend to have more bathrooms. So, assuming that bathrooms raise prices, our estimated property size coefficient will be biased upward: it will capture both its own effect and the effect of the number of bathrooms. There is a special technique designed to work in these cases called instrumental variables.
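The house-price example can be reproduced with a small simulation. The sketch below assumes a hypothetical data-generating process (all coefficients invented, and bathrooms treated as a continuous variable for simplicity) in which larger properties tend to have more bathrooms and each bathroom adds to the price; it fits OLS with and without the bathrooms column using NumPy. For this setup, the standard omitted-variable-bias result predicts that the size coefficient of the short regression converges to the true size effect plus the bathroom effect times the slope of bathrooms on size, here 1.5 + 20 × 0.02 = 1.9.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients: the beta that minimizes ||y - X @ beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(42)
n = 5000

# Hypothetical data-generating process; all numbers are invented for illustration.
size = rng.normal(120, 30, size=n)                    # property size
bathrooms = 0.02 * size + rng.normal(0, 0.5, size=n)  # larger homes -> more bathrooms
price = 50 + 1.5 * size + 20 * bathrooms + rng.normal(0, 10, size=n)

ones = np.ones(n)
full = ols(np.column_stack([ones, size, bathrooms]), price)  # correctly specified
short = ols(np.column_stack([ones, size]), price)            # bathrooms omitted

print("size coefficient, bathrooms included:", round(full[1], 3))   # near 1.5
print("size coefficient, bathrooms omitted: ", round(short[1], 3))  # near 1.9
```

The gap between the two coefficients is the upward bias described above; it disappears if bathrooms and size are generated independently of each other.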
