Estimation with Two Observations Per Person

2.2. Estimation with Two Observations Per Person

When there are exactly two observations per person (t = 1, 2), the fixed effects model can be estimated easily by OLS regression on difference scores for all the time-varying variables. The equations for the two time points are

y_i₁ = μ₁ + βx_i₁ + γz_i + α_i + ε_i₁
y_i₂ = μ₂ + βx_i₂ + γz_i + α_i + ε_i₂

Subtracting the first equation from the second, we get

y_i₂ − y_i₁ = (μ₂ − μ₁) + β(x_i₂ − x_i₁) + (ε_i₂ − ε_i₁)

Notice that both γz_i and α_i have been "differenced out" of the equation. Consequently, we cannot estimate γ, the coefficients for the time-invariant predictors. Nevertheless, this method completely controls for the effects of these variables. Furthermore, if ε_i₁ and ε_i₂ both satisfy the assumptions of the standard linear model, then their difference will also satisfy those assumptions (even if ε_i₁ and ε_i₂ are correlated). So OLS applied to the difference scores should give unbiased and efficient estimates of β, the coefficients for the time-varying predictors.
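The claim that differencing removes α_i can be checked on artificial data. The following is only a sketch under stated assumptions: the data set SIM, its variable names, the true slope of 0.5, and the intercept shift of 0.2 are all invented for illustration and have nothing to do with the NLSY data used below. Because the simulated predictor is built to be correlated with the person effect, the cross-sectional slope is biased upward (toward 1.0 in expectation under this setup), whereas the difference-score slope should land near 0.5 and its intercept near 0.2.

```sas
/* Hypothetical simulation: none of these names or values come from the book */
DATA sim;
   CALL STREAMINIT(20240);           /* arbitrary seed for reproducibility */
   DO id=1 TO 1000;
      alpha=RAND('NORMAL');          /* unobserved, time-invariant person effect */
      x1=alpha+RAND('NORMAL');       /* predictor correlated with alpha, time 1 */
      x2=alpha+RAND('NORMAL');       /* predictor correlated with alpha, time 2 */
      y1=0.5*x1+alpha+RAND('NORMAL');
      y2=0.2+0.5*x2+alpha+RAND('NORMAL');
      ydiff=y2-y1;                   /* alpha cancels in the difference */
      xdiff=x2-x1;
      OUTPUT;
   END;
RUN;

PROC REG DATA=sim;
   MODEL y1=x1;                      /* cross-sectional: confounded by alpha */
   MODEL ydiff=xdiff;                /* difference scores: alpha removed */
RUN;
QUIT;
```

Exact estimates will vary with the seed and sample size, so no specific output is claimed here.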

Let's try this on some real data. The sample consists of 581 children who were interviewed in 1990, 1992, and 1994 as part of the National Longitudinal Survey of Youth (Center for Human Resource Research 2002). We'll look at three variables that were measured in each of the three interviews:

ANTI: antisocial behavior, measured with a scale ranging from 0 to 6
SELF: self-esteem, measured with a scale ranging from 6 to 24
POV: poverty status of family, coded 1 for in poverty, otherwise 0

In this section, we will use only the data from 1990 and 1994. Our goal is to estimate a regression model of the form:

ANTI(t) = μ(t) + β₁SELF(t) + β₂POV(t) + ε(t)

for t = 1, 2. That is, we shall assume that poverty and self-esteem at time t affect antisocial behavior at the same time. I recognize that there may be uncertainties about causal ordering for these variables, but such difficulties will be ignored in this chapter. A related issue is whether the independent variables should be lagged, but with only two time points, it's impossible to estimate a model with both lags and fixed effects. Another implicit assumption is that the regression coefficients are the same at each time point, but that assumption can be tested or relaxed, as we'll see later.

As a point of departure, I begin by estimating the regression equation for each year separately using PROC REG. The SAS data set MY.NLSY has one observation per person, with separate variables for the measurements in the different years. The following SAS code is used for estimating the model:

PROC REG DATA=my.nlsy;
   MODEL anti90=self90 pov90;
   MODEL anti94=self94 pov94;
RUN;

Please note that this and all other data sets used in this book are available for download at support.sas.com/companionsites.

Selected portions of the output are displayed in Output 2.1. Both independent variables are statistically significant beyond the .05 level in both years. As one might expect, higher self-esteem is associated with lower levels of antisocial behavior, whereas poverty is associated with higher levels. The effect of SELF is slightly larger in magnitude in 1994 than in 1990, while the reverse is true for POV.

Table 2.1. Output 2.1 Regressions of ANTI on POV and SELF in 1990 and 1994
Dependent Variable: anti90 child antisocial behavior in 1990
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	2.37482	0.38447	6.18	<.0001
self90	1	−0.05014	0.01870	−2.68	0.0075
pov90	1	0.59473	0.12629	4.71	<.0001

Dependent Variable: anti94 child antisocial behavior in 1994
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	2.88797	0.44688	6.46	<.0001
self94	1	−0.06388	0.02113	−3.02	0.0026
pov94	1	0.54712	0.14765	3.71	0.0002

The problem with these regressions is that they do not control for any time-invariant variables. Rather than putting such variables into the model, we'll proceed directly to the difference equation, which controls for all time-invariant variables. To do this, we first need a DATA step to create the difference scores:

DATA diff;
   SET my.nlsy;
   /* Difference scores: 1994 value minus 1990 value */
   antidiff=anti94-anti90;
   povdiff=pov94-pov90;
   selfdiff=self94-self90;
RUN;
PROC REG DATA=diff;
   MODEL antidiff=selfdiff povdiff;
RUN;

Results are shown in Output 2.2. The coefficient for SELFDIFF is about midway between the two coefficients for SELF in Output 2.1, but the coefficient for POVDIFF is markedly lower than the two coefficients in Output 2.1 and is far from statistically significant. It thus appears that although there might be an association between poverty status and antisocial behavior, that association is largely cross-sectional and is perhaps explainable by their mutual dependence on other variables. But changes in poverty status do not seem to be associated with changes in antisocial behavior.

Table 2.2. Output 2.2 Regression with Difference Scores
Dependent Variable: antidiff
Variable	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	1	0.20923	0.06305	3.32	0.0010
selfdiff	1	−0.05615	0.01531	−3.67	0.0003
povdiff	1	−0.03631	0.12827	−0.28	0.7772

One concern is that there might be too few changes in poverty status to reliably estimate the effect of this variable. Output 2.3 shows that although the majority of children did not change in status, about 24% did change in one direction or another. This change should be sufficient to get a reliable estimate. In fact, the standard error for the poverty coefficient in the difference equation is about the same as the smaller of the two standard errors for the cross-sectional coefficients in Output 2.1.

Table 2.3. Output 2.3 Cross-Tabulation of Poverty Status in 1990 and 1994
Frequency	pov94=0	pov94=1	Total
pov90=0	321	65	386
pov90=1	73	122	195
Total	394	187	581
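The text does not show the code behind Output 2.3. A cross-tabulation of this kind could presumably be produced with PROC FREQ, as in the sketch below; the choice of options is an assumption, made only to suppress percentages so that the output contains frequencies alone.

```sas
/* Assumed code: cross-tabulate poverty status in 1990 by 1994 */
PROC FREQ DATA=my.nlsy;
   TABLES pov90*pov94 / NOROW NOCOL NOPERCENT;
RUN;
```

The "about 24%" figure follows from the off-diagonal cells: 65 children moved into poverty and 73 moved out, so 65 + 73 = 138 of the 581 children changed status, and 138/581 ≈ 0.238.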

Regression with difference scores is not the only way to produce OLS estimates of the fixed effects model for these data. The following alternative method is computationally cumbersome, but instructive. The first step is to reorganize the data so that, instead of one observation per person, there is one observation per person-year. The same variable name is used for the measurements of each conceptual variable in the two years. The new data set also contains an ID variable that has the same value for both years for the same person, and a TIME variable with a value of 0 for 1990 and 1 for 1994. Here is the SAS code to produce this new data set:

DATA persyr2;
   SET my.nlsy;
   id=_N_;
   time=0;
   anti=anti90;
   self=self90;
   pov=pov90;
   OUTPUT;
   time=1;
   anti=anti94;
   self=self94;
   pov=pov94;
   OUTPUT;
RUN;
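As a quick check on the restructuring, one could print the first few person-year records. This step is not part of the original program; it is a sketch that assumes the PERSYR2 data set was created by the DATA step above. Each ID should appear twice, first with TIME=0 and then with TIME=1.

```sas
/* Optional check: list the first two children (four person-year records) */
PROC PRINT DATA=persyr2 (OBS=4);
   VAR id time anti self pov;
RUN;
```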

The new data set has 1162 observations, two for each of the 581 children. Equation (2.1) is now estimated in its original form,

y_it = μ_t + βx_it + γz_i + α_i + ε_it

except that γz_i is removed because it is perfectly collinear with α_i. To allow for different intercepts in the two years, the regression includes the TIME variable. To estimate the α_i terms, the regression model includes 580 dummy variables, one for each child except the last. This would be awkward in PROC REG because all the dummies would have to be created in a DATA step. It's easy in PROC GLM, however, because the CLASS statement can create the dummies automatically:

PROC GLM DATA=persyr2;
   CLASS id;
   MODEL anti=self pov time id / SOLUTION;
RUN;

The SOLUTION option tells GLM to print out the coefficient estimates and their associated statistics. (This is unnecessary when there are no CLASS variables.) Selected results are shown in Output 2.4. Only the first 10 coefficients for the dummy variables are shown.

The most important fact about this output is that the coefficients for SELF and POV (along with their standard errors, t-statistics, and p-values) are identical to those in Output 2.2, which was based on the difference equation. Furthermore, the coefficient for TIME is identical to the intercept in Output 2.2. So it seems that we get equivalent results using these two computational methods. But the dummy variable method is much slower than the difference score method because it requires the inversion of a very large matrix. On my PC, PROC REG took .01 seconds to estimate the difference-score model, whereas PROC GLM took 3.3 seconds to estimate the dummy variable model.

Table 2.4. Output 2.4 PROC GLM Results for Person-Year Data Set
Parameter	Estimate	Standard Error	t Value	Pr > |t|
Intercept	1.508435913	0.81214706	1.86	0.0638
self	−0.056148639	0.01530506	−3.67	0.0003
pov	−0.036308618	0.12827438	−0.28	0.7772
time	0.209233732	0.06305436	3.32	0.0010
id 1	0.658525908	1.06750664	0.62	0.5376
id 2	−0.377782710	1.06740470	−0.35	0.7235
id 3	4.650291609	1.06769622	4.36	<.0001
id 4	1.122217290	1.06740470	1.05	0.2935
id 5	0.178365929	1.06804248	0.17	0.8674
id 6	0.594142970	1.06716799	0.56	0.5779
id 7	2.925697051	1.06689988	2.74	0.0063
id 8	1.869548412	1.06724956	1.75	0.0803
id 9	0.037994330	1.06685908	0.04	0.9716
id 10	4.313399772	1.06781851	4.04	<.0001
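The match between the TIME coefficient and the difference-score intercept can be seen from a short derivation. The notation here is assumed for illustration: y_it is the response, x_it the time-varying predictors, and TIME is coded 0 at the first time point and 1 at the second.

```latex
\begin{align*}
\mu_t &= \mu_1 + (\mu_2 - \mu_1)\,\mathrm{TIME}_t,
  \qquad \mathrm{TIME}_1 = 0,\ \mathrm{TIME}_2 = 1 \\
y_{i2} - y_{i1} &= (\mu_2 - \mu_1) + \beta\,(x_{i2} - x_{i1})
  + (\varepsilon_{i2} - \varepsilon_{i1})
\end{align*}
```

The coefficient on TIME in the person-year regression and the intercept in the difference regression are therefore both estimates of μ₂ − μ₁, here 0.209.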

Using the same person-year data configuration, it's possible to greatly reduce the computation time by not explicitly estimating the dummy variable coefficients. I'll explain this method in more detail in section 2.4, but let's first look at how it is implemented. In PROC GLM, we take the variable ID out of the MODEL and CLASS statements, and we put it in an ABSORB statement instead. Here is the new program:

PROC GLM DATA=persyr2;
   ABSORB id;
   MODEL anti=self pov time;
RUN;

This program took about the same computing time as using PROC REG with the difference scores. It's apparent that the results in Output 2.5 are identical to those found in Outputs 2.4 and 2.2. As we'll see in section 2.4, this last method for OLS estimation of the fixed effects model is generally preferred when there are more than two observations per person.

Table 2.5. Output 2.5 GLM Results Using the ABSORB Statement
Parameter	Estimate	Standard Error	t Value	Pr > |t|
self	−.0561486395	0.01530506	−3.67	0.0003
pov	−.0363086183	0.12827438	−0.28	0.7772
time	0.2092337322	0.06305436	3.32	0.0010
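One practical point when using ABSORB: PROC GLM requires the input data to be sorted by the absorbed variable. The PERSYR2 data set already satisfies this, because the DATA step writes the two records for each child consecutively and numbers ID in order. For a person-year data set assembled some other way, a sort step along the following lines would be needed first. This is a sketch, not part of the original program.

```sas
/* Sort by the variable named in ABSORB before running PROC GLM */
PROC SORT DATA=persyr2;
   BY id;
RUN;

PROC GLM DATA=persyr2;
   ABSORB id;
   MODEL anti=self pov time;
RUN;
QUIT;
```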
