6.2. Random Effects as a Latent Variable Model

In chapter 2, the random effects model was specified as

$$y_{it} = \mu_t + \beta x_{it} + \gamma z_i + \alpha_i + \varepsilon_{it} \qquad (6.1)$$

where yit is the value of the response variable for individual i at time t, μt is an intercept that may differ across time points, xit is a vector of time-varying covariates, zi is a vector of time-invariant covariates, β and γ are the corresponding coefficient vectors, αi denotes the random effects, and εit is a random disturbance term. We assume that αi and εit are independent, normally distributed variables, each with a mean of 0 and a constant variance. We also assume, at least for now, that these random components are independent of both xit and zi.

It is now well known (Muthén 1994) that a random effects model such as the one in equation (6.1) can be represented as a structural equation model (SEM) that can be estimated with one of several software programs (e.g., LISREL, EQS, AMOS, or PROC CALIS). Conceptually, we regard equation (6.1) as specifying a separate equation for each point in time, with regression coefficients constrained to be the same across time points. The random components α and ε are regarded as latent variables; however, while there is only one α, there is a distinct ε for each time point.

SEM models are often represented as path diagrams (Kline 1998). Figure 6.1 is a path diagram for a model with three points in time and a single time-varying independent variable. In path diagrams for SEMs, the convention is that directly observed variables are enclosed by rectangles, whereas latent variables are enclosed by circles or ellipses. A straight, single-headed arrow denotes a direct causal effect of one variable on another, while a curved double-headed arrow denotes a bivariate correlation between two exogenous variables. (In the language of simultaneous equations, endogenous variables are those that are dependent variables in at least one equation. Exogenous variables are those that are not dependent variables in any equation.)

Figure 6.1. Path Diagram of a Random Effects Model for Three Points in Time

In chapter 2, we estimated the model in equation (6.1) using PROC MIXED for the NLSY data, which had observations at three points in time for 581 children. The working data set had three records per child, for a total of 1743 records. The dependent variable was a measure of antisocial behavior (ANTI). Independent variables included two time-varying variables, poverty (POV) and self-esteem (SELF), along with several time-invariant variables.
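The difference between the person-period layout used with PROC MIXED (three records per child) and the one-record-per-child layout that SEM software expects can be sketched in a few lines of Python. This is a minimal illustration with hypothetical toy values, not NLSY data; the helper name `long_to_wide` and the `anti90`-style naming are assumptions chosen to mirror the variable names used in this chapter.

```python
# Hypothetical person-period ("long") records: (child id, year, ANTI score).
# Toy values for illustration only; they are not NLSY data.
long_records = [
    (1, 90, 1.0), (1, 92, 2.0), (1, 94, 2.0),
    (2, 90, 0.0), (2, 92, 1.0), (2, 94, 3.0),
]


def long_to_wide(records, prefix="anti"):
    """Collapse one-record-per-occasion data into one record per child,
    with a separately named variable for each occasion (anti90, anti92, ...)."""
    wide = {}
    for child_id, year, value in records:
        wide.setdefault(child_id, {})[f"{prefix}{year}"] = value
    return wide


wide = long_to_wide(long_records)
print(len(long_records), "long records ->", len(wide), "wide records")
print(wide[1])
```

With three waves, the long file always has three times as many records as the wide file, which is why 581 children correspond to 1743 records.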

To estimate the model with PROC CALIS, we use the original form of the data set with one record per child and separate variable names for the same variable measured at different times. The model is specified as three distinct equations, one for each of the three time points, and each equation is a SAS language representation of equation (6.1). Here is the code:

PROC CALIS DATA=my.nlsy UCOV AUG;
LINEQS
   /* One equation per wave; shared names b1-b9 force equal slopes */
   anti90 = t1 INTERCEPT + b1 pov90 + b2 self90 + b3 black + b4 hispanic
          + b5 childage + b6 married + b7 gender + b8 momage + b9 momwork
          + falpha + e1,
   anti92 = t2 INTERCEPT + b1 pov92 + b2 self92 + b3 black + b4 hispanic
          + b5 childage + b6 married + b7 gender + b8 momage + b9 momwork
          + falpha + e2,
   anti94 = t3 INTERCEPT + b1 pov94 + b2 self94 + b3 black + b4 hispanic
          + b5 childage + b6 married + b7 gender + b8 momage + b9 momwork
          + falpha + e3;
STD
   /* s1 = variance of the random effect; s2 = common error variance */
   falpha e1 e2 e3 = s1 s2 s2 s2;
RUN;

Several things are worth noting about this program:

  • The UCOV and AUG options are necessary for estimating a model with an explicit intercept. UCOV tells CALIS to analyze an uncorrected matrix of sums of squares and cross-products (moments about zero) rather than the default covariance matrix, whose moments are taken about the means and therefore carry no information about intercepts. AUG says to augment this matrix with a column corresponding to a "variable" that has a constant value of 1.

  • The LINEQS statement specifies the set of linear equations that make up the model. The equations are separated by commas and concluded with a semicolon.

  • In each equation, names must be chosen for each of the parameters (b1, b2, etc.). If the same name is used in more than one equation, the corresponding parameter estimates are constrained to be equal.

  • INTERCEPT refers to a "variable" with the constant value of 1. T1, T2, and T3 refer to the actual intercepts, which are allowed to differ across the three equations. This is equivalent to treating TIME as a CLASS variable in PROC MIXED.

  • Variable names that are not on the input data set and which begin with an E, F, or D are assumed to be latent, unobserved variables. FALPHA, which appears in all three equations with an implicit coefficient of 1.0, corresponds to the αi in equation (6.1). Similarly, E1, E2, and E3 correspond to the εit in equation (6.1).

  • The STD statement assigns names to the variances of the latent variables and also imposes constraints. Thus, S1 is the variance of FALPHA, while S2 is the common variance of E1, E2, and E3. Constraining those three variances to be equal is equivalent to the constant-variance assumption for εit.
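The role of UCOV and AUG can be illustrated with a small Python sketch. It builds an uncorrected moment matrix after prepending a constant column of 1s, assuming for illustration that moments are divided by N; the exact divisor and internal computations in CALIS may differ. The point is that the cross-product of the constant column with an outcome equals that outcome's mean, so intercepts remain estimable, whereas an ordinary covariance matrix is unchanged when a constant is added to the outcome and so cannot identify them. The function names are hypothetical.

```python
def augmented_moments(columns):
    """Return X'X / N, where X is the data with a constant column of 1s prepended.

    Moments are taken about zero (uncorrected), so the constant column's
    cross-products with the other variables are their means.
    """
    n = len(columns[0])
    data = [[1.0] * n] + [list(col) for col in columns]
    k = len(data)
    return [
        [sum(data[i][r] * data[j][r] for r in range(n)) / n for j in range(k)]
        for i in range(k)
    ]


def covariance(col):
    """Variance about the mean (divisor N), as in a conventional covariance matrix."""
    n = len(col)
    mean = sum(col) / n
    return sum((v - mean) ** 2 for v in col) / n


y = [1.0, 2.0, 3.0, 6.0]          # hypothetical outcome values
y_shifted = [v + 10 for v in y]   # same data with a larger intercept

# The covariance is identical for both: the information about the mean is lost.
print(covariance(y), covariance(y_shifted))
# The augmented moment matrix keeps the means in its first row.
print(augmented_moments([y])[0][1], augmented_moments([y_shifted])[0][1])
```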

As with most SEM programs, PROC CALIS produces a large amount of output. A small but crucial part of this output (the regression coefficients, standard errors, and test statistics) is displayed in Output 6.1. Estimates are reported for each of the three equations, but because only the intercept is allowed to vary with time, most of this information is redundant. These estimates should be compared with those in Output 2.15 produced by PROC MIXED. The coefficient estimates are very close but, in some cases, differ slightly in the fourth or fifth decimal place. For example, the PROC MIXED coefficient for HISPANIC is −.2182, but for PROC CALIS it is −.2180. The reason for this difference is that the default estimation method in PROC MIXED is restricted maximum likelihood (REML), whereas PROC CALIS uses conventional maximum likelihood (ML). PROC MIXED can be made to produce results identical to those of PROC CALIS by putting the option METHOD=ML on the PROC statement.
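The source of the small ML/REML discrepancy is easiest to see in the simplest possible case: a single normal sample whose mean must be estimated. There, the ML variance estimate divides the sum of squared deviations by n, while the REML estimate divides by n − 1, accounting for the degree of freedom used up by the mean. The Python sketch below uses hypothetical toy values and is only an analogy; in a mixed model REML adjusts for all of the fixed-effect parameters, and the coefficient estimates shift slightly because they depend on the estimated variance components.

```python
values = [2.0, 4.0, 4.0, 6.0]   # hypothetical toy sample
n = len(values)
mean = sum(values) / n
ss = sum((v - mean) ** 2 for v in values)   # sum of squared deviations

ml_var = ss / n          # maximum likelihood: ignores that the mean was estimated
reml_var = ss / (n - 1)  # restricted ML: adjusts for one estimated mean parameter

print(f"ML: {ml_var:.4f}  REML: {reml_var:.4f}")
```

As n grows the two divisors converge, which is why the ML and REML results in a sample of 581 children differ only in the later decimal places.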

Another difference is that PROC CALIS produces three different intercepts, one for each point in time, whereas PROC MIXED gives one intercept and two coefficients for TIME. This difference is only apparent, however. The intercept reported by MIXED (2.741) is the intercept for time 3, which is close to the time 3 intercept in the CALIS output (2.748). To get the intercept for time 1, we add the coefficient for the time 1 dummy (−.2163) to the MIXED intercept, yielding 2.741 − .2163 = 2.525, which is close to the 2.53 that PROC CALIS reports. Similarly, to get the intercept for time 2, we add the coefficient for the time 2 dummy (−.1690), yielding 2.741 − .1690 = 2.572. Applying the same two adjustments to the CALIS time 3 intercept (2.748) gives 2.532 and 2.579.
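The conversion from the PROC MIXED parameterization (an intercept for the reference time plus dummy-variable coefficients) to the three time-specific intercepts that CALIS reports is simple addition. A Python sketch using the MIXED estimates quoted above, with time 3 as the reference level so that its dummy coefficient is 0:

```python
mixed_intercept = 2.741                          # MIXED intercept = time 3 intercept
dummy_coef = {1: -0.2163, 2: -0.1690, 3: 0.0}    # time 3 is the reference category

# Time-specific intercept = reference intercept + that time's dummy coefficient
intercepts = {t: mixed_intercept + b for t, b in dummy_coef.items()}

for t in sorted(intercepts):
    print(f"time {t}: {intercepts[t]:.3f}")
```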

PROC CALIS also reports estimates of the variance of the latent variable, FALPHA, and the common variance for E1 through E3, as seen in Output 6.2. These are quite close to the covariance parameter estimates reported in Output 2.15 for PROC MIXED. Again, they would be virtually identical if we had used the METHOD=ML option on the PROC MIXED statement.

Table 6.1. Output 6.1 Random Effects Model Estimated with PROC CALIS



Table 6.2. Output 6.2 Variance Estimates Produced by PROC CALIS for the Random Effects Model
Variances of Exogenous Variables

Variable   Parameter   Estimate   Standard Error   t Value
falpha     s1          1.28489    0.09591          13.40
e1         s2          0.99458    0.04130          24.08
e2         s2          0.99458    0.04130          24.08
e3         s2          0.99458    0.04130          24.08
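Because αi and εit are assumed independent, equation (6.1) implies a particular covariance structure for the composite disturbances αi + εit: each has variance σα² + σε², and any two from the same child have covariance σα². This is the compound-symmetry structure. The Python sketch below plugs in the Output 6.2 estimates (S1 for σα², S2 for σε²) and also computes the intraclass correlation S1/(S1 + S2), the share of unexplained variance that lies between children. The variable names are illustrative.

```python
s1 = 1.28489   # estimated variance of FALPHA (random effect), Output 6.2
s2 = 0.99458   # common estimated variance of E1, E2, E3, Output 6.2
waves = 3

# Implied covariance matrix of the composite disturbances falpha + e_t:
# diagonal = s1 + s2, off-diagonal = s1 (compound symmetry).
implied = [[s1 + s2 if i == j else s1 for j in range(waves)] for i in range(waves)]

# Intraclass correlation: correlation between two disturbances for the same child.
icc = s1 / (s1 + s2)

for row in implied:
    print("  ".join(f"{v:.5f}" for v in row))
print(f"intraclass correlation: {icc:.4f}")
```

With these estimates, roughly 56 percent of the disturbance variance is between-child variance.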

We now have a way of estimating a random effects model with PROC CALIS that gives us the same results as PROC MIXED. However, there are some important limitations to this method. First, unlike PROC MIXED, this method is difficult to implement with unbalanced data: in practice, every individual in the sample must have the same number of repeated measurements on the outcome variable. If some of the children in our sample had missing values for, say, ANTI94, they would have to be deleted entirely from the sample. Second, although possible, it is quite cumbersome to set up the model to handle linear effects of time, linear interactions with time, or random coefficients (Muthén and Curran 1997), whereas all of these are easily handled in PROC MIXED. On the other hand, PROC CALIS makes it easy to allow for unrestricted interactions with time: simply give different parameter names to a variable's coefficients at different points in time.

Balancing these limitations are some important advantages to the SEM approach. First, it is possible to combine the random effects model with models for multiple indicators of latent variables. These variables may be either independent or dependent variables. Good introductions to latent variable models with multiple indicators can be found in Kline (1998) or Hatcher (1994). Second, as we will see in the next section, the random effects model in PROC CALIS can be extended to estimate fixed effects models in ways that facilitate a comparison and a compromise between the two models.
