1.2 Model Identification

A fundamental consideration when specifying an SEM model is model identification. Essentially, model identification concerns whether a unique value for each and every unknown parameter can be estimated from the observed data. For a given free (i.e., unknown) parameter that needs to be model estimated, if it is not possible to express the parameter algebraically as a function of sample variances/covariances, then that parameter is defined to be unidentified. We can get a sense of the problem by considering the example equation Var (y) = Var (η) + Var (ε), where Var (y) is the variance of the observed variable y, Var (η) is the variance of the latent variable η, and Var (ε) is the variance of the measurement error. There is one known [i.e., Var (y)] and there are two unknowns [i.e., Var (η) and Var (ε)] in the equation; therefore, there is no unique solution for either Var (η) or Var (ε). That is, an infinite number of combinations of values of Var (η) and Var (ε) sum to Var (y), which renders this single-equation model unidentified. To solve the problem, we need to impose some constraints on the equation. One such constraint might be to fix the value of Var (ε) to a constant by adding one more equation, Var (ε) = C (where C is a constant). Then Var (η) is ensured to have a unique estimate, namely Var (η) = Var (y) − C. In other words, the parameter Var (η) in the equation is identified. The same general principles hold for more complicated SEM models. If an unknown parameter can be expressed by at least one algebraic function of one or more elements of the variance/covariance matrix of the observed variables, that parameter is identified. If all the unknown parameters are identified, then the model is identified. Very often, a parameter can be expressed by more than one distinct function. In this case, the parameter is over-identified. 
Over-identification means there is more than one way of estimating a parameter because there is more than enough information for estimating it. However, parameter estimates obtained from different functions should have an identical value in the population when the model is correct (Bollen, 1989a). A model is over-identified when each parameter is identified and at least one parameter is over-identified. A model is just-identified when each parameter is identified and none is over-identified. The term identified models refers to both just-identified and over-identified models.
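The single-equation example can be sketched numerically. The following is a minimal illustration, not part of the original text; the numeric values and the function name solve_var_eta are assumptions chosen only for demonstration.

```python
# Illustration of Var(y) = Var(eta) + Var(eps) with one known and two unknowns.
# All numbers below are hypothetical.

def solve_var_eta(var_y, var_eps_fixed):
    """Return Var(eta) = Var(y) - C once Var(eps) has been fixed at C."""
    return var_y - var_eps_fixed

var_y = 10.0  # the single known quantity

# Without a constraint, many (Var(eta), Var(eps)) pairs reproduce Var(y) equally well.
candidates = [(var_eta, var_y - var_eta) for var_eta in (1.0, 4.0, 7.5)]
all_fit = all(abs(v_eta + v_eps - var_y) < 1e-12 for v_eta, v_eps in candidates)

# With Var(eps) fixed at C = 2.5, Var(eta) has exactly one solution.
unique_var_eta = solve_var_eta(var_y, 2.5)
```

Every candidate pair satisfies the equation, which is what unidentified means here; fixing Var (ε) leaves a single admissible value for Var (η).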

A model that is not identified (under-identified or unidentified) has one or more unidentified parameters. If a model is under-identified, consistent estimates of all the parameters cannot be obtained. Because identification is not an issue of sample size, an under-identified model remains under-identified no matter how large the sample is. For any model to be estimated, it must be either just-identified or over-identified.

Over-identified SEM models are of primary interest in SEM applications. Over-identification here refers to a situation in which there are fewer free parameters in the model than data points.4 Because an over-identified model does not necessarily fit the data, it becomes possible to test whether the model fits the observed data. The difference between the number of observed variances and covariances and the number of free parameters is called the degrees of freedom (df) associated with the model fit. By contrast, a just-identified model has zero df, so its goodness of fit cannot be tested.
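The df calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the count of distinct variances and covariances among k observed variables is taken as k(k + 1)/2, and the one-factor models used as examples are not from the original text.

```python
def n_data_points(n_observed):
    """Distinct variances and covariances among n_observed variables."""
    return n_observed * (n_observed + 1) // 2

def model_df(n_observed, n_free_params):
    """Degrees of freedom: data points minus free parameters."""
    return n_data_points(n_observed) - n_free_params

# Hypothetical one-factor model, 3 indicators, first loading fixed:
# 2 free loadings + 1 factor variance + 3 error variances = 6 free parameters.
df_three = model_df(3, 6)

# Same model with 4 indicators:
# 3 free loadings + 1 factor variance + 4 error variances = 8 free parameters.
df_four = model_df(4, 8)
```

In this sketch the three-indicator model has zero df, so its fit cannot be tested, whereas the four-indicator model has positive df. A non-negative df is only a counting check and does not by itself establish identification.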

There is no simple set of necessary and sufficient conditions that provide a means for verification of identification of parameters in SEM models. However, two necessary conditions should always be checked. First, the number of data points must not be less than the number of free parameters. The number of data points is the number of distinct elements in the observed variance/covariance matrix, which equals (p + q)(p + q + 1)/2, where (p + q) is the total number of observed variables (i.e., p endogenous indicators and q exogenous indicators). That is, only the diagonal elements and one set of the off-diagonal elements in the observed variance/covariance matrix, either above or below the diagonal, are counted. When a variance/covariance matrix is analyzed, the free parameters in an SEM model are usually the factor loadings, factor variances/covariances, path coefficients, residual variances/covariances, and error variances that are to be estimated in the model. If there are more data points than free parameters, the model is said to be over-identified. If there are fewer data points than free parameters, the model is said to be under-identified, and the parameters cannot be estimated because it is never possible to estimate more unknowns than there are knowns. Second, a measurement scale must be established for every latent variable in the model. To establish the measurement scale of a latent variable, one may (1) fix one of the factor loadings (λs) that link the latent variable to its observed indicators;5 or (2) fix the variance of the latent variable to 1 (by doing so, the latent variable is standardized). If the variance of the latent variable is free and all the factor loadings (λs) are free, the factor loadings and the variance of the latent variable are not identified. 
Specifically, when a latent variable has no established scale, the following parameters are unidentified: for an independent latent variable, the variance of the latent variable and the coefficients associated with all paths emitted by the latent variable; for a dependent latent variable, the residual variance and the coefficients associated with all paths leading to or from the latent variable.
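The need for a measurement scale can be seen in a small numerical sketch. It assumes a one-factor model whose implied covariance between indicators i and j is λi λj Var (η), with the error variance added on the diagonal; the parameter values and function names are hypothetical and serve only as an illustration.

```python
def implied_cov(loadings, factor_var, error_vars):
    """Model-implied covariance matrix of a one-factor model (list of lists)."""
    k = len(loadings)
    return [
        [
            loadings[i] * loadings[j] * factor_var
            + (error_vars[i] if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

# Hypothetical parameter values with all loadings and the factor variance free.
loadings = [0.8, 0.6, 0.7]
factor_var = 2.0
error_vars = [0.5, 0.4, 0.3]

# Multiply every loading by c and divide the factor variance by c squared.
c = 2.0
rescaled_loadings = [c * lam for lam in loadings]
rescaled_factor_var = factor_var / c**2

sigma_original = implied_cov(loadings, factor_var, error_vars)
sigma_rescaled = implied_cov(rescaled_loadings, rescaled_factor_var, error_vars)
diff = max_abs_diff(sigma_original, sigma_rescaled)
```

Both parameter sets imply the same covariance matrix, so the data cannot distinguish between them. Fixing one loading or fixing the factor variance to 1 rules out the rescaling and leaves a single solution.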

These two conditions are necessary but not sufficient. Identification problems can still arise even if these two conditions are satisfied. Although a rigorous verification of model identification can be achieved algebraically, existing SEM software/programs generally provide a check for identification during model estimation. When a model is not identified, error messages will be printed in the program output, pointing to the parameters that are involved in the identification problem. Using this information, one can modify the model in a meaningful way to eliminate the problem.

The best way to solve the identification problem is to avoid it. Usually, one can add more indicators of latent variables so that there are more data points. However, the primary prevention strategy is to emphasize correct parameter specification. Model identification depends on the specification of parameters as free, fixed, or constrained. A free parameter is a parameter that is unknown and needs to be model estimated. A fixed parameter is a parameter that is fixed to a specified value. A constrained parameter is a parameter that is unknown but is constrained to equal one or more other parameters. For example, if previous research shows that variables x1 and x2 have the same effect on a dependent measure, one may constrain their path coefficients to be equal in the SEM model. By fixing or constraining some of the parameters, the number of free parameters can be reduced; as such, an under-identified model may become identified. In addition, reciprocal or nonrecursive SEM models are another common source of identification problems. A structural model is nonrecursive when a reciprocal or bidirectional relationship is specified so that there are feedback loops between two dependent variables in the model (e.g., y1 affects y2 on the one hand, and y2 affects y1 on the other). Such models are generally unidentified. For the y1 (y2) equation to be identified, one or more instrumental variables are needed that directly affect y1 (y2) but not y2 (y1) (Berry, 1984). Nonrecursive models are not discussed in this book.
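The effect of fixing or constraining parameters on the number of free parameters can be sketched as follows. The specification format, the parameter names, and the function count_free_parameters are hypothetical and are not the syntax of any SEM program; the sketch assumes that each group of parameters constrained to be equal counts as one free parameter.

```python
def count_free_parameters(spec):
    """Count parameters to be estimated.

    spec maps a parameter name to ("free",), ("fixed", value),
    or ("equal", label); parameters sharing a label are estimated as one.
    """
    n_free = 0
    equality_labels = set()
    for kind, *rest in spec.values():
        if kind == "free":
            n_free += 1
        elif kind == "equal":
            equality_labels.add(rest[0])
        elif kind != "fixed":
            raise ValueError(f"unknown specification: {kind}")
    return n_free + len(equality_labels)

# Hypothetical model: paths from x1 and x2 to y, a residual variance, one fixed loading.
unconstrained = {
    "b_x1": ("free",),
    "b_x2": ("free",),
    "resid_var": ("free",),
    "loading_1": ("fixed", 1.0),
}

# Same model with the two path coefficients constrained to be equal.
constrained = dict(unconstrained, b_x1=("equal", "b"), b_x2=("equal", "b"))

n_unconstrained = count_free_parameters(unconstrained)
n_constrained = count_free_parameters(constrained)
```

The equality constraint removes one free parameter in this sketch, which is how constraints can move a model toward satisfying the counting condition.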
