Although the literature on confirmatory factor analysis (FA) is extensive and the technique is widely used, for example in the social sciences, we will focus only on exploratory FA, where the goal is to identify unknown, unobserved variables on the basis of other empirical data.
The latent variable model of FA was first introduced by Spearman in 1904 for one factor, and Thurstone generalized it to more than one factor in 1947. This statistical model assumes that the manifest variables available in the dataset are driven by latent variables that were not observed directly, but can be inferred from the observed data.
FA can deal with continuous (numeric) variables, and the model states that each observed variable is a weighted sum of some unknown, latent factors.
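In formula form, this common factor model can be sketched as follows (a standard textbook formulation; the symbols are generic and not tied to any particular R function):

X_i = λ_i1 F_1 + λ_i2 F_2 + ... + λ_im F_m + e_i

Here, each observed variable X_i is a weighted sum of the m common factors F_1, ..., F_m, where the weights λ are called factor loadings, and the unique error term e_i captures the variance not shared with the other variables.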
The most widely used exploratory FA method is maximum-likelihood FA, which is available in the factanal
function of the already installed stats
package. Other factoring methods are provided by the fa
function in the psych
package—for example, ordinary least squares (OLS), weighted least squares (WLS), generalized weighted least squares (GLS), or the principal factor solution. These functions take either raw data or a covariance matrix as input.
For demonstration purposes, let's see how the default factoring method performs on a subset of mtcars
. Let's extract all performance-related variables except for displacement, which presumably accounts for all the other relevant metrics:
> m <- subset(mtcars, select = c(mpg, cyl, hp, carb))
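As a side note, the same one-factor model could also be fitted with the maximum-likelihood factanal function mentioned earlier. A minimal sketch, assuming the m subset created in the previous step:

```r
# Maximum-likelihood FA via stats::factanal (no extra packages needed)
f_ml <- factanal(m, factors = 1)
# The ML loadings are directly comparable to the loadings from other methods
print(f_ml$loadings)
```

Besides the loadings, factanal also reports a chi-square test of the hypothesis that one factor is sufficient.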
Now simply call fa
on the preceding data.frame
and save the result:
> (f <- fa(m))
Factor Analysis using method =  minres
Call: fa(r = m)
Standardized loadings (pattern matrix) based upon correlation matrix
       MR1   h2   u2 com
mpg  -0.87 0.77 0.23   1
cyl   0.91 0.83 0.17   1
hp    0.92 0.85 0.15   1
carb  0.69 0.48 0.52   1

                MR1
SS loadings    2.93
Proportion Var 0.73

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

The degrees of freedom for the null model are 6 and the objective function was 3.44 with Chi Square of 99.21
The degrees of freedom for the model are 2 and the objective function was 0.42

The root mean square of the residuals (RMSR) is 0.07
The df corrected root mean square of the residuals is 0.12

The harmonic number of observations is 32 with the empirical chi square 1.92 with prob < 0.38
The total number of observations was 32 with MLE Chi Square = 11.78 with prob < 0.0028

Tucker Lewis Index of factoring reliability = 0.677
RMSEA index = 0.42 and the 90 % confidence intervals are 0.196 0.619
BIC = 4.84
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
                                              MR1
Correlation of scores with factors           0.97
Multiple R square of scores with factors     0.94
Minimum correlation of possible factor scores 0.87
Well, this is a rather impressive amount of information with a bunch of details! MR1
stands for the first extracted factor, named after the default factoring method (minimum residual, or OLS). Since only one factor is included in the model, rotation of factors is not an option. The output also includes a hypothesis test of whether this number of factors is sufficient, and several coefficients, such as the fit based upon the off-diagonal values (0.99), suggest a good model fit.
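All of these numbers are also available programmatically in the returned object. A minimal sketch, assuming the psych package is loaded and f is the model fitted above:

```r
library(psych)
f <- fa(subset(mtcars, select = c(mpg, cyl, hp, carb)))
f$loadings      # standardized loadings (the MR1 column)
f$communality   # h2: variance of each variable explained by the factor
f$uniquenesses  # u2: the unexplained remainder (1 - h2)
f$TLI           # Tucker-Lewis Index
f$RMSEA         # RMSEA estimate with its confidence interval
```

This is handy when you want to compare fit indices across several candidate models in a loop rather than reading the printed summaries.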
The results can be summarized in the following plot:
> fa.diagram(f)
Here we can see the high correlation coefficients between the latent variable and the observed variables, and the direction of the arrows suggests that the factor affects the values found in our empirical dataset. Can you guess the relationship between this factor and the displacement of the car engines, which we left out of the model?
> cor(f$scores, mtcars$disp)
0.87595
Well, this seems like a good match.