The PCA method that we have discussed so far models all of the variance of the variables to which it is applied. An alternative approach, which is often confused with PCA, is to model only the common variance: an approach called factor analysis (FA). In this chapter, we will discuss exploratory factor analysis (EFA).
The following are the basic terminologies that you need to be aware of:
The following is a list of the matrices that you should familiarize yourself with:
EFA assumes that the observed variables can be explained by some unobserved variables, also called latent variables, which are statistically modeled as a source of common variance.
The following diagram depicts a Trait that is a common source of variance for five observed variables, A through E. In this diagram, the arrows represent the path coefficients, which here are the correlations between each observed variable and the latent trait:
Assuming the latent trait has unit variance, the rules of covariance algebra give us the following formulas:
cov(A, B) = cov(A, Trait) × cov(B, Trait)
cov(A, C) = cov(A, Trait) × cov(C, Trait)
Based on these rules, EFA tries to estimate the path coefficients using some type of statistical estimation method that achieves the best fit, since we are not able to directly calculate these correlations.
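As a rough illustration of these covariance rules, the following sketch simulates a single standardized trait and two observed variables. The path coefficients (0.8 and 0.6), the sample size, and the variable names are assumptions made up for this example; they do not come from the physical functioning dataset.

```r
# Simulate a single standardized latent trait driving two observed variables.
set.seed(42)
n <- 100000
a <- 0.8  # assumed path coefficient from the trait to A
b <- 0.6  # assumed path coefficient from the trait to B

trait <- rnorm(n)
# Unique parts are scaled so that A and B each have unit variance.
A <- a * trait + sqrt(1 - a^2) * rnorm(n)
B <- b * trait + sqrt(1 - b^2) * rnorm(n)

observed <- cor(A, B)
implied <- cor(A, trait) * cor(B, trait)
round(c(observed = observed, implied = implied, a.times.b = a * b), 3)
```

In real data the trait is unobserved, so only the left-hand side, cor(A, B), can be computed directly.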
To understand the basic idea of how this can be done for a single factor and introduce some concepts, we will start with a very old method of factor estimation, known as the centroid method. This method has largely been supplanted by new computerized methods but serves well to demonstrate the basic idea of factor analysis.
For the single factor model, we will denote each path coefficient by the lowercase version of the capital letter representing the observed variable (for example, cor(A, Trait) = a). Let's assume that we can place the products of all path coefficients with each other into a square matrix, which is called the reduced correlation matrix Rr:
We do not know the values of the path coefficients a, b, c, d, or e. (After all, it is these values that we are trying to estimate.) However, because of the rules of covariance algebra, we know:
ab = cor(A,B)
Thus, while we do not know the values of any of the path coefficients, we are able to estimate the products of these coefficients simply by calculating the correlations between the observed variables associated with these path coefficients. It is only the communalities along the diagonal that we are unable to calculate, and for these we must come up with some initial estimates.
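The structure of the reduced correlation matrix can be sketched in R. The path coefficients below are hypothetical values chosen for illustration; in practice they are exactly what we are trying to estimate.

```r
# Hypothetical path coefficients for observed variables A through E.
path <- c(A = 0.8, B = 0.7, C = 0.6, D = 0.5, E = 0.4)

# Every element of the reduced correlation matrix is a product of two
# path coefficients.
R.reduced <- outer(path, path)
dimnames(R.reduced) <- list(names(path), names(path))

# Off-diagonal elements are the correlations implied between variables,
# for example a * b for the pair (A, B).
R.reduced["A", "B"]

# Diagonal elements are the communalities (squared path coefficients).
diag(R.reduced)
```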
There are a number of possible methods to generate initial communality estimates:
The greater the number of variables involved, the lesser the importance of the initial estimates of the communalities. This is because when there are more observed variables, the size of the matrix increases dramatically, and the larger a matrix is, the smaller the proportion of elements that fall on the diagonal.
To give the mathematical background of the centroid method, we start with the reduced correlation matrix. To emphasize that the diagonal has merely initial estimates of communalities, we use the 0 subscript, as shown in the following matrix:
We then sum each row (or each column, since this is a symmetric matrix) to get the following matrix. Here we achieve the sum of rows by post-multiplying the reduced correlation matrix by a column matrix of 1s as shown in the following matrix:
We then sum all elements of this matrix of row sums to get the sum of all elements in the reduced correlation matrix, and we take the square root of the total as shown in the following formula:
We then divide the sum of each row by the square root of the total to create path coefficient estimates:
If we wish to obtain better path coefficient estimates, we repeat this procedure multiple times by substituting the squared path coefficient estimates for the initial communality estimates.
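A minimal sketch of one pass of these steps follows. The function name centroid.pass and the path coefficients are assumptions for illustration. Because the diagonal here already holds the true communalities, a single pass recovers the path coefficients exactly; with real data the communalities are only estimates.

```r
# One pass of the centroid method on a reduced correlation matrix.
centroid.pass <- function(reduced.cor) {
  ones <- matrix(1, nrow = nrow(reduced.cor), ncol = 1)
  row.sums <- reduced.cor %*% ones       # sum of each row
  sqrt.total <- sqrt(sum(row.sums))      # square root of the grand total
  drop(row.sums / sqrt.total)            # path coefficient estimates
}

# Hypothetical true path coefficients, used to build a reduced
# correlation matrix whose diagonal already holds the true communalities.
true.path <- c(0.8, 0.7, 0.6, 0.5, 0.4)
R.reduced <- outer(true.path, true.path)

estimates <- centroid.pass(R.reduced)
estimates
```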
Since this is a single factor, the matrix F is simply a 1. Using these solutions, we can now solve for U:
Rimp – PFP' = UU
And for the residual correlation matrix:
Robs – Rimp = Rresid
Now, we will execute the same steps using R. We start with a single factor model from the physical functioning dataset, and pick just the items that we think are related to the leg function:
le.matrix <- as.matrix(phys.func[,c(2,3,4,8,9,10,13,14)])
When coming up with a numerical solution, we need initial estimates for the communalities. Here we use the simplest possible reduced correlation matrix, taking 1 as the initial estimate of every communality (that is, we use the observed correlation matrix as the reduced correlation matrix), as shown in the following code:
le.cor <- cor(le.matrix)
le.cor.reduced <- le.cor
Following this step, we sum the rows (or the columns), and from these row sums, we create a sum of all values in the matrix. Path coefficient estimates are equal to the row sums divided by the square root of the total sum of all values in the matrix:
row.sums <- le.cor.reduced %*% matrix(rep(1, 8), nrow = 8)
total.sum <- sum(row.sums)
sqrt.total <- sqrt(total.sum)
row.sums / sqrt.total
Now, if we recall, our initial communality estimates were just "1". The communalities are simply the path coefficients squared. Therefore, if we like, we can use our solutions for path estimates to create a new reduced correlation matrix, and submit this new reduced correlation matrix to the centroid method all over again. We could do this repeatedly until the path coefficient estimates change minimally with each additional iteration.
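The repeated procedure described here might be sketched as follows. The function centroid.iterate, its tolerance, and the toy correlation matrix are assumptions for illustration rather than part of the original analysis; the toy matrix is built from known path coefficients so that the result can be checked.

```r
# Iterate the centroid method, replacing the communalities with the
# squared path coefficient estimates until the estimates stabilize.
centroid.iterate <- function(cor.mat, tol = 1e-6, max.iter = 500) {
  reduced <- cor.mat                    # initial communalities of 1
  previous <- rep(Inf, nrow(cor.mat))
  for (iter in seq_len(max.iter)) {
    row.sums <- rowSums(reduced)
    estimates <- row.sums / sqrt(sum(row.sums))
    if (max(abs(estimates - previous)) < tol) break
    diag(reduced) <- estimates^2        # updated communality estimates
    previous <- estimates
  }
  list(loadings = estimates, iterations = iter)
}

# Toy correlation matrix implied by hypothetical path coefficients.
true.path <- c(0.8, 0.7, 0.6, 0.5, 0.4)
toy.cor <- outer(true.path, true.path)
diag(toy.cor) <- 1

fit <- centroid.iterate(toy.cor)
round(fit$loadings, 3)
fit$iterations
```

With the physical functioning data, the same function could be applied to le.cor, although the estimates would then not match any known values.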
An important limitation of the centroid method is that it assumes that all observed variables are correlated in the same direction (for example, all positive) with the latent trait.
Earlier in this chapter, we went through the basic idea of estimating a path coefficient for a single common factor, but EFA is used not for a single factor but for multiple underlying factors, as depicted in the following figure:
Here we see that there are two traits and six observed variables, with both unobserved traits assumed to have some (potentially almost zero) correlation with each of the observed variables and with each other (arrows not shown for this). Here we will use a small letter followed by a number to indicate the path. For example, a1 is the path coefficient from Trait-1 to A.
The centroid method described earlier only extracts a single factor. If we want to extract multiple factors, we have to do this one at a time, subjecting residual matrices to the centroid method repeatedly. Since this is not a typically used method with modern computers, we will not go through this tedious exercise, but rather skip to a commonly used method that can extract multiple factors.
Rather than iteratively factoring out residual matrices, we can directly extract the desired number of factors using principal axis factoring (PAF).
We start with our reduced correlation matrix. We then perform an eigenvalue decomposition of the reduced correlation matrix, yielding eigenvalues, L, rank ordered from large to small, and a matrix of corresponding eigenvectors, V. If we wish to compute a two-factor solution, we post-multiply the matrix of the first two eigenvectors by a diagonal matrix containing the square roots of the first two eigenvalues.
We will now go through the basic steps of how PAF can be performed in R. We will be demonstrating this with multiple factors, so we will use the full physical functioning dataset. First, we obtain the correlation matrix using the following code:
phys.cor <- cor(phys.func)
Then, we create a reduced correlation matrix using squared multiple correlations:
reduce.cor.mat <- function(cor.mat) {
  inverted.cor.mat <- solve(cor.mat)
  reduced.cor.mat <- cor.mat
  diag(reduced.cor.mat) <- 1 - (1/diag(inverted.cor.mat))
  return(reduced.cor.mat)
}
phys.cor.reduced <- reduce.cor.mat(phys.cor)
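To see why the diagonal formula in this function yields squared multiple correlations, the following sketch compares it with the R-squared from regressing one item on all the others. The simulated items and their loadings are assumptions for illustration only.

```r
set.seed(1)
n <- 500
common <- rnorm(n)
# Four hypothetical items sharing one common source of variance.
items <- sapply(c(0.8, 0.7, 0.6, 0.5), function(loading) {
  loading * common + sqrt(1 - loading^2) * rnorm(n)
})
colnames(items) <- paste0("item", 1:4)

# Squared multiple correlations from the inverse correlation matrix.
item.cor <- cor(items)
smc <- 1 - 1 / diag(solve(item.cor))

# R-squared from regressing item1 on the remaining items.
fit <- lm(item1 ~ item2 + item3 + item4, data = as.data.frame(items))
c(smc = unname(smc[1]), r.squared = summary(fit)$r.squared)
```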
Finally, we perform the PAF:
paf.method <- function(reduced.matrix, nfactor) {
  row.count <- nrow(reduced.matrix)
  eigen.r <- eigen(reduced.matrix, symmetric = TRUE)
  V <- eigen.r$vectors[,c(1:nfactor)]
  L <- diag(sqrt(eigen.r$values[c(1:nfactor)]), nrow = nfactor)
  return((V %*% L))
}
path.coef <- paf.method(phys.cor.reduced, 3)
You may find that all loadings on a factor are negative. There is nothing wrong with reversing the signs on a factor, so long as the reversal is applied to every path coefficient on that factor, which keeps the relative signs within the factor consistent. It is important to note that all observed variables have a loading on all extracted factors.
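A small sketch shows why reversing the signs of a whole factor is harmless: the correlations reproduced from the loadings do not change. The loading matrix below is hypothetical.

```r
# Hypothetical unrotated loadings: 4 items on 2 factors, with every
# loading on the first factor negative.
loadings <- matrix(c(-0.7, -0.6, -0.5, -0.4,
                      0.3, -0.2,  0.4, -0.1), ncol = 2)

# Reflect the first factor by flipping the sign of its whole column.
reflected <- loadings
reflected[, 1] <- -reflected[, 1]

# The reproduced (implied) common correlations are identical.
all.equal(loadings %*% t(loadings), reflected %*% t(reflected))
```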
Depending on our goals, this can be the end of our factor analysis, but generally researchers seek to find a simpler structure by rotating the factor structure in a similar manner as PCA rotates axes.
Principal axis factoring is likely the oldest commonly used method of factor extraction, and it is probably still the most commonly used. It does not make distributional assumptions, and for normally distributed data it gives estimates similar to those of methods that do make distributional assumptions. Maximum likelihood estimation is being used increasingly and is considered numerically superior on datasets that are close to multivariate normal. This method assumes that the data are normally distributed and maximizes the (usually log) likelihood function based on a normal distribution. It is relatively robust to mild or moderate deviations from this assumption. Minimum residual factoring seeks to minimize the off-diagonal residual correlations and gives estimates similar to those of maximum likelihood, while being robust to poorly behaved matrices.
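As one illustration of maximum likelihood extraction, base R's factanal function fits a factor model by maximum likelihood. The sketch below applies it to simulated single-factor data; the loadings, sample size, and item names are assumptions, and this is not the analysis of the physical functioning data.

```r
set.seed(123)
n <- 5000
true.loadings <- c(0.8, 0.7, 0.6, 0.5, 0.4)
common <- rnorm(n)
# Simulated items, each with unit variance and a known loading.
items <- sapply(true.loadings, function(loading) {
  loading * common + sqrt(1 - loading^2) * rnorm(n)
})
colnames(items) <- paste0("item", 1:5)

# Maximum likelihood extraction of a single factor.
ml.fit <- factanal(items, factors = 1)
# The sign of a factor is arbitrary, so compare absolute values.
ml.loadings <- abs(as.vector(unclass(ml.fit$loadings)))
round(cbind(true = true.loadings, ml = ml.loadings), 2)
```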
The final step in most factor analyses is factor rotation. The goal of this step is to determine whether the cloud of data can be represented by a simpler set of coordinates by rotating the axes of the factors. Rotation should increase the number of near zero coefficients in the factor pattern matrix. All observed variables will still have a loading on all of the factors, but ideally, observed variables will show substantially higher loadings on a single particular factor than on other factors once this is completed.
Broadly speaking, there are two approaches to factor rotation: orthogonal and oblique. Orthogonal rotations are still the most commonly used methods, and many regard them as producing easier to interpret solutions. However, oblique rotations are thought to provide more of a real-world estimate. It is also worth noting that single factor models cannot be rotated.
There are many different factor rotation methods, but here we will delve into just four: two orthogonal and two oblique rotations.
In this section, we will discuss the commonly used orthogonal factor rotation methods. These rotation methods produce factors that have no correlation, which is why they are thought to be easier to interpret. The downside is that many of the constructs we think of in the real world are in fact correlated.
Quartimax rotation attempts to satisfy the criterion of maximizing the sum of all values in the factor pattern matrix raised to the fourth power:
Here, Pij is the element in ith row and jth column of the factor pattern matrix P (in which variables are represented by rows and factors by columns). The number of variables is denoted by v which is also the number of rows in P. The number of factors is denoted by f, which is also the number of columns in P.
Raising a number to the fourth power has the effect of exaggerating the differences between large and small numbers, so the quartimax criterion will be better met in a factor loading matrix with very large loadings and very small loadings than in one with many moderate-sized loadings. Notably, this rotation simply maximizes this very simple criterion without regard for whether the higher loadings are well distributed among factors or all load onto a single factor.
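The effect of the fourth power can be sketched with two hypothetical loading matrices that have identical communalities: one with a simple structure, and the same solution rotated by 45 degrees so that every loading is moderate. The function name quartimax.criterion is an assumption for this example.

```r
# Sum of all loadings raised to the fourth power.
quartimax.criterion <- function(P) sum(P^4)

# Hypothetical simple structure: each item loads on one factor only.
simple <- matrix(c(0.8, 0.8, 0.0, 0.0,
                   0.0, 0.0, 0.8, 0.8), ncol = 2)

# The same solution rotated by 45 degrees, giving only moderate loadings.
angle <- pi / 4
rotation <- matrix(c(cos(angle), sin(angle),
                     -sin(angle), cos(angle)), ncol = 2)
moderate <- simple %*% rotation

quartimax.criterion(simple)
quartimax.criterion(moderate)
```

The criterion is larger for the simple structure even though both matrices explain exactly the same amount of variance in each item.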
The varimax rotation subtracts a term summing over squared elements of rows and columns. Its criterion is to find a rotation fitting the following formula:
The effect of this is to favor rotations in which large loadings are spread across the factors over rotations in which the large loadings all fall on a single factor (or relatively few factors).
Varimax is probably the most widely used factor rotation method.
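Base R provides a varimax function in the stats package. The sketch below applies it to a hypothetical unrotated loading matrix (not the physical functioning solution) and checks two properties of an orthogonal rotation: the rotation matrix is orthogonal, and the communalities are unchanged.

```r
# Hypothetical unrotated loadings for six items on two factors.
unrotated <- matrix(c(0.7, 0.6, 0.6, 0.5, 0.6, 0.5,
                      0.4, 0.4, 0.3, -0.4, -0.5, -0.4), ncol = 2)

rotated <- varimax(unrotated)
new.loadings <- unclass(rotated$loadings)
round(new.loadings, 2)

# The rotation matrix is orthogonal ...
round(crossprod(rotated$rotmat), 10)
# ... so communalities (row sums of squared loadings) are unchanged.
cbind(before = rowSums(unrotated^2), after = rowSums(new.loadings^2))
```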
We saw that orthogonal transformations attempt to maximize a transformation of the sums of factor loadings. Oblique rotations do the opposite; they minimize such sums.
Oblimin rotation seeks to minimize the following criterion:
Here, (x,y) represents a pair of variables, and the summation is done for all variable pairs.
Promax is an oblique rotation that starts with varimax and then rotates the varimax solution to an oblique solution. It takes the factor loadings in the varimax solution, raises them to a high power to bring small loadings close to zero, and then attempts a rotation that makes the loadings closest to zero equal to zero.
A package that fits many different rotations has been developed, known as GPArotation. It is available in languages outside of R as well. Notably, it uses a method that can be applied to almost any rotation criterion, so the package offers functions for not only common rotations but some obscure ones as well. For example:
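Base R also includes a promax function in the stats package. The following sketch rotates a hypothetical loading matrix and derives the factor correlation matrix from the returned rotation matrix; the loadings and the power m = 4 (the function's default) are illustrative assumptions.

```r
# Hypothetical unrotated loadings for six items on two factors.
unrotated <- matrix(c(0.7, 0.6, 0.6, 0.5, 0.6, 0.5,
                      0.4, 0.4, 0.3, -0.4, -0.5, -0.4), ncol = 2)

oblique <- promax(unrotated, m = 4)
round(unclass(oblique$loadings), 2)

# Factor correlation matrix implied by the oblique rotation matrix.
phi <- solve(crossprod(oblique$rotmat))
round(phi, 2)
```

Unlike an orthogonal rotation, the off-diagonal element of phi is generally not zero, which is what allows the factors to correlate.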
> library(GPArotation)
> rotated.structure <- oblimin(path.coef)
> rotated.structure
Oblique rotation method Oblimin Quartimin converged.
Loadings:
         [,1]     [,2]      [,3]
 [1,] -0.0013  0.06999  3.58e-01
 [2,] -0.6804 -0.10073  4.18e-02
 [3,] -0.6434 -0.04880 -1.12e-02
 [4,] -0.6764  0.06330 -1.20e-01
 [5,] -0.4324  0.11336  2.17e-01
 [6,] -0.2496  0.06544  4.82e-01
 [7,]  0.0446  0.06167  5.50e-01
 [8,] -0.2300  0.31230  5.50e-02
 [9,] -0.4410  0.34663 -6.06e-02
[10,] -0.4072  0.41199 -1.13e-01
[11,]  0.1783  0.58615  1.36e-01
[12,] -0.0698  0.56071  1.03e-01
[13,] -0.6842 -0.01633  1.11e-01
[14,] -0.4586  0.20304 -6.44e-03
[15,] -0.2744  0.37031 -9.65e-05
[16,]  0.0308  0.60407  7.37e-03
[17,] -0.2598  0.22145  2.99e-01
[18,] -0.1519  0.29464  2.58e-01
[19,] -0.0840  0.38490  2.30e-02
[20,] -0.5786  0.00746  2.10e-01

Rotating matrix:
       [,1]   [,2]   [,3]
[1,]  0.627 -0.397 -0.202
[2,]  1.017  0.855  0.414
[3,] -0.025 -0.748  1.001

Phi:
       [,1]   [,2]   [,3]
[1,]  1.000 -0.519 -0.356
[2,] -0.519  1.000  0.374
[3,] -0.356  0.374  1.000
The object produced by this command contains a number of important matrices, including the new factor loading matrix produced by the rotation, the correlation matrix of the factors (Phi), and the rotation matrix. Post-multiplying the original factor pattern matrix by the rotation matrix gives the new factor loading matrix.
The question then is how to interpret these factors. The factor loading matrix shows that all 20 observed items load on all three factors (as is typical of EFA), but the loadings are pretty low on some factors. As such, we can interpret what each factor means by those items that load sufficiently heavily on it. What constitutes "sufficiently heavy" loading is far from clear. One of the most commonly used criteria is that the item should have a loading of at least 0.4. However, other criteria exist that require that an item load substantially more on a single factor than any other factor. In general, it is probably best to use some judgment rather than rigid criteria.
Here, we will reprint the loading matrix, replacing all values less than 0.3 in absolute value with NA for ease of examination:
> loading.matrix <- rotated.structure$loadings
> loading.matrix[ abs(loading.matrix) < 0.3] <- NA
> loading.matrix
            [,1]      [,2]      [,3]
 [1,]         NA        NA 0.3582315
 [2,] -0.6804255        NA        NA
 [3,] -0.6434025        NA        NA
 [4,] -0.6763837        NA        NA
 [5,] -0.4324032        NA        NA
 [6,]         NA        NA 0.4824759
 [7,]         NA        NA 0.5499046
 [8,]         NA 0.3123046        NA
 [9,] -0.4410473 0.3466325        NA
[10,] -0.4071513 0.4119939        NA
[11,]         NA 0.5861508        NA
[12,]         NA 0.5607121        NA
[13,] -0.6842129        NA        NA
[14,] -0.4585589        NA        NA
[15,]         NA 0.3703143        NA
[16,]         NA 0.6040669        NA
[17,]         NA        NA        NA
[18,]         NA        NA        NA
[19,]         NA 0.3848953        NA
[20,] -0.5785879        NA        NA
Based on these results, it appears that we have a first factor dealing with gross motor function, a second factor dealing mostly with fine motor function, and a third factor concerned with household management. The two items concerned with recreation (rows 17 and 18, the only rows with no loading of at least 0.3) fall on none of these factors. Appropriately, the factor correlation matrix suggests that the factors we interpret as fine and gross motor are more highly correlated with each other than with the household management factor.
We have gone through the basic conceptual and computational ideas underlying EFA in R. As we discussed earlier, to get good estimates, multiple iterations of these computations are needed until some criterion indicates that an optimal solution has been achieved. An excellent package that bundles much of the work we have done earlier into convenient commands is the psych package. We will now go over how to use this package, including calling some of the advanced features that it offers.
We will continue to use the physical functioning dataset here. Our question at hand is whether a few common sources of variance are able to explain the responses to the 20 items. We saw that a three-factor solution is likely most appropriate (or maybe a four-factor solution, but we will stick with three here). For serious exploratory work, it is often ideal to split off a development and validation dataset, but we will skip that step here.
We also do one more thing here. So far we have been working with ordinal data and treating it as if it were continuous. Now we will explicitly account for the fact that it is ordinal rather than continuous, using polychoric correlations to create our correlation matrix. The polychoric correlation assumes that the data is ordinal but represents some continuous underlying phenomenon that has simply been binned into discrete ordered categories. The polychoric correlation attempts to estimate the correlation between the underlying continuous variables under this assumption and to calculate the thresholds at which the discretization occurs.
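The motivation for polychoric correlations can be sketched with simulated data: binning continuous variables into a few ordered categories tends to pull the ordinary Pearson correlation toward zero. The underlying correlation of 0.6 and the thresholds below are assumptions chosen for illustration.

```r
set.seed(2024)
n <- 20000
rho <- 0.6   # assumed correlation between the underlying continuous variables

x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Discretize each continuous variable into three ordered categories
# using hypothetical thresholds.
thresholds <- c(-Inf, -0.5, 1, Inf)
x.ord <- as.integer(cut(x, thresholds))
y.ord <- as.integer(cut(y, thresholds))

# Pearson correlation before and after binning.
c(continuous = cor(x, y), ordinal = cor(x.ord, y.ord))
```

A polychoric correlation computed on the binned variables aims to recover the correlation of the continuous variables rather than the attenuated value.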
Let's start by finding the polychoric correlations:
library(psych)
fit.efa.prep <- polychoric(phys.func, polycor = TRUE)
We then take the correlation matrix from these polychoric correlations and place it into our factor analysis:
> fit.efa.3 <- fa(fit.efa.prep$rho, nfac = 3, rotate = 'promax')
> fit.efa.3
Factor Analysis using method = minres
Call: fa(r = fit.efa.prep$rho, nfactors = 3, rotate = "promax")
Standardized loadings (pattern matrix) based upon correlation matrix
          MR1   MR3   MR2   h2   u2 com
PFQ061A -0.13  0.72 -0.09 0.33 0.67 1.1
PFQ061B  0.88  0.09 -0.27 0.62 0.38 1.2
PFQ061C  0.88  0.00 -0.19 0.59 0.41 1.1
PFQ061D  0.96 -0.27  0.03 0.65 0.35 1.2
PFQ061E  0.49  0.22  0.10 0.55 0.45 1.5
PFQ061F  0.35  0.46  0.04 0.62 0.38 1.9
PFQ061G -0.23  0.81  0.10 0.54 0.46 1.2
PFQ061H  0.52  0.31  0.05 0.67 0.33 1.7
PFQ061I  0.64  0.03  0.20 0.65 0.35 1.2
PFQ061J  0.60  0.01  0.28 0.66 0.34 1.4
PFQ061K -0.24  0.08  0.98 0.78 0.22 1.1
PFQ061L  0.17  0.17  0.56 0.67 0.33 1.4
PFQ061M  0.77  0.09 -0.06 0.63 0.37 1.0
PFQ061N  0.58  0.13  0.05 0.51 0.49 1.1
PFQ061O  0.38  0.01  0.40 0.52 0.48 2.0
PFQ061P  0.01 -0.07  0.85 0.65 0.35 1.0
PFQ061Q  0.26  0.69 -0.04 0.74 0.26 1.3
PFQ061R  0.13  0.74 -0.01 0.69 0.31 1.1
PFQ061S  0.12  0.51  0.14 0.49 0.51 1.3
PFQ061T  0.66  0.16 -0.01 0.60 0.40 1.1

                       MR1  MR3  MR2
SS loadings           6.09 3.55 2.57
Proportion Var        0.30 0.18 0.13
Cumulative Var        0.30 0.48 0.61
Proportion Explained  0.50 0.29 0.21
Cumulative Proportion 0.50 0.79 1.00

With factor correlations of
     MR1  MR3  MR2
MR1 1.00 0.71 0.67
MR3 0.71 1.00 0.68
MR2 0.67 0.68 1.00

Mean item complexity = 1.3
Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 190 and the objective function was 15.46
The degrees of freedom for the model are 133 and the objective function was 2.09

The root mean square of the residuals (RMSR) is 0.04
The df corrected root mean square of the residuals is 0.05

Fit based upon off diagonal values = 0.99
Measures of factor score adequacy
                                               MR1  MR3  MR2
Correlation of scores with factors            0.97 0.95 0.95
Multiple R square of scores with factors      0.94 0.90 0.90
Minimum correlation of possible factor scores 0.89 0.80 0.81
There are a number of matrices that we can see in the fit.efa.3 object. The first is the factor pattern matrix, which contains the factor loadings. To the right of this are two columns, namely h2 and u2, the communality and uniqueness estimates for each item respectively. The two values should sum to one. The greater the communality, the more the total variance of an item is explained by the common factors. The next matrix informs us how much of the variance is explained both by the individual factors and by the whole EFA model. Then there is the matrix with the correlations between the common factors, which is not present for orthogonal solutions.
Let's start by looking at the communalities in the h2 column. Low communalities suggest that the latent variables do not explain the data well. The communality is computed as the sum of the squared factor loadings of the unrotated factor solution. They tell us how much of the variance of each observed variable is explained by all factors. The communalities are relatively high with the exception of item A, so we are a little suspicious of how well this item is explained even by all three factors together, but for now we will keep all items.
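For an oblique solution such as promax, the communality of an item can also be recovered from its rotated pattern loadings together with the factor correlations, as the quadratic form of the loadings with the factor correlation matrix. The sketch below does this for item A using the two-decimal values printed in the fa output, so the result agrees with the reported h2 only up to rounding.

```r
# Pattern loadings for item PFQ061A and factor correlations, copied
# (to two decimals) from the fa output; the order is MR1, MR3, MR2.
loadings.A <- c(MR1 = -0.13, MR3 = 0.72, MR2 = -0.09)
phi <- matrix(c(1.00, 0.71, 0.67,
                0.71, 1.00, 0.68,
                0.67, 0.68, 1.00), nrow = 3, byrow = TRUE)

# Communality and uniqueness for item A.
h2.A <- drop(t(loadings.A) %*% phi %*% loadings.A)
u2.A <- 1 - h2.A
round(c(h2 = h2.A, u2 = u2.A), 2)
```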
Let's now look at the loadings. Remember that these loadings indicate how these items relate to some underlying unobserved variables that are causing the observed data. However, we have to try to make sense of what these unobserved factors actually are.
Item A (concerned with money management) loads heavily on MR3. Items Q, R, and S also load heavily on this factor. These items appear to be concerned with cognition and social engagement, placing few physical demands on the respondent. Items B, C, D, E, H, I, and T all load heavily on MR1. These items require leg use and are concerned with mobility. Items K, L, O, and P load fairly heavily onto MR2. These are items that tend to require hand use, suggesting that this is an arm or hand function factor. It is worth pointing out that item O loads almost as heavily on MR1, suggesting that some component of mobility is needed to reach overhead. This may be because reaching overhead requires good torso control, which is also needed to walk around and perform basic mobility skills. Let me emphasize that this interpretation of the results is based on a researcher's substantive understanding of the items rather than on the statistical analysis alone.
There is some additional information provided, but I will bring your attention to the fit measures in particular. These are a bigger deal in confirmatory factor analysis (discussed in the next chapter), but in summary, they give us some sense of how well the model explains the data. There is more disagreement than agreement on which fit measures to use and how to interpret them. However, three of the commonly used measures are Root Mean Square Residual (RMSR), Root Mean Square Error of Approximation (RMSEA), and the Tucker-Lewis Index (TLI). We would see all three of these if we did not use polychoric correlations, but in this case we see only RMSR. An acceptable fit is usually thought to be indicated by an RMSR less than 0.08, RMSEA less than 0.06, and TLI greater than 0.95 (some would accept greater than 0.90 as adequate).
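To make the idea of RMSR concrete, the following sketch computes the root mean square of the off-diagonal residual correlations for a hypothetical single-factor model. The function name rmsr and the matrices are assumptions for illustration; the psych package may differ in details of its own calculation.

```r
# Root mean square of the off-diagonal residual correlations.
rmsr <- function(observed.cor, implied.cor) {
  residuals <- observed.cor - implied.cor
  off.diagonal <- residuals[lower.tri(residuals)]
  sqrt(mean(off.diagonal^2))
}

# Hypothetical observed correlations and single-factor loadings.
observed.cor <- matrix(c(1.00, 0.58, 0.46,
                         0.58, 1.00, 0.44,
                         0.46, 0.44, 1.00), nrow = 3)
loadings <- c(0.8, 0.7, 0.6)

# Correlations implied by the loadings, with 1s on the diagonal.
implied.cor <- outer(loadings, loadings)
diag(implied.cor) <- 1

rmsr(observed.cor, implied.cor)
```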
Now that we have made sense of the factors and the model fit, let's look at the internal consistency reliability. The basic idea of internal consistency is to look at the proportion of variance in scale scores accounted for by the latent variables. In the previous chapter, we touched on this topic discussing how to calculate Cronbach's alpha. Coefficient alpha is the most widely used measure of internal consistency, but for multidimensional scales, McDonald's Omega (of which there are a few) is generally considered better. Let's use psych's omega function to examine internal consistency reliability:
> omega(fit.efa.prep$rho, nfac = 3, rotate = 'promax')
Omega
Call: omega(m = fit.efa.prep$rho, nfactors = 3, rotate = "promax")
Alpha:                 0.95
G.6:                   0.97
Omega Hierarchical:    0.83
Omega H asymptotic:    0.86
Omega Total            0.96
We simply show the beginning of the output from this function, which shows a number of internal consistency reliability coefficients including Cronbach's alpha, Guttman's lambda 6, omega hierarchical, omega asymptotic, and omega total. Cronbach's alpha is the classic split half reliability (discussed in further detail under the applications section of Chapter 5, Linear Algebra). Guttman's lambda 6, while rarely used nowadays, is the squared multiple correlation of the item with the other items.
We will focus on omega hierarchical, omega asymptotic, and omega total here. Omega assumes that a multifactor scale has both specific factors onto which only some items load and a general factor onto which all items load. Omega hierarchical gives us the proportion of variance in scale scores explained by the general factor. Omega asymptotic is the estimated omega hierarchical for a test with the same structure and infinite length (reliability tends to increase with test length). Omega total is the total reliability of a test, including that attributable to both the general factor and the factors onto which not all items load.
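These definitions can be sketched with hypothetical loadings on one general factor and two specific (group) factors, all assumed to be uncorrelated with each other. The loadings below are invented for illustration and are not the values that psych estimates for the physical functioning data.

```r
# Hypothetical orthogonal loadings for six items: one general factor
# and two group factors.
general <- c(0.6, 0.6, 0.5, 0.5, 0.6, 0.5)
group1  <- c(0.4, 0.4, 0.3, 0.0, 0.0, 0.0)
group2  <- c(0.0, 0.0, 0.0, 0.4, 0.3, 0.4)
uniqueness <- 1 - (general^2 + group1^2 + group2^2)

# Variance of the total (sum) score implied by the model.
total.variance <- sum(general)^2 + sum(group1)^2 + sum(group2)^2 +
  sum(uniqueness)

# Share of total score variance due to the general factor alone.
omega.hierarchical <- sum(general)^2 / total.variance
# Share of total score variance due to all common factors.
omega.total <- (sum(general)^2 + sum(group1)^2 + sum(group2)^2) /
  total.variance
round(c(omega.h = omega.hierarchical, omega.t = omega.total), 2)
```

Omega hierarchical can never exceed omega total, because the general factor is only one of the common sources of variance counted in omega total.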
Before we look at these reliability coefficients, it may be worth looking back at the results of the fa function. We see that there are sizeable correlations between the factors, 0.67 to 0.71. If these correlations were low (for example, 0.2), then it would be questionable as to whether omega hierarchical should even be examined given that low correlations between the factors would suggest the non-existence of a general factor that explains scores well.
We can see from these results that omega hierarchical is relatively good (0.83), suggesting that a general factor explains a large proportion of the variance in the scale scores.