Let's look at another PCA example that will be instructive when compared with factor analysis later on in the Exploratory factor analysis and reflective constructs section. We will return to the physical functioning dataset used earlier in this book. Here we will discuss the notions of formative and reflective constructs. See the following diagram (paying close attention to the arrow directions) for a visual representation of the differences. A formative construct is one in which a general trait is composed of a number of very specific traits, as shown in the diagram. The arrows pointing towards the construct indicate that the construct is derived from the traits.
Alternatively, a reflective construct is one in which a general trait is thought to underlie and cause specific traits, as shown in the following diagram. The arrows pointing away from the construct reflect the fact that the construct drives the traits, and the specific traits are merely manifestations of this construct:
PCA is often considered a method of modeling formative constructs. Later on in this chapter, we will discuss factor analysis, which models reflective constructs.
The physical functioning dataset collects data on the ability of individuals to engage in 20 ADLs and IADLs. Taken together, we would expect a person's scores on these items to provide some measure of functional independence with ADLs and IADLs. Unfortunately, reporting the scores for all 20 items for each individual is tedious, so we wish to use a summed score, or multiple summed scores, to summarize a person's functional status. The question is whether this can legitimately be done. Does it make sense to add standing to walking? There is one more question: What are we trying to achieve by creating summary scores? The answer to this question lies in what we seek to measure with this scale. If we assume that a person should be able to do these 20 things independently to be truly independent, then we are assuming that functional independence is in some sense defined by the abilities to perform these 20 items, and we have a formative construct. Alternatively, if we assume that the abilities to perform these 20 items simply serve as manifestations of an underlying trait of functional independence, then we have a reflective construct in mind. Statistically, this is the difference between a fixed effects and random effects model, respectively.
We will start here with the idea that we are attempting to use these 20 items to model a formative construct, and we will use PCA for this. We will apply PCA on the physical functioning dataset and look at the variance explained in the PCA:
> library(FactoMineR)  # provides PCA(); skip if already loaded
> phys.func <- read.csv('phys_func.txt')[, -1]
> phys.func.pca <- PCA(phys.func)
> summary(phys.func.pca)

Call:
PCA(phys.func)

Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4   Dim.5
Variance               6.423   1.574   1.286   1.094   0.988
% of var.             32.113   7.869   6.428   5.470   4.939
Cumulative % of var.  32.113  39.983  46.410  51.880  56.818
                       Dim.6   Dim.7   Dim.8   Dim.9  Dim.10
Variance               0.865   0.800   0.738   0.699   0.657
% of var.              4.323   4.001   3.689   3.494   3.287
Cumulative % of var.  61.142  65.143  68.832  72.326  75.613
                      Dim.11  Dim.12  Dim.13  Dim.14  Dim.15
Variance               0.643   0.560   0.523   0.510   0.485
% of var.              3.213   2.802   2.613   2.552   2.425
Cumulative % of var.  78.826  81.628  84.241  86.793  89.218
                      Dim.16  Dim.17  Dim.18  Dim.19  Dim.20
Variance               0.467   0.454   0.440   0.416   0.380
% of var.              2.333   2.271   2.200   2.080   1.899
Cumulative % of var.  91.551  93.821  96.021  98.101 100.000
We can see that the first component explains more than four times as much of the variance as any other single component (32.1% compared with 7.9% for the second), and that the first four components are needed to explain more than half of the variance (51.9%). This suggests that there may be a simpler summary interpretation than inspecting all 20 variables for each subject.
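The figures quoted above can be checked by hand from the eigenvalues alone. The following is a minimal sketch that copies the rounded eigenvalues from the printed summary (so results agree with the output only up to rounding); the variable names are illustrative and not part of the original analysis. It also counts the eigenvalues greater than 1, which is the Kaiser-Guttman criterion used in the retention discussion that follows.

```r
# Eigenvalues copied from the summary output above (rounded to 3 decimals).
eig <- c(6.423, 1.574, 1.286, 1.094, 0.988,
         0.865, 0.800, 0.738, 0.699, 0.657,
         0.643, 0.560, 0.523, 0.510, 0.485,
         0.467, 0.454, 0.440, 0.416, 0.380)

# For a PCA on standardized variables the eigenvalues sum to the number
# of variables (20 here, up to rounding), so each component's share of
# the variance is its eigenvalue divided by that total.
pct_var <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct_var)

# How much larger is the first eigenvalue than the second largest?
ratio_first_to_second <- eig[1] / eig[2]

# Smallest number of components whose cumulative share exceeds 50%.
n_majority <- which(cum_pct > 50)[1]

# Kaiser-Guttman criterion: number of eigenvalues greater than 1.
n_kaiser <- sum(eig > 1)

print(round(ratio_first_to_second, 2))
print(n_majority)
print(n_kaiser)
```

Working from the eigenvalues rather than the fitted object keeps the arithmetic visible: every percentage in the summary table is just an eigenvalue divided by the total.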
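We can also draw a scree plot of the eigenvalues as follows:
> # The first column of $eig holds the eigenvalues; indexing by position
> # works whether $eig is stored as a data frame or as a matrix
> plot(phys.func.pca$eig[, 1], type = 'b',
+      xlab = 'Principal Component', ylab = 'Eigenvalue',
+      main = 'Eigenvalues of Principal Components')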
The result is as shown in the following scree plot:
The first question is whether we should simply treat physical functioning as a unidimensional scale (that is, reduce 20 dimensions to one) or whether we should treat physical functioning as being multidimensional. Based on the Kaiser-Guttman rule (retain the components whose eigenvalues are greater than 1), we should retain four components. Based on the scree plot, it is less clear how many components are worth retaining. Looking very closely at the preceding graph, it appears that there are two plateaus: one that starts after the third component and another that starts after the sixth component. Thus, we should probably retain three components based on scree criteria. However, there is also the problem of the interpretation of the components. Let's try to make sense of a three- to four-component model by looking at the squared cosines. Remember that these reflect how well a variable is projected onto an axis, so if a variable is not well projected, then it is not helpful to use that axis to measure the variable:
> phys.func.cos <- phys.func.pca$var$cos2
> phys.func.cos[phys.func.cos < 0.2] <- NA
> phys.func.cos
            Dim.1     Dim.2     Dim.3     Dim.4     Dim.5
PFQ061A        NA        NA 0.2180148        NA        NA
PFQ061B 0.3425603        NA        NA        NA        NA
PFQ061C 0.3197294        NA        NA        NA        NA
PFQ061D 0.3846285        NA        NA        NA        NA
PFQ061E 0.4029381        NA        NA        NA        NA
PFQ061F 0.3407052        NA        NA        NA        NA
PFQ061G        NA        NA 0.3237579        NA        NA
PFQ061H 0.2748247        NA        NA        NA 0.2984617
PFQ061I 0.4465091        NA        NA        NA        NA
PFQ061J 0.4352429        NA        NA        NA        NA
PFQ061K        NA 0.2953024        NA        NA        NA
PFQ061L 0.3623138        NA        NA        NA        NA
PFQ061M 0.4776348        NA        NA        NA        NA
PFQ061N 0.3667740        NA        NA        NA        NA
PFQ061O 0.3354625        NA        NA        NA        NA
PFQ061P 0.2383867        NA        NA        NA        NA
PFQ061Q 0.3746035        NA        NA        NA        NA
PFQ061R 0.3054978        NA        NA        NA        NA
PFQ061S        NA        NA        NA 0.2773378        NA
PFQ061T 0.4540454        NA        NA        NA        NA
We have set a relatively low threshold of 0.2 for a squared cosine, and we see that most of the variables meet this criterion for the first component, but most fail to meet it for subsequent components. Every item meets the criterion on at least one component when we include four components, but two of those components (the second and the fourth) are represented by only a single item. As we can see, the items that have a low projection on the first component do not really relate to mobility, whereas those that meet our 0.2 criterion do. Based on these data, we may simply want to consider this outcome measure as being composed of a single dimension that is concerned with physical mobility and exclude from our scoring of the test the four items (PFQ061A, PFQ061G, PFQ061K, and PFQ061S) that do not project well onto this dimension.
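The item selection described above can be sketched directly from the printed table. The first-component squared cosines below are copied from the output, with NA standing for the values that were masked because they fell below 0.2. The response data are simulated purely for illustration: the 1 to 4 response coding, the `toy` data frame, and the `mobility_score` name are assumptions, not part of the physical functioning dataset or the original analysis.

```r
# Item names as they appear in the squared-cosine table.
items <- paste0("PFQ061", LETTERS[1:20])

# First-component squared cosines copied from the printed table.
# NA marks values below the 0.2 threshold that were masked in the output.
dim1_cos2 <- c(NA, 0.3425603, 0.3197294, 0.3846285, 0.4029381,
               0.3407052, NA, 0.2748247, 0.4465091, 0.4352429,
               NA, 0.3623138, 0.4776348, 0.3667740, 0.3354625,
               0.2383867, 0.3746035, 0.3054978, NA, 0.4540454)
names(dim1_cos2) <- items

# Items that project adequately onto the first component, and those that do not.
retained <- items[!is.na(dim1_cos2)]
dropped  <- items[is.na(dim1_cos2)]
print(dropped)

# Hypothetical responses for five people, coded 1 to 4 (an assumed coding),
# used only to show how a single summed score would be formed.
set.seed(1)
toy <- as.data.frame(matrix(sample(1:4, 5 * 20, replace = TRUE),
                            nrow = 5, dimnames = list(NULL, items)))

# Unidimensional score: sum over the retained items only.
mobility_score <- rowSums(toy[, retained])
print(mobility_score)
```

Summing only the retained items is one simple way to act on the unidimensional interpretation; weighting items by their loadings would be an alternative, at the cost of a score that is harder to compute by hand.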
In the following sections, we will revisit these data and treat physical functioning as a reflective construct, which yields more interpretable results.
What kind of commonly encountered constructs are regarded as formative?
The analysis that we did begins to touch on concepts of psychometrics, a field that rarely models formative constructs because it is usually concerned with observable manifestations of invisible psychological processes (that is, reflective constructs). Socio-economic status is one of the few well-accepted formative constructs in the psychological and social science canon.