Latent class models

Latent Class Analysis (LCA) is a method for identifying latent variables among polytomous outcome variables. It is similar to factor analysis, but can be used with discrete/categorical data; for this reason, LCA is mostly used when analyzing surveys.

In this section, we are going to use the poLCA function from the poLCA package, which uses expectation-maximization and Newton-Raphson algorithms to find the maximum likelihood estimates of the parameters.

The poLCA function requires the data to be coded as integers starting from one, or as factors; otherwise, it produces an error message. So let's transform some of the variables in the mtcars dataset to factors:

> factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
> mtcars[, factors] <- lapply(mtcars[, factors], factor)

Tip

The preceding command overwrites the mtcars dataset in your current R session. To revert to the original dataset for other examples, delete this updated copy from the session by running rm(mtcars).

Latent Class Analysis

Now that the data is in an appropriate format, we can conduct the LCA. The related function comes with a number of important arguments:

  • First, we have to define a formula that describes the model. Depending on the formula, we can fit either an LCA (similar to clustering, but with discrete variables) or a Latent Class Regression (LCR) model.
  • The nclass argument specifies the number of latent classes assumed in the model, which is 2 by default. Based on the previous examples in this chapter, we will override this to 3.
  • We can use the maxiter, tol, probs.start, and nrep arguments to fine-tune the model.
  • The graphs argument specifies whether a plot of the parameter estimates should be displayed.

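The tuning arguments can be combined in a single call. As a sketch (the argument values here are purely illustrative, not recommendations), we could rerun the EM algorithm from several random starting values to reduce the chance of stopping at a local maximum:

```r
## Illustrative tuning values only; nrep keeps the best of 10 restarts
library(poLCA)
factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
mtcars[, factors] <- lapply(mtcars[, factors], factor)
set.seed(42)  # poLCA draws random starting values, so fix the seed
p <- poLCA(cbind(cyl, vs, am, carb, gear) ~ 1, data = mtcars,
           nclass = 3, nrep = 10, maxiter = 5000, verbose = FALSE)
```

Fixing the random seed makes such reruns reproducible, which matters because different starting values can converge to different solutions.
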
Let's start with basic LCA of three latent classes defined by all the available discrete variables:

> library(poLCA)
> p <- poLCA(cbind(cyl, vs, am, carb, gear) ~ 1,
+   data = mtcars, graphs = TRUE, nclass = 3)

The first part of the output (which can be also accessed via the probs element of the preceding saved poLCA list) summarizes the probabilities of the outcome variables by each latent class:

> p$probs
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$cyl
               4      6 8
class 1:  0.3333 0.6667 0
class 2:  0.6667 0.3333 0
class 3:  0.0000 0.0000 1

$vs
               0      1
class 1:  0.0000 1.0000
class 2:  0.2667 0.7333
class 3:  1.0000 0.0000

$am
               0      1
class 1:  1.0000 0.0000
class 2:  0.2667 0.7333
class 3:  0.8571 0.1429

$carb
               1      2      3      4      6      8
class 1:  1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
class 2:  0.2667 0.4000 0.0000 0.2667 0.0667 0.0000
class 3:  0.0000 0.2857 0.2143 0.4286 0.0000 0.0714

$gear
               3   4      5
class 1:  1.0000 0.0 0.0000
class 2:  0.0000 0.8 0.2000
class 3:  0.8571 0.0 0.1429

From these probabilities, we can see that all of the 8-cylinder cars belong to the third class, that the first class only includes cars with automatic transmission, one carburetor, and three gears, and so on. The exact same values can be plotted as well, either by setting the graphs argument to TRUE in the function call, or by calling the plot function on the fitted object afterwards:

[Figure: poLCA plot of the conditional item response probabilities by latent class]

The plot also highlights that the first latent class includes only a few elements compared to the other two classes, as shown by the estimated class population shares:

> p$P
[1] 0.09375 0.46875 0.43750

The poLCA object holds plenty of other important information about the results as well. Just to name a few, let's look at some of the named list elements, which can be extracted via the standard $ operator:

  • The predclass element returns the most likely class membership of each observation
  • The posterior element is a matrix containing the class membership probabilities of each observation
  • The Akaike Information Criterion (aic), the Bayesian Information Criterion (bic), the deviance (Gsq), and the Chisq values represent different measures of goodness of fit
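
As a minimal sketch of accessing these elements (the model is refitted here so that the chunk is self-contained):

```r
library(poLCA)
factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
mtcars[, factors] <- lapply(mtcars[, factors], factor)
set.seed(42)
p <- poLCA(cbind(cyl, vs, am, carb, gear) ~ 1,
           data = mtcars, nclass = 3, verbose = FALSE)

## Most likely class membership of the first few cars
head(p$predclass)
## Posterior class membership probabilities (one row per car)
head(p$posterior)
## Share of cars assigned to each class by the most likely membership,
## comparable to the estimated class population shares in p$P
prop.table(table(p$predclass))
## Goodness-of-fit measures
c(AIC = p$aic, BIC = p$bic, Gsq = p$Gsq, Chisq = p$Chisq)
```

Note that the shares based on the most likely memberships need not match p$P exactly, as the latter are the model's estimated mixing proportions rather than counts of hard assignments.
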

LCR models

Unlike LCA, the LCR model is a supervised method: instead of exploring the latent variables that explain our observations at the exploratory data analysis stage, we use training data in which one or more covariates predict the probability of latent class membership.
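
In terms of syntax, a covariate is added on the right-hand side of the formula. As a hedged sketch (the choice of mpg as the covariate is purely illustrative, and the small sample makes the estimates unstable):

```r
library(poLCA)
factors <- c('cyl', 'vs', 'am', 'carb', 'gear')
mtcars[, factors] <- lapply(mtcars[, factors], factor)
set.seed(42)
## LCR: mpg predicts the probability of latent class membership
lcr <- poLCA(cbind(cyl, vs, am, carb, gear) ~ mpg,
             data = mtcars, nclass = 3, verbose = FALSE)
## coeff holds the multinomial logit coefficients of the covariate,
## estimated relative to the first (reference) latent class
lcr$coeff
```
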
