

Feasibility of a Latent Class Analysis 



I want to conduct an LCA on three surveys (three different years), with covariates and sample weights. Each survey samples workers (between 3,300 and 5,200, depending on the year) in different industries and firm sizes. The class indicators are 14 items measuring to what extent different organizational and physical aspects (e.g., monotony, lack of autonomy, physical effort, noise, insecurity, …) annoy the workers, measured on a 5-point Likert scale (from "not at all" to "very much"). The data are highly skewed towards the first category so, to have less sparse tables, I've dichotomized the items. A) Is that correct?

To get an idea of the possible model and number of classes, I requested an EFA on data from the last survey, which showed 2 eigenvalues greater than one (7.687 and 1.441; the third was 0.814). Nevertheless, it seems impossible to obtain an acceptable chi-square fit for an EFA with the WLSMV estimator (for the 2-factor solution, RMSEA = 0.042 and RMSR = 0.0457, but chi-square(50) = 517.425; adding more factors leads to negative residual variances). B) It seems that the data are not well suited for factor analysis; should it also be concluded that they are not well suited for LCA? Is there any way to cope with this problem? I've read a reference where the authors do an LPA on principal components, but I don't see support for this idea in this forum.

Thanking you in advance, Fernando.


A) Dichotomizing very skewed Likert items is not bad. B) Regarding EFA, I would look at the interpretability of the 1-factor, 2-factor, etc. solutions. But a clean EFA solution is not a prerequisite for a successful LCA.
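For readers who want to see the dichotomization from (A) spelled out, here is a minimal sketch. It assumes items coded 1 to 5 and a cut between "not at all" (1) and any reported annoyance (2 to 5); the thread does not state which cut-point was used, and the function name `dichotomize` and the sample responses are hypothetical.

```python
from collections import Counter

def dichotomize(responses, cut=1):
    """Map 1..5 Likert codes to 0/1: 0 if response <= cut, else 1.

    Missing responses (None) are kept as None.
    The default cut (1 = "not at all" vs. the rest) is an assumption.
    """
    return [None if r is None else int(r > cut) for r in responses]

# Hypothetical, heavily skewed item (most answers in the first category)
item = [1, 1, 1, 2, 1, 5, 1, 3, 1, 1, None, 4]
binary = dichotomize(item)

print(Counter(item))    # frequencies on the original 5-point scale
print(Counter(binary))  # frequencies after collapsing to two categories
```

Collapsing the sparse upper categories this way trades information about intensity for better-filled response-pattern tables, which is the motivation given in the question.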


The 2- and 3-factor solutions are interpretable. I also like the 3-class and (even more) the 4-class solutions, but some of the bivariate residuals (BVR) are high, and the Pearson chi-square is significant. Fit indices for 1 to 7 classes are:

Classes  BIC        LL         Param.  Entropy  LRT (C-1)
1        90080.061  44980.125   14
2        91067.610  45396.792   32     0.861    0.0000
3        74049.313  36836.381   44     0.799    0.0000
4        72894.265  36194.673   59     0.783    0.0004
5        72505.335  35936.024   74     0.763    0.1368
6        72282.717  35760.530   89     0.754    0.1200
7        72118.025  35613.999  104     0.772    1.0000

Can I accept the 4-class solution and then go on with the covariates?


On the face of it, 4 classes looks reasonable. For decisions on how many classes to use, including the use of the Version 4.1 bootstrapped LRT, see the new paper on our web site:
http://www.statmodel.com/download/LCA_tech11_nylund_v83.pdf
Also, for decisions on whether to use a factor analysis model or a latent class analysis model by comparing log likelihoods and Tech10, see the new paper on our web site:
http://www.statmodel.com/download/Muthen_tobacoo_2006.pdf


Thank you very much for your quick and helpful answer. I didn't use the bootstrapped LRT because I have sample weights. I didn't know the FMA technique; it seems to be exactly what is needed, and I will explore it as soon as possible. The tobacco paper is quite clarifying, and I will work through it and Example 7.27. Many thanks, Fernando.

Daniel Lee posted on Sunday, February 08, 2015  6:41 pm



Hi Dr. Muthen, I was wondering if it is possible to conduct an LCA on a 4-item scale. Are there any implications I should be aware of when conducting an LCA on only 4 items? Second, I was wondering whether latent classes can be included in a multiple-process latent curve model to predict the longitudinal relation (i.e., the relationship between intercepts and slopes) between two time-varying variables (e.g., stress and depressive symptoms). If so, can you direct me to any resources that might be helpful for constructing such a model? For example, if I found 2 latent classes in the 4-item scale, could I examine whether each class uniquely predicts the onset and trajectory of depressive symptoms after taking into account the effect of stress (the second time-varying variable)? Thank you so much. Your answers are always so insightful. Dan


Yes, this is possible. With only 4 items, the number of classes you can extract will be limited. A two-process growth mixture model is possible. You would need to adapt Example 6.13.


