

Imputation and Latent Class Analysis 


Jon Heron posted on Friday, July 09, 2010 - 3:23 am



Hi Bengt/Linda,

I'm carrying out an unconditional LCA and exporting the posterior probabilities to use as weights in my second-stage covariate analysis (to prevent the covariates from affecting the class derivation). I have missingness in my covariates as well as my dependent variables, and I'm comparing the results of two ways of dealing with it.

1] ICE then LCA: Impute multiple datasets (using ICE in Stata) and fit the LCA model to each one in turn. I am not allowing Mplus to pool the LCA results (Rubin-style) due to earlier problems with this (already discussed with Bengt in another post).

2] LCA then ICE: Fit one LCA model in Mplus with FIML and export those results, using the single set of posterior probabilities as a weighting in my ICE imputation.

As one might have predicted, the parameter estimates from the two approaches are similar, as I used 100 imputations (I have been uncharacteristically patient), but the SEs for approach [1] are considerably higher, reflecting the variability in LCA results between datasets.

Although this outcome is not surprising, I am still at a bit of a loss as to how to finish this up. Is [1] superior because it maintains the uncertainty that approach [2] loses when manifesting my latent measure, or do they both have their shortcomings?

Any help gratefully received.

Best wishes,
Jon


You can do the imputations in Mplus using the SEQUENTIAL method (Raghunathan's method) without having to go outside Mplus to ICE in Stata.

Regarding 1], I am not clear on how you combine the results to compute the SEs. Not by Rubin's formula, you say, so do you just take the average SE?

Regarding 2], I think this only works well when the entropy is high (say > 0.8).

Another imputation-oriented method is to use the LCA indicators only and create plausible values for the latent classes using the new Version 6 option for that. Then do multiple regressions of class on covariates.

Yet another approach is to first estimate the model with covariates influencing class membership, using ML, and include the covariates in the model so that missing data on them is handled under MAR. Then use SVALUES to carry those estimates over as fixed starting values for a Bayes analysis that simply generates the missing data. Then analyze the model you want.

As an aside, when you impute for the covariates, perhaps you don't want to use the LCA indicators, given that in the modeling you don't want covariates to affect class formation.
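As a rough illustration of the entropy threshold mentioned under 2], the sketch below computes a relative entropy from a matrix of posterior class probabilities. It assumes the usual definition, one minus the summed per-person classification entropy divided by N ln K; the function name and the example probabilities are hypothetical and not taken from this thread.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1] for an N x K list of posterior class
    probabilities (each row sums to 1). Values near 1 mean classes are
    well separated; values near 0 mean assignment is close to random.
    Assumes the common definition: 1 - sum_i sum_k(-p_ik ln p_ik) / (N ln K).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if n == 0 or k < 2:
        raise ValueError("need at least one row and two classes")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four people and two classes.
example = [[0.95, 0.05], [0.10, 0.90], [0.99, 0.01], [0.80, 0.20]]
print(round(relative_entropy(example), 3))
```

With perfectly certain assignments the function returns 1, and with equal probabilities for every class it returns 0, which is why a high value suggests that treating the posterior probabilities as fixed loses little information.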

Jon Heron posted on Monday, July 12, 2010 - 1:39 am



Thanks Bengt.

Re [1]: sorry for not being clear. I don't pool the results in Mplus; I use MIM to pool them in Stata. I meant that it is my covariate*latent-class parameter estimates which are being pooled, rather than the latent-class posterior probabilities. I expect that if I were to pool the results of the LCA prior to my multinomial model, the SEs for the covariate effects would be reduced and be more similar to those obtained in [2]. If I were to choose this option, what would I average over the imputations: the pattern-specific assignment probabilities?

Re [2]: My entropy is pretty good, ~0.85. With this approach I do drop the latent class indicators prior to the imputation step.

It seems I am spoilt for choice here thanks to Mplus 6. Unfortunately I am under some pressure now to finish up and move on.
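For readers unfamiliar with the pooling step, here is a minimal sketch of Rubin's rules for a single parameter, which shows why between-imputation variability inflates the pooled SE. Whether MIM applies exactly this form is an assumption here; the function name and the numbers are hypothetical and are not results from this thread.

```python
import math

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    Returns (pooled estimate, pooled standard error), where the total
    variance is within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists with at least two imputations")
    q_bar = sum(estimates) / m                       # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m   # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical covariate effect estimated in five imputed datasets.
est = [0.50, 0.60, 0.40, 0.55, 0.45]
se = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled_est, pooled_se = pool_rubin(est, se)
print(pooled_est, pooled_se)
```

In this made-up example every dataset has the same SE, yet the pooled SE is larger because the estimates differ between datasets. That is the same mechanism by which refitting the LCA in each imputed dataset, as in approach [1], can raise the SEs relative to approach [2].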

Rachel Ellis posted on Wednesday, January 15, 2014 - 6:22 pm



Hello,

I'm running a longitudinal growth mixture model, using FIML to deal with missing data. I've also used multiple imputation on auxiliary variables (which I've analysed with DCON in version 7.11) to get their means and SEs.

Some articles have suggested that data imputation isn't appropriate with LCGA/GMM, because these models assume multiple subpopulations whereas imputation assumes a single population (e.g. Costello, D. M., Swendsen, J., Rose, J. S., & Dierker, L. C. (2008). Risk and protective factors associated with trajectories of depressed mood from adolescence to early adulthood. J Consult Clin Psychol, 76(2), 173-183. doi: 10.1037/0022-006X.76.2.173).

Just wondering if anyone has any thoughts on this?

Thanks


This may be the case. You can, however, impute the data according to an H0 model. See page 516 of the user's guide where the various options for imputing data are shown. 

xybi2006 posted on Tuesday, January 13, 2015 - 4:24 pm



Dear Dr. Muthen,

I ran a latent class analysis with two covariates using "auxiliary". In addition to the two covariates, I also want to take into account the correlation between them. How should I run the latent class analysis in that case? For a growth curve model we can use medcon with dep, but that does not work for the latent class analysis using auxiliary.

    auxiliary = MEDCON (R) dep(R);
    analysis:
      type=mixture;
      starts = 1100 250;
      stiterations = 20;
      ALGORITHM=INTEGRATION;
    model:
      %overall%
      %C#1%
      %C#2%
      %C#3%


The covariates are allowed to correlate; the correlation parameter is just not estimated since it is not part of the model (just like in regular regression).


