

Imputation and Latent Class Analysis 


Jon Heron posted on Friday, July 09, 2010  3:23 am



Hi Bengt/Linda,

I'm carrying out an unconditional LCA and exporting the posterior probabilities to use as weights in my second-stage covariate analysis (to prevent the covariates from affecting the class derivation). I have missingness in my covariates as well as in my dependent variables, and I'm comparing the results of two ways of dealing with it.

1] ICE then LCA: impute multiple datasets (using ICE in Stata) and fit the LCA model to each one in turn. I am not allowing Mplus to pool the LCA results (Rubin-style) because of earlier problems with this (already discussed with Bengt in another post).

2] LCA then ICE: fit one LCA model in Mplus with FIML, export those results, and use the single set of posterior probabilities as weights in my ICE imputation.

As one might have predicted, the parameter estimates from the two approaches are similar, since I used 100 imputations (I have been uncharacteristically patient), but the SEs for approach [1] are considerably larger, reflecting the variability in the LCA results across datasets. Although this outcome is not surprising, I am still at a bit of a loss as to how to finish this up. Is [1] superior because it retains the uncertainty that approach [2] loses when deriving my latent measure, or do both have their shortcomings?

Any help gratefully received.

Best wishes,
Jon
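Why approach [1] gives larger SEs can be seen from Rubin's rules, which tools such as MIM use to combine results across imputed datasets: the between-imputation variance term picks up any variability in the LCA solution from one dataset to the next. Below is a minimal Python sketch of those rules for a single parameter. The function name `rubin_pool` is hypothetical, and this is an illustration of the formulas, not MIM's or Mplus's code.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates:  point estimate of the parameter from each imputed dataset
    std_errors: standard error of that estimate from each imputed dataset
    Returns (pooled estimate, pooled SE, within variance W, between variance B).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching SEs")
    q_bar = mean(estimates)                   # pooled point estimate
    w = mean(se ** 2 for se in std_errors)    # average within-imputation variance
    b = variance(estimates)                   # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b               # Rubin's total variance
    return q_bar, math.sqrt(total), w, b
```

If the LCA solution shifts across imputed datasets, the estimates scatter, B grows, and the pooled SE exceeds the average per-dataset SE. In approach [2] the class weights are identical in every imputed dataset, so that particular source of variability never enters B.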


You can do the imputations in Mplus using the SEQUENTIAL method (Raghunathan's method) without having to go outside Mplus to ICE in Stata.

Regarding 1], I am not clear on how you combine the results to compute the SEs. Not by Rubin's formula, you say, so do you just take the average SE?

Regarding 2], I think this only works well when the entropy is high (say > 0.8).

Another imputation-oriented method is to use only the LCA indicators and create plausible values for the latent classes using the new Version 6 option for that. Then do multiple regressions of class on covariates.

Yet another approach is to first estimate the model with covariates influencing class membership, using ML, and include the covariates in the model so that missing data on them are handled under MAR. Then use SVALUES to carry those estimates over as fixed starting values for a Bayes analysis that simply generates the missing data. Then analyze the model you want.

As an aside, when you impute the covariates, perhaps you don't want to use the LCA indicators, given that in the modeling you don't want the covariates to affect class formation.
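Plausible values for the latent classes are, in essence, random draws of each person's class membership from that person's posterior class probabilities, repeated to give several datasets that are each analysed and then combined. A minimal Python sketch under that simplification follows. The function names are hypothetical, and unlike a full Bayesian draw it conditions on one fixed set of posterior probabilities, so it ignores uncertainty in the LCA parameters themselves.

```python
import random


def draw_plausible_classes(posteriors, n_draws, seed=None):
    """Draw plausible latent-class memberships from posterior probabilities.

    posteriors: one list of class probabilities per person (each row sums to 1)
    n_draws:    number of plausible-value datasets to create
    Returns a list of n_draws lists, each holding one class index per person.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # For every person, sample a class in proportion to that person's posteriors.
        draws.append(
            [rng.choices(range(len(p)), weights=p, k=1)[0] for p in posteriors]
        )
    return draws


def class_proportions(assignments, n_classes):
    """Share of people assigned to each class in one plausible-value draw."""
    counts = [0] * n_classes
    for k in assignments:
        counts[k] += 1
    return [c / len(assignments) for c in counts]
```

Each of the `n_draws` assignment lists would then be analysed separately (for example, a regression of class on covariates) and the results pooled across draws, just as with multiply imputed data.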

Jon Heron posted on Monday, July 12, 2010  1:39 am



Thanks Bengt.

Re [1]: sorry for not being clear. I don't pool the results in Mplus; I use MIM to pool them in Stata. I meant that it is my covariate-latent class parameter estimates that are being pooled, rather than the latent class posterior probabilities. I expect that if I were to pool the results of the LCA prior to my multinomial model, the SEs for the covariate effects would be reduced and be more similar to those obtained in [2]. If I were to choose this option, what would I average over the imputations: the pattern-specific assignment probabilities?

Re [2]: my entropy is pretty good (~0.85). With this approach I do drop the latent class indicators prior to the imputation step.

It seems I am spoilt for choice here thanks to Mplus 6. Unfortunately, I am under some pressure now to finish up and move on.
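The entropy value quoted here summarises how sharply the posterior probabilities assign people to classes, which is why a high value matters for approach [2]. A minimal Python sketch, assuming the usual relative-entropy definition (taken here to be the statistic Mplus reports); the function name is hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy of a K-class solution (1 = perfectly separated classes).

    posteriors: one list of K posterior class probabilities per person.
    Computes E = 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K).
    """
    n = len(posteriors)
    if n == 0:
        raise ValueError("need at least one person")
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("need at least two classes")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # the term 0 * ln(0) is taken as 0
                total -= p * math.log(p)
    return 1 - total / (n * math.log(k))
```

Values near 1 mean most people have one posterior probability close to 1, so a single set of posterior-probability weights loses little; values near 0 mean the classes are poorly separated and the weights carry a lot of classification uncertainty.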

Rachel Ellis posted on Wednesday, January 15, 2014  6:22 pm



Hello,

I'm running a longitudinal growth mixture model, using FIML to deal with missing data. I've also used multiple imputation on auxiliary variables (which I've analysed with DCON in version 7.11) to get their means and SEs. Some articles have suggested that data imputation isn't appropriate with LCGA/GMM, because these models assume multiple subpopulations whereas imputation assumes a single population (e.g., Costello, D. M., Swendsen, J., Rose, J. S., & Dierker, L. C. (2008). Risk and protective factors associated with trajectories of depressed mood from adolescence to early adulthood. J Consult Clin Psychol, 76(2), 173-183. doi: 10.1037/0022-006X.76.2.173). Just wondering if anyone has any thoughts on this?

Thanks


This may be the case. You can, however, impute the data according to an H0 model. See page 516 of the user's guide where the various options for imputing data are shown. 
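Imputing according to an H0 model means drawing the missing values from the analysis model itself, so for a mixture the imputation model contains the same classes as the model being fitted. The Python sketch below shows the idea for one person with y1 observed and y2 missing, under a hypothetical two-indicator mixture with normal, locally independent indicators and known parameter values: compute the class posterior from y1, draw a class, then draw y2 from that class's distribution. It is not Mplus's algorithm; a proper Bayesian H0 imputation would also draw the model parameters rather than treat them as known, and all names and numbers here are illustrative.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def impute_y2(y1, class_probs, means, sds, rng):
    """Impute a missing y2 for one person under a K-class mixture (hypothetical model).

    class_probs: class proportions pi_k
    means, sds:  per class, a (y1, y2) pair of means / standard deviations
    Returns (imputed y2, drawn class index).
    """
    # Step 1: posterior class probabilities given the observed y1.
    joint = [pi * normal_pdf(y1, m[0], s[0]) for pi, m, s in zip(class_probs, means, sds)]
    total = sum(joint)
    if total == 0:
        raise ValueError("y1 has zero density under every class")
    post = [j / total for j in joint]
    # Step 2: draw a class, then draw y2 from that class's distribution
    # (local independence: y2 does not depend on y1 within a class).
    k = rng.choices(range(len(post)), weights=post, k=1)[0]
    return rng.gauss(means[k][1], sds[k][1]), k


def impute_many(y1, class_probs, means, sds, n_imputations, seed=None):
    """Create several imputations of y2 for the same person."""
    rng = random.Random(seed)
    return [impute_y2(y1, class_probs, means, sds, rng)[0] for _ in range(n_imputations)]
```

Calling `impute_many` with a y1 value that clearly belongs to one class returns imputations centred on that class's y2 mean, which is the class structure a single-population imputation model does not represent.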

xybi2006 posted on Tuesday, January 13, 2015  4:24 pm



Dear Dr. Muthen,

I ran a latent class analysis with two covariates using AUXILIARY. In addition to the two covariates, I also want to take into account the correlation between them. How, then, should I run the latent class analysis? For growth curve models we can use MEDCON with DEP, but this does not work for a latent class analysis using AUXILIARY. My input is:

  auxiliary = MEDCON (R) dep(R);
  analysis:
    type=mixture;
    starts = 1100 250;
    stiterations = 20;
    ALGORITHM=INTEGRATION;
  model:
    %overall%
    %C#1%
    %C#2%
    %C#3%


The covariates are allowed to correlate; the correlation parameter is just not estimated because it is not part of the model (just as in regular regression).

John Woo posted on Thursday, February 16, 2017  6:42 am



Hi,

If I am running an LCA or LPA with five indicators y1, y2, y3, y4, y5 that have missing values, is it possible to export the FIML-imputed values for the indicators (y1...y5) using SAVEDATA? The usual SAVEDATA text file only shows the raw y1...y5.

As an aside, if I am doing an auxiliary analysis using the indicator variables outside Mplus (e.g., a regression involving y5) and this analysis is discussed in relation to the LCA results, is it reasonable to use the Mplus-provided imputation for y5 (the one used in producing the LCA), as opposed to doing a separate imputation for y5 (like multiple imputation using ice in Stata)?

Thank you in advance for your response.

Jon Heron posted on Thursday, February 16, 2017  9:09 am



Hi John,

I think you need to go down the Bayesian imputation route; see, e.g., example 11.7 (which is as close as I can find to your situation). This procedure will obviously create multiple copies (plausible values) for anything that is not complete. Since this is a kind of imputation, you would need to incorporate the auxiliary variables you mention into the imputation step. This may lead you to decide that your plans for a multi-step approach are a little pointless, depending on what you were planning on doing back in Stata.


