

I am carrying out latent class analysis using 5 multiply imputed datasets. I have managed to carry out the analysis using type=imputation. I then ran the analysis with all the parameters fixed at the average estimates from the multiple imputation analysis to obtain the latent class probabilities. This gives me the class probabilities. My question is how I should assess model fit. Because the results are averages across the 5 datasets, the usual criteria I would use are not available. Thank you, Jessica
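Averaging across imputations works on parameter estimates, not on fit statistics. A minimal Python sketch of Rubin's rules, the standard way estimates from m imputed datasets are combined, shows the idea: the pooled estimate is the plain average, and its standard error adds the between-imputation spread to the average within-imputation variance. Rubin's rules say nothing about how to combine likelihood-based criteria such as BIC or the BLRT, which is one reason those criteria are not straightforward with imputed data. The function name `pool_rubin` and the numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                  # pooled point estimate (simple average)
    w = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    b = variance(estimates)                  # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, math.sqrt(t)

# Hypothetical estimates (and SEs) of one class-specific parameter from 5 imputed datasets
est = [1.10, 1.25, 1.05, 1.20, 1.15]
se = [0.20, 0.22, 0.19, 0.21, 0.20]
q, s = pool_rubin(est, se)
print(round(q, 3), round(s, 3))
```

Fixing the parameters at `q` reproduces the pooled point estimates in a single run, but that run's fit output describes one dataset, not the pooled model.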


You can use MODEL TEST for specific hypotheses. You can also use TECH10 in the fixed-values run.


Thank you for your reply. Could you maybe give me a little more detail about what exactly I should be looking for in the tech10 output? Many thanks for your help 


Sorry, I have another question. How can I find the results of the bootstrap likelihood ratio test for the different models I run? When I request TECH14 in the fixed-values run, the result differs depending on which imputed dataset I use. Does this matter? The significance levels of the models seem to stay the same across datasets.

Jon Heron posted on Wednesday, April 15, 2015  11:47 pm



Hi Jessica, there are many unanswered questions when it comes to MI in mixture modelling. It sounds like you are attempting to determine the number of classes across your imputed datasets. My suggestion would be that you make this decision first, prior to analysing the imputed data, perhaps relying on likelihood-based methods to deal with class-indicator missingness at that point, or alternatively using some theory. By the way, your imputation model should contain information regarding the classes, so this is a little chicken and egg. The first mention I have seen of this is Colder CR, Mehta P, Balanda K, et al. Identifying trajectories of adolescent smoking: an application of latent growth mixture modeling. Health Psychol 2001/3; 20(2): 127-35. You'll find that some people (including me) have neglected to do this and assumed a single population for the purpose of imputation. This is likely to attenuate any differences between the classes.


Thanks for your reply. If I understand correctly, you are suggesting that I run the LCA on the data with missing values and then use multiple imputation to impute class membership?

Jon Heron posted on Thursday, April 16, 2015  12:59 am



Hi Jessica, no, I didn't mean that. I'll try again now that I am caffeinated. You don't need imputation to deal with the indicators of class membership, so presumably you are using imputation because you have missing covariate information. My suggestion is that you determine the number of classes before considering imputation, and then use imputation to deal with covariate missingness. After imputation you will need to carry out another LCA, but at this point you won't need to worry about BLRT/BIC etc. You'll see a number of ad hoc alternatives in the literature. For instance, if your entropy is excellent (>0.9?), then exporting your LCA results, assigning to the modal class and then doing imputation *might* be adequate. It is hard to say whether any bias from doing that would be smaller or greater than that from imputing whilst ignoring the underlying class structure.
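The entropy check and "assigning to modal class" can be made concrete. Below is a minimal Python sketch, assuming you have exported each person's posterior class probabilities (in Mplus, for example, through SAVEDATA with SAVE = CPROBABILITIES). It computes relative entropy, 1 minus the total classification entropy divided by n ln K, so 1 means perfectly separated classes, and the modal class for each case. The function names and posterior values are hypothetical, and the 0.9 cutoff is Jon's rough guide rather than a formal rule.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy (0..1) of an n x K list of posterior class probabilities."""
    n = len(posteriors)
    k = len(posteriors[0])
    # -p ln p summed over all cases and classes; p = 0 contributes nothing
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - total / (n * math.log(k))

def modal_assignment(posteriors):
    """Assign each case to the class with its largest posterior (0-based class index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

# Hypothetical posteriors for 4 cases in a 3-class model
post = [
    [0.98, 0.01, 0.01],
    [0.02, 0.95, 0.03],
    [0.05, 0.05, 0.90],
    [0.60, 0.30, 0.10],
]
print(round(relative_entropy(post), 3), modal_assignment(post))
```

Modal assignment throws away the uncertainty in rows like the fourth one (0.60/0.30/0.10), which is why it is only defensible when entropy is high; in this toy example that single ambiguous case pulls entropy well below 0.9.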


Thanks again, Jon. I'm probably being really dumb, but why don't I need imputation to deal with the indicators of class membership? If some of the variables I am using to determine my classes have missing values, shouldn't I deal with these before using them to determine my classes?

Jon Heron posted on Thursday, April 16, 2015  2:37 am



Hi Jessica, missing data in class indicators are typically handled using likelihood methods (FIML). Check out some of Craig Enders' work, including Enders and Bandalos. Of course there are assumptions to make, but there always are with missing data. Enders and Bandalos (2001): http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1065&context=edpsychpapers
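To show what "likelihood methods (FIML)" means in practice, here is a minimal Python sketch for two normally distributed indicators with known parameters. Each case contributes the log-density of only the variables it actually has: the joint bivariate normal when both are observed, the marginal univariate normal when one is missing. FIML estimation maximizes this sum over the means and covariances; in a mixture model the same casewise idea is applied within each class and the class densities are then mixed. The functions are illustrative, not how Mplus implements it internally, and the usual assumption is that data are missing at random (MAR).

```python
import math

def normal_logpdf(x, mu, var):
    """Log-density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def bivariate_logpdf(x, y, mu, cov):
    """Log-density of a bivariate normal with mean mu and 2x2 covariance cov."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mu[0], y - mu[1]
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det  # d' inv(cov) d
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def fiml_loglik(data, mu, cov):
    """Sum of casewise log-likelihoods using only observed values; None marks missing."""
    total = 0.0
    for x, y in data:
        if x is not None and y is not None:
            total += bivariate_logpdf(x, y, mu, cov)
        elif x is not None:
            total += normal_logpdf(x, mu[0], cov[0][0])  # marginal of the observed x
        elif y is not None:
            total += normal_logpdf(y, mu[1], cov[1][1])  # marginal of the observed y
        # a case with nothing observed contributes nothing
    return total

# Hypothetical data; no case is dropped just because one value is missing
data = [(0.2, 1.1), (None, -0.4), (0.9, None), (None, None)]
print(fiml_loglik(data, mu=(0.0, 0.0), cov=((1.0, 0.5), (0.5, 1.0))))
```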


Thanks very much for all your help. I have more reading to do I think! 

Jon Heron posted on Thursday, April 16, 2015  3:13 am



No probs 


TECH10 provides univariate and bivariate fit information that you can use to assess model fit. For determining the number of classes, my preferred method is to run TECH14 on each imputed data set (with all parameters free). Once you have determined the number of classes for each imputed data set, use the number of classes that comes up most often across the imputed data sets. Just to confirm Jon Heron's reply: if you have the raw data with the missing values, you can run the LCA with the ML estimator directly, without doing imputations, and avoid the above complications.
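The "most frequent number of classes" rule is easy to state in code. A minimal Python sketch, assuming you have already recorded the number of classes that TECH14 supports in each imputed data set (the list below is hypothetical). The tie check is an addition not in the advice above: with an even number of imputations two values can be equally common, and the vote alone then does not decide.

```python
from collections import Counter

def most_common_k(selected_k):
    """Number of classes chosen most often across imputed data sets; None on a tie for first."""
    ranked = Counter(selected_k).most_common()  # [(k, count), ...] sorted by count, descending
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie (hypothetical handling): fall back on theory or an ML/FIML analysis
    return ranked[0][0]

# Hypothetical TECH14-supported class counts from 5 imputed data sets
print(most_common_k([3, 3, 4, 3, 2]))
```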


Ok. That's great. Thank you for your help. 


We are running a latent profile analysis with 4 indicators in a sample of 371 participants. We identified a 3-class solution and would like to test which covariates predict class membership (using Vermunt's three-step approach). As we have quite a lot of missing data in our dataset, we would like to multiply impute our covariates. However, we read that multiple imputation is not recommended in the context of latent class analysis, because LCA hypothesizes multiple latent subpopulations while multiple imputation assumes a single population. Our questions are thus the following: 1) Does this issue also apply to covariates, or does it mainly concern imputation of the latent class indicators? 2) Would this issue be partly addressed if the imputation of the covariates were based on (among other variables) the previously estimated class membership probabilities? 3) We first identified the classes using FIML and would use multiple imputation only for the covariates: is this combination of different ways of handling missing data appropriate? Thank you in advance for your help. Any thoughts would be welcome.


1) It does apply to covariates as well. 2) Generally, the more information supplied for the imputation, the better. 3) Yes, it is appropriate, and it is preferable because the missing data for the indicators can easily be handled in stage one with FIML. You can also consider these two approaches: A. Treat the covariate as a dependent variable during stage 3 estimation. You would need to use the manual 3-step approach for that; see Section 3 in http://statmodel.com/download/webnotes/webnote15.pdf. In the third stage you would need to mention the variance of the covariate. This avoids imputation completely and uses FIML treatment of the missing data. B. The imputation of the covariates can be done using H0 imputation in Mplus. Here the covariate is again treated as a dependent variable, and the imputation is based on the mixture model itself. See User's Guide example 11.7 for an example of H0 imputation.


Thank you very much for this quick answer. Two things are not totally clear to me: 1) If I understood correctly, you advise us to enter our covariates as distal outcomes in the manual 3-step approach? 2) For approach A, you say to "mention the variance of the covariate" in the third step, but do you mean estimating the variance, fixing it, or something else? And should this be done in the overall model or in the class-specific models? Thank you for your help!


1) No. Treat it as a class predictor in a bivariate model, C ON X; X; so that missing data on X are handled with FIML. 2) Estimating the variance. It should be in the overall section. Mplus will not allow you to do it in the class-specific section since that creates a reciprocal interaction; see the Maddala reference in http://statmodel.com/download/Rec.pdf


Thank you very much for this very clear answer. It works well, and the model is estimated with FIML. However, we noticed that the number of individuals per class changed slightly compared to the step 1 model. Is this normal? Can it be avoided?


Slight changes are normal. 


Dear Tihomir, if I understand correctly, with the H0 imputation model one would:
1. Use the LPA identified as the best-fitting solution to impute the missing values of the covariates.
2. Use the imputed values in a second analysis with the 3-step approach to test which covariates predict class membership.
Is this correct? If so, how should the covariates be specified in the imputation model? I tried the following syntax, but it didn't work (*** ERROR in DATA IMPUTATION command Unknown variable(s) in the IMPUTE option: x1):
    VARIABLE: NAMES = id u1-u4 x1-x15;
      USEVARIABLES = u1-u4;
      MISSING = ALL (999);
      IDVARIABLE = id;
      CLASSES = c(4);
    ANALYSIS: ESTIMATOR = BAYES;
      PROCESSORS = 2;
      TYPE = MIXTURE;
    MODEL:
      %OVERALL%
      x1-x9; ! I mention the variance of the predictors with missing data
      %C#1%
      u1; u2; u3;
      %C#2%
      u1; u2; u3;
      %C#3%
      u1; u2; u3;
      %C#4%
      u1; u2; u3;
    DATA IMPUTATION: IMPUTE = x1-x10;
      NDATASETS = 20;
      SAVE = imputation*.dat;
    SAVEDATA: FILE = plausvalues.dat;
I assume this error appears because I am not including my covariates in the USEVARIABLES command, but if I do, the covariates would be used as indicators of the LPA. What would be the best way to set up the imputation syntax? I really appreciate your help.


I think so. Add the variables x1-x15 to the USEVARIABLES command, make them class indicators, and add x1-x15 WITH x1-x15 as well as their covariances with the u variables. Such a model could be hard to estimate, however, so I would consider simply using H1 imputation with TYPE=BASIC.

Mayra Galvis posted on Monday, September 17, 2018  11:34 pm



Dear Tihomir, thank you very much for your answer. If we specify the model the way you suggested, is there a high risk of getting an LPA solution that is significantly different from the original one we identified as the best fitting? Or would there be a way to “protect” the classes we previously identified?


I think you are correct. I think some kind of two-stage imputation can work to protect the LPA solution. First impute the latent class variable (plausible values) from the LPA, and then, for each such imputation, impute the covariates (treating the imputed latent class value as KNOWNCLASS), one data set at a time.
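The two-stage idea can be sketched in Python under heavy simplifying assumptions. Stage 1 draws a plausible class for each person from their posterior class probabilities; stage 2, for that draw, imputes a missing covariate using only cases in the same drawn class, so the imputations respect the class structure instead of assuming a single population. Here stage 2 is a class-stratified hot-deck (copy an observed value from a random donor in the same drawn class), a stand-in for the model-based imputation Mplus would run with the drawn class as KNOWNCLASS. All names and values are hypothetical, and the hot-deck ignores uncertainty in the imputation model's parameters, so treat this as an illustration rather than a proper MI procedure.

```python
import random

def draw_plausible_classes(posteriors, rng):
    """Stage 1: draw one plausible class per case from its posterior probabilities."""
    classes = range(len(posteriors[0]))
    return [rng.choices(classes, weights=row)[0] for row in posteriors]

def impute_within_class(x, drawn_class, rng):
    """Stage 2: fill each missing value (None) with an observed value from a donor in the
    same drawn class; fall back to all observed values if that class has no donors."""
    observed = [v for v in x if v is not None]
    donors = {}
    for value, c in zip(x, drawn_class):
        if value is not None:
            donors.setdefault(c, []).append(value)
    return [v if v is not None else rng.choice(donors.get(c, observed))
            for v, c in zip(x, drawn_class)]

def two_stage_imputation(posteriors, x, n_imputations, seed=None):
    """Return n_imputations (drawn classes, completed covariate) pairs."""
    rng = random.Random(seed)
    result = []
    for _ in range(n_imputations):
        drawn = draw_plausible_classes(posteriors, rng)
        result.append((drawn, impute_within_class(x, drawn, rng)))
    return result

# Hypothetical posteriors (2 classes) and one covariate with a missing value
post = [[0.9, 0.1], [0.2, 0.8], [0.85, 0.15], [0.1, 0.9]]
x = [1.0, None, 2.0, 5.0]
for drawn, completed in two_stage_imputation(post, x, n_imputations=3, seed=1):
    print(drawn, completed)
```

Each completed data set would then be analysed separately and the results pooled across imputations.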
