LCA and Multiple Imputed Datasets
Mplus Discussion > Missing Data Modeling >
 Jon Heron posted on Monday, July 17, 2006 - 7:05 am

I've just fitted a 5-class LCA model on 10 imputed datasets.

The results are nonsense.

I think that the reason for this is that

"Parameter estimates are averaged over the set of analyses"

means that the resulting parameters for class-1 are the average of the parameters for the class-1's for each of the 10 datasets. Hence, if the ordering of the 5-classes is different within each dataset then everything gets jumbled.

If so, I'm wondering how to get round this. Are the same random starts used within each dataset?

many thanks

 Linda K. Muthen posted on Monday, July 17, 2006 - 8:39 am
To avoid label switching, use user-specified starting values. You can take these from the analysis of one of the imputed data sets.
 Jon Heron posted on Tuesday, July 18, 2006 - 12:11 am

thanks Linda

 Stephen Gilman posted on Friday, June 13, 2008 - 10:08 pm
I wonder if there is any update to the above postings regarding the ability to either save (or even view) the class probabilities for an LCA model with multiply imputed datasets. If it's not possible to save the class probabilities, then I am not sure how to determine which observations should be classified into which class.
Thanks very much for your help,
 Linda K. Muthen posted on Saturday, June 14, 2008 - 4:16 pm
It would not make sense to save the average of the class probabilities. What you should do is run the analysis with all of the parameters fixed to the values of the average estimates from the multiple imputation analysis and obtain latent class probabilities from that model. Use one data set. It doesn't matter which one because all of the parameters are fixed.
 GP posted on Tuesday, January 12, 2010 - 11:30 am
Dear Linda,

I was looking for a way to save the class probabilities for an LCA model with multiply imputed datasets when I came across the postings above. I understand your answer from 6/14/08 in principle, but I don't know how to translate it into practice. My model has three classes, and I’m running Mplus 5.21.

When you say “all the parameters fixed to the average estimates” are you talking about the mean intercept and slope for each latent class? Do you have any code you can share?

Thank you in advance for your help.
 Linda K. Muthen posted on Tuesday, January 12, 2010 - 4:22 pm
If the model run with multiple imputations has 10 model parameters, then these 10 parameters must be fixed in the MODEL command to the values estimated under multiple imputation. Then run the analysis as a regular analysis with any one of the imputed data sets and ask for CPROBABILITIES in the SAVEDATA command.
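As a rough illustration of why any one imputed data set will do once all parameters are fixed: with fixed parameters, each case's class probabilities follow directly from Bayes' rule applied to that case's own responses. The sketch below is not Mplus code and does not reproduce the CPROBABILITIES output. It assumes a simple LCA with binary items, and the function name and all numeric values are hypothetical.

```python
from math import prod

def posterior_class_probs(responses, class_props, item_probs):
    """Posterior P(class | responses) for one case under a fixed-parameter LCA.

    responses   : list of 0/1 item responses (None marks a missing item)
    class_props : class proportions, summing to 1
    item_probs  : item_probs[k][j] = P(item j == 1 | class k)
    """
    joint = []
    for k, pi_k in enumerate(class_props):
        # Likelihood of the observed items given class k (conditional independence)
        lik = prod(
            (p if y == 1 else 1.0 - p)
            for y, p in zip(responses, item_probs[k])
            if y is not None  # missing items are skipped
        )
        joint.append(pi_k * lik)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical pooled estimates for a 2-class, 3-item model (not from the thread)
class_props = [0.8, 0.2]
item_probs = [[0.1, 0.2, 0.1],
              [0.9, 0.7, 0.8]]
probs = posterior_class_probs([1, 1, 0], class_props, item_probs)
```

No estimation happens here: the parameters enter only as fixed inputs, which mirrors the idea of fixing every parameter to its pooled value before requesting class probabilities.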
 GP posted on Wednesday, January 13, 2010 - 9:58 am
Thank you for your reply; as you can tell I'm pretty new to Mplus and still confused. This is the code for the analysis with the imputation:

DATA:
File is C:\MPLUS\imp.dat;

VARIABLE:
Names are
id dv0 dv1 dv2;
Missing are all (-9);
Idvariable is id;
Classes = C(2);

ANALYSIS:
Type = mixture;
Starts 5000 100;

MODEL:
i s | dv0@0 dv1@1 dv2@2;

OUTPUT:
tech9 tech11;

SAVEDATA:
Results is C:\MPLUS\imp.dat;

If I understand correctly, I should rerun the model on one of the datasets, adding something to the MODEL statement to set the values of dv0, dv1 and dv2 to the estimates obtained by imputation. The imputation output reports an estimated intercept and slope for each of the dv's, separately for class 1 and class 2. Are these the values I should use for the revised MODEL statement? Can you please show me what the code would look like?

Thank you again.
 Linda K. Muthen posted on Wednesday, January 13, 2010 - 2:58 pm
Please send the full output from the multiple imputation analysis and your license number to
 Sharon Ghazarian posted on Thursday, March 25, 2010 - 8:48 am
Can the runall utility be used for lca/lta analyses to enable the use of multiple imputation datasets instead of having to fix model parameters?

 Linda K. Muthen posted on Thursday, March 25, 2010 - 11:19 am
I don't understand your question. If you have multiple imputation data sets, you should use TYPE=IMPUTATION in the DATA command.
 Sharon Ghazarian posted on Thursday, March 25, 2010 - 11:44 am
I am using type=mixture for my analyses and so I get an error message when I use type=imputation as well. The error says that imputation cannot be used with type=mixture.
 Linda K. Muthen posted on Thursday, March 25, 2010 - 12:33 pm
IMPUTATION and MIXTURE can be used together. Please send your full output and license number to
 Jon Heron posted on Wednesday, May 19, 2010 - 4:33 am
Hi there,

I've returned to the task of analysing a latent class model across a number of imputed datasets (created using ICE in Stata).

I see I started this thread nearly 4 years ago - wish I'd sorted this problem out then :-(

Anyway, I have 20 datasets. I've fitted the 4-class model to the first one and then used the resulting parameters as starting values for all the parameters in the combined analysis with type=imputation. Despite the starting values, the ordering of classes changes between datasets, so the results bear no resemblance to those I would obtain by analysing each dataset in succession.

As I plan to do my covariate analysis in Stata, I could probably make do without the results of the model averaged over my 20 datasets, provided I can get hold of the posterior probabilities for each of my 20 models - I expect I can then use these in Stata within the MIM routine. I see that savedata does not work with imputation, so do I have any alternative to fitting the same model to each dataset myself and combining the outputted results?

many thanks, Jon
 Bengt O. Muthen posted on Wednesday, May 19, 2010 - 4:44 pm
In Version 6 you can do ICE within Mplus using the Sequential imputation option (see the V6 UG pp. 463-464). Also, V6 Type=Imputation now has the default of automatically using starting values from the first imputation analysis for subsequent imputations.

The class switching would seem to happen if you have substantial amounts of missingness and/or a couple of classes that are really close.

Since you want to do a "covariate analysis" at the end it sounds like you plan to use posterior probs (or most likely class) as DVs, in which case it might be better to use the V6 approach of Plausible values for the latent classes. This is a research topic, however.

By the way, you might also find the new V6 OUTPUT option SVALUES useful.
 Jon Heron posted on Thursday, May 20, 2010 - 1:10 am
Thanks Bengt,

my plan was to do ICE within Stata and then compare the results with those obtained through ICE + Mplus

I do have substantial missingness - 3,000 complete cases, another 4,500 with partial missing data, however even the largest class representing 80% of the sample can occur as either class 1, 2, 3 or 4 when I use the starting values from dataset 1 and apply them to my other 19 datasets individually.

I've now written some Stata code to impute using ICE, run an LCA on each dataset using Mplus and Rich Jones' "runmplus", import all the posterior probs back into Stata and set up a stacked dataset so I can use MIM (module to analyse and manipulate multiply imputed datasets) based on the posterior probabilities.

This left me 2 problems to solve
1] Sorting the resulting classes within each dataset so that c1 always has the same interpretation
2] Having a sense of what the LCA model is, averaged over the 20 datasets. I guess I can save out the model parameters from each dataset and average them - I don't even need to look at parameter variances if I just want to picture the profiles.

Thanks for the heads-up about the new features - sounds like I've only been scratching the surface of what V6 can do.
 Jon Heron posted on Thursday, May 20, 2010 - 1:23 am
I think saving out the model parameters will solve both of these problems - I can use the parameters themselves to tell me the class ordering from each dataset :-)
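A minimal sketch of that idea, assuming the per-class parameters from each dataset have already been read into nested lists (the function names and the numbers are hypothetical, and this is not what Mplus does internally): relabel the classes in each dataset to match a reference dataset by choosing the permutation with the smallest squared distance, then average.

```python
from itertools import permutations

def align_classes(reference, estimates):
    """Return perm such that estimates[perm[k]] best matches reference[k].

    reference, estimates: lists of per-class parameter vectors (same shape).
    All permutations are tried, which is fine for a handful of classes.
    """
    n = len(reference)

    def cost(perm):
        return sum(
            (r - e) ** 2
            for k in range(n)
            for r, e in zip(reference[k], estimates[perm[k]])
        )

    return min(permutations(range(n)), key=cost)

def average_aligned(datasets):
    """Average per-class parameters after relabelling to match datasets[0]."""
    reference = datasets[0]
    n_classes, n_params = len(reference), len(reference[0])
    sums = [[0.0] * n_params for _ in range(n_classes)]
    for est in datasets:
        perm = align_classes(reference, est)
        for k in range(n_classes):
            for j in range(n_params):
                sums[k][j] += est[perm[k]][j]
    return [[s / len(datasets) for s in row] for row in sums]

# Hypothetical item probabilities from two datasets; class labels are swapped in the second
d1 = [[0.1, 0.2], [0.9, 0.8]]
d2 = [[0.8, 0.9], [0.2, 0.1]]
naive = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(d1, d2)]
aligned = average_aligned([d1, d2])
```

In this toy case the naive average blends the two classes into near-identical profiles, while the aligned average keeps them distinct. The same permutation could be applied to the posterior probability columns before stacking the datasets. When two classes are genuinely close, the matching itself may be ambiguous, so the result deserves a visual check.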
 Jon Heron posted on Thursday, May 20, 2010 - 5:02 am
If I might carry on this dialogue with myself while you guys are still asleep, I am wondering about the potential problems of not having the latent class posterior probabilities present whilst performing the imputation.

As I understand it, there is an assumption that the data in your imputation model has a correlation of zero with any data not present in the dataset so if you then bring along something new you can get biased estimates of association between the new variable/construct and the stuff you've been imputing.

Seems to me that if the new variable is deterministically derived, like a principal component, then there's no problem, but I wonder if there may be a problem if you are deriving a latent factor or a latent class measure.
 Jenny Chang posted on Thursday, July 01, 2010 - 5:45 pm
Hi, I would like to use Mplus 5 to estimate the same LCA model on 10 datasets in a single run to save time, but the results have to be independent of each other.
As a newcomer, I am not quite clear about the discussion above. Is it the same issue? Could you give an example of the statement? Thank you!
 Linda K. Muthen posted on Friday, July 02, 2010 - 11:14 am
I think the easiest way to do this is using a DOS bat file.
 John Woo posted on Monday, September 27, 2010 - 7:14 pm

I have a follow up question to your answer to the very first question of this thread.

You said,
"To avoid label switching, use user-specified starting values. You can take these from the analysis of one of the imputed data sets."

If I use manual starting values, as you suggest, then how can I make sure that the result is the global maximum instead of local?

If my "ice" from STATA produced five sets of imputed data, then should I obtain five sets of starting values from running each model separately, and then run the TYPE=IMPUTATION model five times using each set of starting values? If I did and if the results come out different for each set of starting values, what criteria do I use to pick the best model?

Thank you in advance for your help.
 Linda K. Muthen posted on Tuesday, September 28, 2010 - 9:56 am
I would run one data set where the best loglikelihood is replicated. I would then use those parameter estimates for the TYPE=IMPUTATION analysis. To be certain that you replicate the best loglikelihood in each data set, you would need to analyze each one separately.
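For readers checking many outputs by hand, the replication check can be sketched as below. This assumes the final-stage loglikelihood values have already been copied out of each output into a list. The function name, tolerance and example values are hypothetical, and no Mplus output parsing is attempted.

```python
def best_loglik_replicated(logliks, tol=1e-3, min_hits=2):
    """True if the highest loglikelihood is reached by at least `min_hits`
    random-start solutions, allowing a difference of up to `tol`."""
    best = max(logliks)
    hits = sum(1 for ll in logliks if best - ll <= tol)
    return hits >= min_hits

# Hypothetical final-stage loglikelihoods from two separate runs
replicated = best_loglik_replicated([-5123.456, -5123.456, -5127.9, -5130.2])
not_replicated = best_loglik_replicated([-5123.456, -5127.9, -5130.2])
```

A run where the best value appears only once is a hint to increase the number of random starts before trusting its estimates as starting values.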
 Simone Schmidt posted on Thursday, December 09, 2010 - 6:02 am
I am conducting a 3-class LCA and I have missing values on the covariates. Therefore I'm using multiple imputation (m=10). Imputation seems to work well, but when using the datalist.dat with type=imputation, for the regression model I only get class 1 and class 2 vs. class 3 in the output. The Alternative Parameterization is not provided, so I do not get class 1 and class 3 vs. class 2 etc. (I had the same problem when using the FIML procedure.) It works well when doing the analyses separately for all 10 imputed datasets. Is it OK to pool the results myself by calculating the means of the estimates?
 Linda K. Muthen posted on Thursday, December 09, 2010 - 12:19 pm
You can pool the parameter estimates but not the standard errors of the parameter estimates.
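The reason a simple mean works for estimates but not for standard errors can be sketched with the standard multiple-imputation combining rules (Rubin's rules): the pooled variance adds the between-imputation variance of the estimates to the average within-imputation variance. This is a generic sketch with hypothetical names and numbers, not a statement of what any particular program prints, and it assumes the classes have the same labels in every dataset.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter over m imputed datasets using Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean([se ** 2 for se in std_errors])    # average sampling variance
    between = variance(estimates)                    # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)

# Hypothetical estimates and standard errors from three imputed datasets
pooled_est, pooled_se = pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
naive_se = mean([0.1, 0.1, 0.1])
```

In this example the averaged standard error (0.1) is well below the pooled one, because it ignores how much the estimate moves from one imputed dataset to the next.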
 Oxnard Montalvo posted on Tuesday, January 22, 2013 - 5:53 pm
Hi, in regards to the comments made a few posts above by Jon Heron, about all covariates for the analysis model being included in the imputation model, is this in fact the case for latent variables?
Does that mean we have to use the H0 approach, to ensure the latent variables of interest will be included, or is the H1 imputation general enough to also cover any latent correlations?
Thanks :-)
 Bengt O. Muthen posted on Tuesday, January 22, 2013 - 6:45 pm
I don't think you have to do H0 latent variable-based imputation just because you have a latent variable model in mind. An H1 model would work fine. It is not wrong; just not using the underlying structure. An exception might be when the latent variable is a latent class variable, in which case imputing under a single-class model would be using a wrong model.

I hope that is what you were asking.
 Oxnard Montalvo posted on Thursday, January 24, 2013 - 2:50 pm
Hi Bengt,
Yes to a certain extent - I was planning on using latent class models for my analysis.
 Raghav Ramachandran posted on Thursday, September 24, 2015 - 2:31 pm
Drs. Muthen,

I am implementing an LCA model with covariates, including interactions between pairs of covariates, using multiply imputed data (the multiple imputation-FCS algorithm was run in SAS v9.3). However, when I use type=imputation in the data command to calculate pooled parameter estimates over all ten of my imputed data sets, the values do not equal the average of the same parameters over each of the 10 data sets. I've noticed that the LCA item-response probabilities that I obtain from the pooled analysis also differ widely from the analyses of individual imputed data sets. I'd really appreciate suggestions on how to resolve this issue when using type=imputation. Thanks for the help.

 Linda K. Muthen posted on Thursday, September 24, 2015 - 5:22 pm
They should be the average. You would need to send some evidence that they don't along with your license number to
 benedetta posted on Wednesday, June 05, 2019 - 2:14 pm

I am estimating a latent class model and then for the estimated classes I run a multinomial logit model, while accounting for missing data through multiple imputation. Is it possible to do everything in one step? Would you recommend to first do multiple imputation and then LCA and multinomial model?
Many thanks
 Tihomir Asparouhov posted on Wednesday, June 05, 2019 - 6:33 pm
I think you can do it in one step without multiple imputation using algo=integration; integration=montecarlo.

Alternatively, you can do multiple imputation, in one step together with the LCA estimation, following the setup described in User's guide example 11.6.

A third method is to do the multiple imputation first as in User's Guide example 11.5 followed by LCA model estimation as in User's guide example 13.13.