Jon Heron posted on Monday, July 17, 2006 - 1:05 pm
I've just fitted a 5-class LCA model on 10 imputed datasets.
The results are nonsense.
I think that the reason for this is that
"Parameter estimates are averaged over the set of analyses"
means that the resulting parameters for class 1 are the average of the class-1 parameters from each of the 10 datasets. Hence, if the ordering of the 5 classes differs between datasets, then everything gets jumbled.
If so, I'm wondering how to get round this. Are the same random starts used within each dataset?
Hello! I wonder if there is any update to the above postings regarding the ability to either save (or even view) the class probabilities for an LCA model with multiply imputed datasets. If it's not possible to save the class probabilities, then I am not sure how to determine which observations should be classified into which class. Thanks very much for your help, Stephen
It would not make sense to save the average of the class probabilities. What you should do is run the analysis with all of the parameters fixed to the values of the average estimates from the multiple imputation analysis and obtain latent class probabilities from that model. Use one data set. It doesn't matter which one because all of the parameters are fixed.
I was looking for a way to save the class probabilities for an LCA model with multiply imputed datasets when I came across the postings above. I understand your answer from 6/14/08 in principle, but I don't know how to translate it into practice. My model has three classes, and I’m running Mplus 5.21.
When you say “all the parameters fixed to the average estimates” are you talking about the mean intercept and slope for each latent class? Do you have any code you can share?
If the model run with multiple imputations has 10 model parameters, then these 10 parameters must be fixed in the MODEL command to the values estimated under multiple imputation. Then run the analysis as a regular analysis with any one of the imputed data sets and ask for CPROBABILITIES in the SAVEDATA command.
GP posted on Wednesday, January 13, 2010 - 3:58 pm
Thank you for your reply; as you can tell I'm pretty new to Mplus and still confused. This is the code for the analysis with the imputation:
Data: File is C:\MPLUS\imp.dat; Type = IMPUTATION;
Variable: Names are id dv0 dv1 dv2; Missing are all (-9); Idvariable is id; Classes = C(2);
Analysis: Type = mixture; Starts 5000 100;
Model: %Overall% i s | dv0@0 dv1@1 dv2@2; i-s@0;
Output: tech9 tech11;
Savedata: Results is C:\MPLUS\imp.dat;
If I understand correctly, I should rerun the model on one of the datasets, adding something to the MODEL statement to set the values of dv0, dv1 and dv2 to the estimates obtained by imputation. The imputation output reports an estimated intercept and slope for each of the dv's, separately for class 1 and class 2. Are these the values I should use for the revised MODEL statement? Can you please show me what the code would look like?
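The advice earlier in the thread (fix every model parameter to its pooled estimate, then rerun on a single imputed data set) could be sketched for the two-class growth model above roughly as follows. This is only an illustration: the helper name and all numeric values are hypothetical placeholders, and it assumes the free parameters are the class-specific means of i and s, the residual variances of dv0-dv2 (taken to be equal across classes) and the class logit. Check the parameter list in your own output before relying on it.

```python
def fixed_growth_mixture_model(class_means, resid_vars, class_logits):
    """Build Mplus MODEL text with every parameter fixed via @value.

    class_means  : list of (intercept_mean, slope_mean), one pair per class
    resid_vars   : dict mapping outcome name -> residual variance
    class_logits : logits for classes 1..K-1 (the last class is the reference)
    All values are supplied by the caller; nothing is estimated here.
    """
    lines = ["MODEL:", "  %OVERALL%"]
    lines.append("  i s | dv0@0 dv1@1 dv2@2;")
    lines.append("  i-s@0;")
    # Residual variances, assumed equal across classes
    for name, var in resid_vars.items():
        lines.append(f"  {name}@{var};")
    # Class membership logits
    for k, logit in enumerate(class_logits, start=1):
        lines.append(f"  [c#{k}@{logit}];")
    # Class-specific growth factor means
    for k, (i_mean, s_mean) in enumerate(class_means, start=1):
        lines.append(f"  %c#{k}%")
        lines.append(f"  [i@{i_mean} s@{s_mean}];")
    return "\n".join(lines)


# Hypothetical pooled estimates, for illustration only
model_text = fixed_growth_mixture_model(
    class_means=[(1.5, 0.2), (3.0, -0.1)],
    resid_vars={"dv0": 0.8, "dv1": 0.9, "dv2": 1.1},
    class_logits=[0.4],
)
print(model_text)
```

The generated text would replace the MODEL command in an input file that reads one imputed data set as ordinary data (without Type = IMPUTATION) and requests CPROBABILITIES in the SAVEDATA command.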
Please send your full output and license number to email@example.com. IMPUTATION and MIXTURE can be used together.
Jon Heron posted on Wednesday, May 19, 2010 - 10:33 am
I've returned to the task of analysing a latent class model across a number of imputed datasets (created using ICE in Stata).
I see I started this thread nearly 4 years ago - wish I'd sorted this problem out then.
Anyway, I have 20 datasets. I've fitted the 4-class model to the first one and then used the resulting parameters as starting values for all the parameters in the combined analysis with type=imputation. Despite the starting values, the ordering of classes between datasets changes so that the result bears no resemblance to those obtained were I to analyse each dataset in succession.
As I plan to do my covariate analysis in Stata, I could probably make do without the results of the model averaged over my 20 datasets, provided I can get hold of the posterior probabilities for each of my 20 models - I expect I can then use these in Stata within the MIM routine. I see that savedata does not work with imputation, so do I have any alternative to fitting the same model to each dataset myself and combining the outputted results?
In Version 6 you can do ICE within Mplus using the Sequential imputation option (see the V6 UG pp. 463-464). Also, V6 Type=Imputation now has the default of automatically using starting values from the first imputation analysis for subsequent imputations.
The class switching would seem to happen if you have substantial amounts of missingness and/or a couple of classes that are really close.
Since you want to do a "covariate analysis" at the end it sounds like you plan to use posterior probs (or most likely class) as DVs, in which case it might be better to use the V6 approach of Plausible values for the latent classes. This is a research topic, however.
By the way, you might also find the new V6 OUTPUT option SVALUES useful.
Jon Heron posted on Thursday, May 20, 2010 - 7:10 am
My plan was to do ICE within Stata and then compare the results with those obtained through ICE + Mplus.
I do have substantial missingness: 3,000 complete cases and another 4,500 with partially missing data. However, even the largest class, representing 80% of the sample, can occur as class 1, 2, 3 or 4 when I use the starting values from dataset 1 and apply them to my other 19 datasets individually.
I've now written some Stata code to impute using ICE, run an LCA on each dataset using Mplus and Rich Jones' "runmplus", import all the posterior probs back into Stata and set up a stacked dataset so I can use MIM (module to analyse and manipulate multiply imputed datasets) based on the posterior probabilities.
This left me 2 problems to solve: 1] Sorting the resulting classes within each dataset so that c1 always has the same interpretation. 2] Having a sense of what the LCA model is, averaged over the 20 datasets. I guess I can save out the model parameters from each dataset and average them - I don't even need to look at parameter variances if I just want to picture the profiles.
Thanks for the heads-up about the new features - sounds like I've only been scratching the surface of what V6 can do.
Jon Heron posted on Thursday, May 20, 2010 - 7:23 am
I think saving out the model parameters will solve both of these problems - I can use the parameters themselves to tell me the class ordering from each dataset
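One way to use the saved parameters for this is sketched below in Python. It is an illustration under assumptions rather than the poster's actual code: each dataset's solution is represented as a list of class profiles (for example item-response probabilities), the first dataset serves as the reference ordering, and classes are matched by trying every permutation and keeping the one with the smallest squared distance to the reference. The function names are hypothetical.

```python
from itertools import permutations


def align_to_reference(reference, profiles):
    """Match the classes in `profiles` to the classes in `reference`.

    Returns (perm, reordered), where perm[k] is the index in `profiles`
    of the class that best matches reference class k.
    """
    def cost(perm):
        # Total squared distance between matched class profiles
        return sum(
            (r - p) ** 2
            for k, j in enumerate(perm)
            for r, p in zip(reference[k], profiles[j])
        )

    best = min(permutations(range(len(reference))), key=cost)
    return best, [profiles[j] for j in best]


def average_aligned(all_profiles):
    """Average class profiles over datasets after relabelling each one."""
    reference = all_profiles[0]
    aligned = [align_to_reference(reference, p)[1] for p in all_profiles]
    m = len(aligned)
    return [
        [sum(d[k][i] for d in aligned) / m for i in range(len(reference[k]))]
        for k in range(len(reference))
    ]


# Two hypothetical 2-class solutions with the labels swapped in the second
solutions = [
    [[0.9, 0.8], [0.1, 0.2]],
    [[0.2, 0.1], [0.8, 0.9]],
]
perm, _ = align_to_reference(solutions[0], solutions[1])
print(perm)
print(average_aligned(solutions))
```

The same permutation would then be applied to the posterior probability columns from that dataset, so that c1 keeps one interpretation throughout. With 4 or 5 classes there are only 24 or 120 permutations, so the exhaustive search is cheap.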
Jon Heron posted on Thursday, May 20, 2010 - 11:02 am
If I might carry on this dialogue with myself while you guys are still asleep, I am wondering about the potential problems of not having the latent class posterior probabilities present whilst performing the imputation.
As I understand it, there is an assumption that the data in your imputation model has a correlation of zero with any data not present in the dataset so if you then bring along something new you can get biased estimates of association between the new variable/construct and the stuff you've been imputing.
Seems to me that if the new variable is deterministically derived, like a principal component, then there's no problem, but I wonder if there may be a problem if you are deriving a latent factor or a latent class measure.
Jenny Chang posted on Thursday, July 01, 2010 - 11:45 pm
Hi dear, I would like to use MPLUS 5 to estimate the same LCA model for 10 datasets simultaneously so as to save time, but the results have to be independent of each other. As a newcomer, I am not quite clear about the discussion above. Is it the same matter? Could you present an example of the sentence? Thank you! (email: firstname.lastname@example.org)
I think the easiest way to do this is using a DOS bat file.
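As a rough illustration of the preparation step behind such a batch run, the Python sketch below writes one input file per data set from a shared template, so that each data set is analysed independently. The function name, the file names and the one-line template are all hypothetical; a real template would hold the full VARIABLE, ANALYSIS and MODEL commands, and the batch file would then call Mplus once per generated input file.

```python
def make_inputs(template, data_files):
    """Return {input_file_name: input_text}, one entry per data file.

    `template` is Mplus input text containing the placeholder {data_file}.
    """
    inputs = {}
    for path in data_files:
        stem = path.rsplit(".", 1)[0]  # "imp1.dat" -> "imp1"
        inputs[stem + ".inp"] = template.format(data_file=path)
    return inputs


# Hypothetical template; only the DATA command is shown here
template = "DATA: FILE IS {data_file};\n"
inputs = make_inputs(template, [f"imp{i}.dat" for i in range(1, 11)])

# Write the ten input files next to the data files
for name, text in inputs.items():
    with open(name, "w") as handle:
        handle.write(text)
```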
John Woo posted on Tuesday, September 28, 2010 - 1:14 am
I have a follow up question to your answer to the very first question of this thread.
You said, "To avoid label switching, use user-specified starting values. You can take these from the analysis of one of the imputed data sets."
If I use manual starting values, as you suggest, then how can I make sure that the result is the global maximum instead of local?
If my "ice" from STATA produced five sets of imputed data, then should I obtain five sets of starting values from running each model separately, and then run the TYPE=IMPUTATION model five times using each set of starting values? If I did and if the results come out different for each set of starting values, what criteria do I use to pick the best model?
I would run one data set where the best loglikelihood is replicated. I would then use those parameter estimates for the TYPE=IMPUTATION analysis. To be certain that you replicate the best loglikelihood in each data set, you would need to analyze each one separately.
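If the final-stage loglikelihoods from each separate run are copied out of the output, the replication check can be automated along these lines. This is a small sketch: the function name, the tolerance and the requirement of at least two matching values are assumptions for illustration, not an Mplus rule.

```python
def best_loglik_replicated(logliks, tol=1e-3, min_hits=2):
    """True if the best loglikelihood is reached by at least `min_hits` starts.

    logliks : final loglikelihood values, one per random start
    tol     : how close a value must be to the best to count as a replicate
    """
    best = max(logliks)
    hits = sum(1 for ll in logliks if best - ll <= tol)
    return hits >= min_hits


# Hypothetical values for two imputed data sets
print(best_loglik_replicated([-1023.456, -1023.456, -1030.2]))
print(best_loglik_replicated([-1023.4, -1025.0, -1030.2]))
```

A data set that fails the check would be rerun with more random starts before its estimates are used as starting values.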
I am conducting a 3-class LCA and I've got missing values on the covariates. Therefore I'm using multiple imputation (m=10). Imputation seems to work well, but when using the datalist.dat with type=imputation, for the regression model I only get class 1 and class 2 vs. class 3 in the output. The Alternative Parameterization is not provided, so I do not get class 1 and class 3 vs. class 2 etc. (I had the same problem when using the FIML procedure.) It works well when doing the analyses separately for all 10 imputed datasets - is it OK to pool the results myself by calculating the means of the estimates?
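For reference, pooling by hand normally follows Rubin's rules: the pooled estimate is the mean of the per-dataset estimates, while the pooled standard error combines the within-imputation and between-imputation variances. The Python sketch below shows this for a single parameter; the function name and example numbers are hypothetical, and it assumes the class labels already mean the same thing in every data set.

```python
from math import sqrt


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    # Within-imputation variance: mean of the squared standard errors
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the estimates
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Hypothetical logit coefficients and standard errors from 3 data sets
estimate, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, se)
```

Averaging the estimates alone gives the pooled point estimate, but averaging the standard errors would understate the uncertainty because it ignores the between-imputation variance.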
Hi, with regard to the comments made a few posts above by Jon Heron, about all covariates for the analysis model being included in the imputation model, is this in fact the case for latent variables? Does that mean we have to use the H0 approach, to ensure the latent variables of interest will be included, or is the H1 imputation general enough to also cover any latent correlations? Thanks
I don't think you have to do H0 latent variable - based imputation just because you have a latent variable model in mind. An H1 model would work fine. It is not wrong; just not using the underlying structure. An exception might be when the latent variable is a latent class variable, in which case imputing under a single-class model would be using a wrong model.
I am implementing an LCA model with covariates, including interactions between pairs of covariates, using multiply imputed data (the multiple imputation-FCS algorithm was run in SAS v9.3). However, when I use type=imputation in the data command to calculate pooled parameter estimates over all ten of my imputed data sets, the values do not equal the average of the same parameters over each of the 10 data sets. I've noticed that the LCA item-response probabilities that I obtain from the pooled analysis also differ widely from the analyses of individual imputed data sets. I'd really appreciate suggestions on how to resolve this issue when using type=imputation. Thanks for the help.