

Greetings. I'm doing something that might be a bit dicey, but I don't have any better ideas and I'm hoping for feedback.

I've done a latent class analysis in Mplus. When I added predictors, some of which had substantial (~20%) missingness, I switched to multiple imputation (MI). I'm interested in the effect of one of the predictors on class membership. There is also missingness in the indicators, so naturally the class definitions and sizes vary across imputations. This introduces a great deal of (what seems to me to be spurious) between-imputation variance that swamps any effects. The fractions of missing information are running around 99%, and the df are only fractionally above the number of imputations, so I need a large number of imputations to get any stability.

I tried a hybrid approach in which I fixed the latent class definition parameters, across imputations, at the values from the original LCA (which had no predictors and thus no need for MI). I'm still getting extremely high missing information and low degrees of freedom. With a large number of imputations (50), I am finding effects, but with df of 51. I'm assuming the problem is that the class sizes vary, and I'm concerned that the large missing information is a problem.

Any other suggestions on how to combine MI and LCA, or otherwise tackle this situation?

Thanks,
Pat
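For readers unfamiliar with the diagnostics Pat mentions, here is a minimal sketch of Rubin's classic combining rules for one parameter across m imputations. It is an illustration only, not necessarily what Mplus computes internally (its output may use an adjusted version), and the function name `pool_rubin` and the numbers are hypothetical. It shows why a between-imputation variance B that dwarfs the within-imputation variance W pushes the fraction of missing information (lambda here) toward 1 and the degrees of freedom down toward m - 1.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    (hypothetical helper for illustration)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # within-imputation variance W
    b = variance(estimates)          # between-imputation variance B (m - 1 denominator)
    t = w + (1 + 1 / m) * b          # total variance T
    lam = (1 + 1 / m) * b / t        # share of T due to missingness (lambda)
    # Rubin's df: (m - 1) * (1 + 1/r)**2 with r = (1 + 1/m) * B / W,
    # which simplifies to (m - 1) / lambda**2
    df = (m - 1) / lam ** 2 if lam > 0 else math.inf
    return {"estimate": q_bar, "se": math.sqrt(t), "lambda": lam, "df": df}


# Hypothetical estimates and squared SEs from five imputed data sets
result = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.01] * 5)
print(result)  # lambda is about 0.75 here, so df is about 4 / 0.75**2, roughly 7.1
```

When B is large relative to W, lambda approaches 1 and df approaches m - 1; with m = 50 and lambda of about 0.98, df = 49 / 0.98**2, about 51, which is the pattern Pat reports.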

bmuthen posted on Thursday, September 02, 2004  8:21 am



That's a difficult situation; I'm not sure how to improve the MI approach. An alternative to consider, now that version 3 is available, is an ML analysis in which missingness on the x's is handled under the usual MAR and normality assumptions. Version 3 can predict class membership from y's, not only from x's, so if you "turn the x's into y's" they can have missing data and are handled under the usual normality assumption (an assumption that imputers tell me isn't very harmful even with binary x's). You turn the x's into y's by mentioning their variances in the model. Note, however, that with missing data on the x's this leads to high-dimensional numerical integration as soon as you have more than a couple of x's, and is therefore slow (although the new version 3.11, available today, handles numerical integration better, working with as few as 5 integration points).
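The idea of "turning the x's into y's" can be illustrated with a toy calculation. Assume one covariate x that is missing for a given person, a normal distribution x ~ N(mu, sigma^2) as described above, two latent classes with logit P(class 2 | x) = a + b*x, and known class-specific likelihoods l1 and l2 for that person's observed indicators. The ML likelihood contribution then integrates over the unobserved x, here with Gauss-Hermite quadrature. This is a hypothetical sketch of the principle, not Mplus's algorithm; `likelihood_missing_x` and all parameter values are illustrative. Each additional missing x adds one more integration dimension.

```python
import math

import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def likelihood_missing_x(l1, l2, a, b, mu, sigma, n_points=15):
    """Likelihood for one person whose covariate x is missing (toy example).

    Integrates (1 - p(x)) * l1 + p(x) * l2 over x ~ N(mu, sigma^2),
    where p(x) = P(class 2 | x) = logistic(a + b * x).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables x = mu + sqrt(2) * sigma * t maps the
    # Gauss-Hermite weight exp(-t^2) onto the N(mu, sigma^2) density
    x = mu + math.sqrt(2.0) * sigma * nodes
    p2 = logistic(a + b * x)
    integrand = (1.0 - p2) * l1 + p2 * l2
    return float(np.sum(weights * integrand) / math.sqrt(math.pi))


print(likelihood_missing_x(l1=0.02, l2=0.10, a=-0.5, b=1.2, mu=0.0, sigma=1.0))
```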


Thanks, it's churning away. I think I tried this before, but it had too many integration dimensions (7); I'm trying it with 3.11 now.

bmuthen posted on Thursday, September 02, 2004  5:11 pm



I guess 5 to the power of 7 (78,125 points) is also a high number, but anything fewer than 5 integration points per dimension gives only a rough ML approximation. I guess you could try 3 for a very rough picture.
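The growth in the number of points is easy to check: with a rectangular (product-rule) grid, the total is the per-dimension count raised to the number of dimensions, so it grows exponentially with the number of integration dimensions. The helper name below is just for illustration.

```python
def grid_points(points_per_dim, dims):
    """Total quadrature points for a product-rule grid (illustrative helper)."""
    return points_per_dim ** dims


for per_dim, dims in [(5, 7), (5, 6), (3, 7)]:
    print(f"{per_dim}^{dims} = {grid_points(per_dim, dims):,} points")
# 5^7 = 78,125 points
# 5^6 = 15,625 points
# 3^7 = 2,187 points
```

Dropping one dimension (5^6 instead of 5^7) cuts the grid by a factor of 5, while dropping to 3 points per dimension cuts it by a factor of about 36.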


Thanks. I ended up trying Monte Carlo integration with 1000 points, but after it ran for 11.5 hours, it crashed with "DETERMINANT OF A MATRIX IS TOO LARGE." I know I've asked Linda about that error message before, but a computer crash wiped out my email archives and I don't remember her reply. I can probably drop one of the variables; I'll try 5^6 integration points on my fastest machine and see what happens.
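Monte Carlo integration replaces the grid with random draws, so 1000 points cost about the same in 7 dimensions as in 2, at the price of simulation noise in the approximated likelihood. A minimal sketch, assuming independent standard-normal dimensions and a logistic integrand chosen purely for illustration (`mc_expectation` is a hypothetical helper, not an Mplus routine):

```python
import math
import random


def mc_expectation(coefs, intercept=0.0, n_draws=1000, seed=1):
    """Monte Carlo estimate of E[logistic(intercept + sum(c_j * z_j))]
    for independent z_j ~ N(0, 1).

    The number of draws is fixed and does not depend on len(coefs),
    the number of integration dimensions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = intercept + sum(c * rng.gauss(0.0, 1.0) for c in coefs)
        total += 1.0 / (1.0 + math.exp(-eta))
    return total / n_draws


# Same 1000 draws per evaluation for 2 and for 7 dimensions
print(mc_expectation([0.8] * 2), mc_expectation([0.8] * 7))
```

By symmetry the true value in both calls is 0.5, so the printed estimates show the size of the simulation error with 1000 draws.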


Nope, didn't like that: "This model can be done only with MonteCarlo integration." I'll pursue the original approach with the imputation people; please let me know if you think of something else.

Andy Ross posted on Friday, June 02, 2006  4:37 am



Dear Prof Muthen,

Is there a way to save the conditional probabilities using the SAVEDATA option when modelling with multiply imputed data sets? If so, what is the syntax?

Many thanks,
Andy


The conditional probabilities are not saved with TYPE=IMPUTATION.
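The "conditional probabilities" here are posterior class-membership probabilities; in a single-data-set run they are requested in the SAVEDATA command with SAVE = CPROBABILITIES. Under local independence they follow from Bayes' rule given the class proportions and the item probabilities, so one possible workaround (not suggested in the thread) is to compute them outside Mplus from the estimates for each imputed data set. The sketch below assumes binary indicators and uses hypothetical parameter values; `posterior_class_probs` is an illustrative helper, not an Mplus feature. Missing items are simply skipped, since under local independence they drop out of the product.

```python
def posterior_class_probs(responses, class_props, item_probs):
    """Posterior P(class k | responses) for binary items under local independence.

    responses:   list of 0/1 values, or None for a missing item (skipped)
    class_props: prior class proportions pi_k, summing to 1
    item_probs:  item_probs[k][j] = P(item j = 1 | class k)
    """
    joint = []
    for pi_k, probs_k in zip(class_props, item_probs):
        lik = pi_k
        for y, p in zip(responses, probs_k):
            if y is None:  # a missing item contributes nothing to the likelihood
                continue
            lik *= p if y == 1 else 1.0 - p
        joint.append(lik)
    total = sum(joint)
    return [v / total for v in joint]


post = posterior_class_probs(
    [1, 0, None, 1],
    class_props=[0.6, 0.4],
    item_probs=[[0.9, 0.2, 0.8, 0.7], [0.3, 0.6, 0.1, 0.2]],
)
print([round(v, 3) for v in post])  # [0.969, 0.031]
```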

CB posted on Thursday, April 09, 2015  12:50 pm



Hello, I am running LCA with two parameterizations of the same variables, and running each with and without multiple imputation. In the first parameterization, I use both unordered and ordered categorical variables; when I run these models with and without multiple imputation, I get the same results. In the second parameterization, I use all binary variables; however, when I run these models with and without multiple imputation, I get different results. Do you have any thoughts as to why I'm getting different results when I code the variables as binary? Is it possibly an identifiability issue? Or do you have any resources I could look at to help me figure this out? Thanks so much!


Please send to support the outputs for the 2 binary runs, plus the imputation run for it. 
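On the identifiability question: recoding ordered or unordered categorical items as binary shrinks the number of possible response patterns, so a model that was comfortably identified may end up close to the limit. A necessary (but not sufficient) counting check for an unconditional LCA compares the free parameters, (K - 1) + K * sum(c_j - 1) for K classes and items with c_j categories, with the number of response-pattern cells minus one, prod(c_j) - 1. A sketch with hypothetical item counts (`lca_count_check` is an illustrative helper):

```python
from math import prod


def lca_count_check(categories, n_classes):
    """Return (free_params, available_df) for an unconditional LCA.

    categories: number of response categories c_j for each item.
    If free_params > available_df the model cannot be identified;
    passing the check does not guarantee identification.
    """
    free_params = (n_classes - 1) + n_classes * sum(c - 1 for c in categories)
    available_df = prod(categories) - 1
    return free_params, available_df


print(lca_count_check([3, 3, 4, 4], n_classes=3))  # (32, 143): polytomous coding
print(lca_count_check([2, 2, 2, 2], n_classes=3))  # (14, 15): same items as binary
```

The counting rule says nothing about empirical identification problems such as sparse cells or boundary estimates, which can also behave differently across imputed data sets.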
