I'm doing something that might be a bit dicey, but I don't have any better ideas and I'm hoping for feedback.
I've done a latent class analysis (LCA) in Mplus. When I added predictors, some of which had substantial (~20%) missingness, I switched to multiple imputation (MI). I'm interested in the effect of one of the predictors on class membership.
There is missingness in the indicators, so naturally class definitions and sizes vary across imputations. This introduces a great deal of between-imputation variance (which seems spurious to me) that is swamping any effects. The fractions of missing information are running around 99%, and the df are only fractionally above the number of imputations, so I need a large number of imputations to get any stability.
I tried a hybrid approach in which I fixed the latent class definition parameters across imputations at the values from the original LCA (which had no predictors and thus no need for MI). I'm still getting extremely high missing information and low degrees of freedom. With a large number of imputations (50), I am finding effects, but with df of only 51. I'm assuming the problem is that the class sizes vary.
I'm concerned that the large missing information is a problem. Any other suggestions on how to combine MI and LCA? Or otherwise tackle this situation?
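For readers who want to see where the fraction of missing information and the df come from, here is a minimal sketch of the classic Rubin (1987) pooling rules for a single coefficient. The function name and the example numbers are hypothetical, and Mplus may compute its pooled df differently, so treat this as an illustration rather than a reproduction of the Mplus output.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputations using Rubin's rules.

    estimates: per-imputation point estimates
    variances: per-imputation squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = sum(estimates) / m                               # pooled estimate
    w = sum(variances) / m                                   # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    t = w + (1 + 1 / m) * b                                  # total variance

    if b == 0:
        # No between-imputation variation: nothing is lost to missingness.
        return {"estimate": q_bar, "se": math.sqrt(t), "df": math.inf, "fmi": 0.0}

    r = (1 + 1 / m) * b / w               # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2       # Rubin's (1987) degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)    # fraction of missing information
    return {"estimate": q_bar, "se": math.sqrt(t), "df": df, "fmi": fmi}


# Hypothetical numbers: estimates that swing widely between imputations
# while each imputation reports a small standard error.
result = pool_rubin([0.1, 0.9, 0.5], [0.01, 0.01, 0.01])
print(result)
```

When the between-imputation variance dominates the within-imputation variance, r becomes large, the fraction of missing information approaches 1, and the df stay close to their lower bound of m - 1, which is the pattern described above.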
bmuthen posted on Thursday, September 02, 2004 - 8:21 am
That's a difficult situation; I'm not sure how to improve the MI approach. An alternative that can be considered now that version 3 is available is to do ML analysis, with the missingness on the x's taken into account under the usual MAR and normality assumptions. By this I mean that version 3 can predict class membership from y's, not only x's, so if you "turn the x's into y's" they can have missing data and will be handled using the usual normality assumptions (an assumption that imputers tell me isn't that harmful even with binary x's). Turning the x's into y's is done by mentioning their variances in the model. Note, however, that with missing data on the x's, this leads to high-dimensional numerical integration as soon as you have more than a couple of x's, and is therefore slow (although the new version 3.11 - available today - can better handle numerical integration with as few as 5 integration points).
Thanks. I ended up trying Monte Carlo integration with 1000 points, but after running for 11.5 hours it crashed with "DETERMINANT OF A MATRIX IS TOO LARGE." I know I've asked Linda about that error message before, but a computer crash has wiped out my e-mail archives and I don't remember her reply. I can probably drop one of the variables; I'll try 5^6 on my fastest machine and see what happens.
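For a sense of scale behind the 5^6 figure, here is a small sketch of how a full product grid of integration points grows with the number of dimensions. The helper name is hypothetical, and the 15-points-per-dimension column is included only as an illustrative comparison, not as a statement about any particular Mplus setting.

```python
def grid_points(points_per_dim, dims):
    """Total number of nodes in a full product integration grid."""
    return points_per_dim ** dims


# Compare 5 points per dimension with a denser, purely illustrative grid of 15.
for dims in range(1, 7):
    print(dims, grid_points(5, dims), grid_points(15, dims))

# Six dimensions at 5 points each, as in the 5^6 run mentioned above.
print(grid_points(5, 6))
```

Each x with missing data adds a dimension, so dropping one variable divides the grid size by the number of points per dimension. Monte Carlo integration instead uses a fixed number of points (1000 in the run above) regardless of dimension.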
I am running LCA with two parameterizations of the same variables, each with and without multiple imputation. In the first parameterization, I'm using both unordered and ordered categorical variables. When I run these models with and without multiple imputation, I get the same results. In the second parameterization, I'm using all binary variables. Here, however, the models with and without multiple imputation give different results.
Do you have any thoughts as to why I'm getting different results when I code the variables as binary? Could it be an identifiability issue? Or are there any resources I could look at to help me figure this out?