Non-ignorable missing data in longitu...
 Bengt O. Muthen posted on Sunday, February 21, 2010 - 4:41 pm
The missing data modeling paper I have referred to in earlier Mplus Discussion posts is now posted with Mplus scripts at

http://www.statmodel.com/examples/penn.shtml

See

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2010). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Submitted for publication.
 Jon Heron posted on Thursday, May 06, 2010 - 10:33 am
Hi Bengt,

I recently attempted the final model (figure 15) without the growth component, i.e. parallel LCA models for my data and for the missingness indicators.

I got huge bivariate residuals between u_i and m_i at each time point. Is there a way round this? Perhaps this is even acceptable provided the other residuals are small.

Unfortunately my measures are subtly different across time so growth factors are not really an option.



many thanks, Jon
 Bengt O. Muthen posted on Thursday, May 06, 2010 - 5:25 pm
Interesting question. I haven't done binary u and m together a la Fig 15 so I haven't experienced this. When m=1, then u is missing so it's a funny 2x2.

I wonder if there is a way to introduce a random effect (factor) that influences both u and m. This contemporaneous residual correlation would seem to be potentially common for two-part modeling. Simplest would be one factor with different loadings for the different time points, rather than a factor for each time point.
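
One way to write down the single-factor idea above, as a sketch only: the symbols below are assumptions introduced for illustration (they are not taken from the paper or from Mplus output). Here $u_{it}$ is the binary outcome and $m_{it}$ the missingness indicator for person $i$ at time $t$, $k$ indexes the latent class governing that indicator, and $f_i$ is one continuous factor shared by all time points with time-specific loadings.

```latex
\begin{aligned}
\operatorname{logit} P(u_{it} = 1 \mid c_i = k,\, f_i) &= \alpha_{tk} + \lambda^{(u)}_{t} f_i, \\
\operatorname{logit} P(m_{it} = 1 \mid c_i = k,\, f_i) &= \beta_{tk} + \lambda^{(m)}_{t} f_i, \\
f_i &\sim N(0, 1).
\end{aligned}
```

Under this sketch, any association between $u_{it}$ and $m_{it}$ left over after conditioning on class is carried by $f_i$ through the loadings $\lambda^{(u)}_{t}$ and $\lambda^{(m)}_{t}$. One factor adds a single dimension of numerical integration, whereas a factor per time point would add one dimension for each time point.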
 Jon Heron posted on Friday, May 07, 2010 - 12:10 am
Thanks Bengt,

I had been thinking multiple factors, but that seemed like potentially a lot of integration, and I've had limited success with those models (e.g. ex7.16, guide v5) so far.

Incidentally, the Latent Gold solution is to allow a direct effect between the two offending variables. This is clearly less sophisticated (I know that cos it runs really quickly!!) but I haven't gotten to the bottom of exactly what this does and whether it's desirable. I suspect not.


all the best, Jon
 Bengt O. Muthen posted on Friday, May 07, 2010 - 8:57 am
Yes, modeling a direct effect between categorical outcomes is on our to-do list. And related to your recent SEMNET post, we also have on our list to simplify the specification of models with more than one categorical latent variable, using probability statements instead of logit statements. This will also simplify LTA and mover-stayer LTA.
 Jon Heron posted on Monday, May 10, 2010 - 12:41 am
I really ought to be content with the billion and one things that Mplus *can* do!
 Jon Heron posted on Thursday, May 13, 2010 - 3:40 am
me again!

I'm struggling with what seems to me to be a very basic pattern mixture model.

Three 4-level categorical variables of smoking frequency
4-class mixture
C regressed on the 6 dummy variables which represent the 7 missingness patterns.
7,300 cases, with complete data on 3,000.
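
A minimal sketch of how the 6 dummy variables for the 7 missingness patterns could be built outside Mplus, assuming three repeated measures with missing values coded as `None` and the complete-data pattern as the reference. The function names and the example rows are hypothetical, not taken from the actual data.

```python
from itertools import product


def missingness_pattern(values):
    """Return a tuple of 0/1 flags, with 1 where the value is missing (None)."""
    return tuple(1 if v is None else 0 for v in values)


def pattern_dummies(rows, reference=(0, 0, 0)):
    """Code each case's missingness pattern as dummies against a reference pattern.

    With three measures there are 2**3 = 8 possible patterns; dropping the
    all-missing pattern (such cases carry no outcome data) leaves 7, and
    dropping the reference pattern leaves 6 dummies.
    """
    n_vars = len(reference)
    patterns = [p for p in product((0, 1), repeat=n_vars) if sum(p) < n_vars]
    non_ref = [p for p in patterns if p != reference]
    coded = []
    for row in rows:
        pat = missingness_pattern(row)
        if pat not in patterns:
            raise ValueError("case has no observed outcomes")
        coded.append([1 if pat == p else 0 for p in non_ref])
    return non_ref, coded


# Made-up example rows: 4-level smoking frequency coded 0-3, None = missing.
example_rows = [(1, 2, 3), (0, None, 2), (None, None, 1)]
non_ref_patterns, dummies = pattern_dummies(example_rows)
for row, d in zip(example_rows, dummies):
    print(row, d)
```

A complete case gets all zeros, and every other case gets exactly one dummy set to 1, so the latent class variable can be regressed on the six columns.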

I guess this is a simplification of Roy's model, but also similar to Hedeker's model from the substance-use textbook where Bengt also has a couple of chapters.

I have fitted this model in Latent Gold with no apparent problems - here I am able to use a 7-level nominal covariate rather than 6 dummies. In Mplus I get a very similar likelihood but either one or two logit parameters fixed. I've taken this up to 2 million random starts to no avail.

My reason for favouring Mplus - I'm preaching to the choir here! - is that it will give me odds ratios for class membership across the missing data patterns whereas LG only gives the proportions in each class. Of course I can use these proportions to get approximate OR's and this shows me that a couple are pretty large (approx 20) but I wonder whether LG is avoiding an awkward estimation step by not computing OR's directly.
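
A minimal sketch of how approximate ORs could be derived from class proportions, assuming the proportions are reported separately for a reference missingness pattern and for a comparison pattern, and that one class serves as the reference class. The function names and the proportions in the example are made up for illustration.

```python
import math


def safe_ratio(num, den):
    """Divide, returning inf (or nan for 0/0) rather than raising on a zero denominator."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def approx_odds_ratios(props_ref_pattern, props_pattern, ref_class=0):
    """Odds ratio of each class versus the reference class, comparing a
    missingness pattern with the reference pattern.

    OR_k = [p(k | pattern) / p(ref | pattern)]
           / [p(k | ref pattern) / p(ref | ref pattern)]
    """
    ors = {}
    for k in range(len(props_pattern)):
        if k == ref_class:
            continue
        odds_ref = safe_ratio(props_ref_pattern[k], props_ref_pattern[ref_class])
        odds_pat = safe_ratio(props_pattern[k], props_pattern[ref_class])
        ors[k] = safe_ratio(odds_pat, odds_ref)
    return ors


# Made-up class proportions for a 4-class model.
complete_cases = [0.50, 0.30, 0.15, 0.05]
one_pattern = [0.25, 0.25, 0.25, 0.25]
print(approx_odds_ratios(complete_cases, one_pattern))
```

A class proportion of zero in either pattern makes the corresponding OR infinite, zero, or undefined, which is the same boundary situation in which a logit would run off to plus or minus infinity.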

I should say that after this initial step I plan to collapse down to a smaller number of missing-data patterns, but it'd be really nice if I could get these OR's before I do.

bw, Jon
 Bengt O. Muthen posted on Thursday, May 13, 2010 - 8:08 am
You get OR's from Mplus even though the two logits are fixed, right, or do you get infinity? Is your question why LG doesn't fix parameters while Mplus does - if so, perhaps the programs have different convergence criteria; you can check which has the higher LL.
 Jon Heron posted on Thursday, May 13, 2010 - 8:21 am
It varies: I get a bunch of believable OR's and then either a couple of really big/small ones or a couple of infinities. It seems to depend on the ordering of the classes.

Mplus has a slightly higher LL: -7384.768 compared with LG's -7384.8631

I guess my question was: is Mplus having to fix parameters because it's doing an additional complex step, i.e. deriving the OR's. However, it sounds like you are saying that with the same convergence criteria I should run into the same issues with both packages.

Incidentally, with a 3 class model everything is happy and the OR's produced by Mplus agree pretty well with the approximate ones derived from LG's proportions.
 Bengt O. Muthen posted on Thursday, May 13, 2010 - 10:07 am
No, Mplus doesn't fix logits because of computing ORs. The ORs are computed after the model has been estimated, using the parameter estimates. Mplus fixes logits when it is clear that they are going to infinity, corresponding to probabilities of either zero or one. So it is a convergence matter - Mplus is pretty strict.
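
A minimal sketch of why an extreme logit is effectively a probability of 0 or 1, and why the matching OR stops being informative. The values of plus and minus 15 are used here only as an illustrative "extreme" logit and are an assumption, not something stated in this thread.

```python
import math


def inv_logit(x):
    """Numerically stable inverse logit: 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Probability and odds implied by a range of logits.
for logit in (-15.0, -5.0, 0.0, 5.0, 15.0):
    prob = inv_logit(logit)
    odds = math.exp(logit)
    print(f"logit={logit:6.1f}  probability={prob:.8f}  odds={odds:.3e}")
```

As the logit grows in absolute value the probability flattens out against 0 or 1, so the likelihood barely changes while the estimate keeps drifting. Fixing the logit at a large value stops that drift without materially changing the fit.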
 Jon Heron posted on Thursday, May 13, 2010 - 10:20 am
So the bottom line is, if I persevere with my model in both packages, I will eventually have two problems and not just one

:-)
 Jon Heron posted on Saturday, May 15, 2010 - 12:26 am
Sorry to drag this out, but it had occurred to me that all may not be lost. I had assumed that fixed logits meant game over, but this might just be telling me that I need to constrain a parameter myself.

For instance, it seems quite reasonable that I might want to predict adolescent smoking behaviour from maternal smoking and it also seems likely that non-smoking mothers might be unlikely to give rise to many early-onset heavy-smoking children. Constraining a parameter to zero in this instance sounds sensible and might be the only way to get a model to run without errors.
 Bengt O. Muthen posted on Monday, May 17, 2010 - 9:57 am
I don't see fixed logits as a problem. They make the interpretation easy in that they correspond to probabilities of 0 and 1. You can constrain the logit yourself to get these 0's or 1's, but Mplus does it for you.

The only issue is that some ORs can't be computed, but that can be explained in the writing.

For similar issues, see the 2008 Drug and Alcohol Dependence article "Developmental epidemiological courses leading to antisocial personality disorder and violent and criminal behavior: Effects by young adulthood of a universal preventive intervention in first- and second-grade classrooms" by Hanno Petras, Sheppard G. Kellam, C. Hendricks Brown, Bengt O. Muthén, Nicholas S. Ialongo, and Jeanne M. Poduska.
 Jon Heron posted on Tuesday, May 18, 2010 - 1:29 am
thanks Bengt, I'll leave you in peace now

Jon