Message/Author 


The missing data modeling paper I have referred to in earlier Mplus Discussion posts is now posted with Mplus scripts at http://www.statmodel.com/examples/penn.shtml See Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2010). Growth modeling with nonignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Submitted for publication. 

Jon Heron posted on Thursday, May 06, 2010  10:33 am



Hi Bengt, I recently attempted the final model (Figure 15) without the growth component, i.e. parallel LCA models for my data and for the indicators of missingness. I got huge bivariate residuals between u_i and m_i at each time point. Is there a way round this? Perhaps it is even acceptable provided the other residuals are small. Unfortunately my measures are subtly different across time, so growth factors are not really an option. Many thanks, Jon 


Interesting question. I haven't done binary u and m together a la Fig 15, so I haven't experienced this. When m=1, then u is missing, so it's a funny 2x2. I wonder if there is a way to introduce a random effect (factor) that influences both u and m. This contemporaneous residual correlation would seem to be potentially common for two-part modeling. Simplest would be one factor with different loadings for the different time points, rather than a factor for each time point. 

Jon Heron posted on Friday, May 07, 2010  12:10 am



Thanks Bengt, I had been thinking multiple factors, but that seemed like potentially a lot of dimensions of integration, plus I've had limited success with those models (e.g. ex7.16 in the v5 User's Guide) so far. Incidentally, the Latent Gold solution is to allow a direct effect between the two offending variables. This is clearly less sophisticated (I know that because it runs really quickly!!) but I haven't gotten to the bottom of exactly what this does and whether it's desirable. I suspect not. All the best, Jon 


Yes, modeling a direct effect between categorical outcomes is on our to-do list. And related to your recent SEMNET post, we also have on our list to simplify the specification of models with more than one categorical latent variable, using probability statements instead of logit statements. This will also simplify LTA and mover-stayer LTA. 

Jon Heron posted on Monday, May 10, 2010  12:41 am



I really ought to be content with the billion and one things that Mplus *can* do! 

Jon Heron posted on Thursday, May 13, 2010  3:40 am



Me again! I'm struggling with what seems to me to be a very basic pattern-mixture model: three 4-level categorical variables of smoking frequency, with a 4-class mixture C regressed on the 6 dummy variables which represent the 7 missingness patterns; 7,300 cases, with complete data on 3,000. I guess this is a simplification of Roy's model, but also similar to Hedeker's model from the substance-use textbook where Bengt also has a couple of chapters. I have fitted this model in Latent Gold with no apparent problems; there I am able to use a 7-level nominal covariate rather than 6 dummies. In Mplus I get a very similar likelihood but either one or two logit parameters fixed. I've taken this up to 2 million random starts to no avail. My reason for favouring Mplus (I'm preaching to the choir here!) is that it will give me odds ratios for class membership across the missing-data patterns, whereas LG only gives the proportions in each class. Of course I can use these proportions to get approximate ORs, and this shows me that a couple are pretty large (approx 20), but I wonder whether LG is avoiding an awkward estimation step by not computing ORs directly. I should say that after this initial step I plan to collapse down to a smaller number of missing-data patterns, but it'd be really nice if I could get these ORs before I do. bw, Jon 
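(The approximate ORs mentioned above, derived from the class-membership proportions that Latent Gold reports per missingness pattern, can be sketched as follows. The function names and the proportions are purely illustrative, not values from Jon's data.)

```python
# Sketch: approximate odds ratios for latent-class membership from the
# per-pattern class proportions a program like Latent Gold reports.
# All numbers below are made up for illustration.

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def approx_or(p_pattern, p_reference):
    """Approximate OR for class membership: a dropout pattern vs. the
    complete-data (reference) pattern."""
    return odds(p_pattern) / odds(p_reference)

p_ref = 0.10    # hypothetical: 10% of complete-data cases in the class
p_drop = 0.69   # hypothetical: 69% of one dropout pattern in the class

print(round(approx_or(p_drop, p_ref), 1))
```

With proportions like these the approximate OR comes out around 20, the order of magnitude Jon describes; note this ignores sampling error in the proportions, which is one reason it is only approximate.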


You get ORs from Mplus even though the two logits are fixed, right, or do you get infinity? Is your question why LG doesn't fix parameters while Mplus does? If so, perhaps the programs have different convergence criteria; you can check which has the higher LL. 

Jon Heron posted on Thursday, May 13, 2010  8:21 am



It varies: I get a bunch of believable ORs and then either a couple of really big/small ones or a couple of infinities. It seems to depend on the ordering of the classes. Mplus has a slightly higher LL: -7384.768 compared with LG's -7384.8631. I guess my question was: is Mplus having to fix parameters because it's doing an additional complex step, i.e. deriving the ORs? However, it sounds like you are saying that with the same convergence criteria I should run into the same issues with both packages. Incidentally, with a 3-class model everything is happy, and the ORs produced by Mplus agree pretty well with the approximate ones derived from LG's proportions. 


No, Mplus doesn't fix logits because of computing ORs. The ORs are computed after the model has been estimated, using the parameter estimates. Mplus fixes logits when it is clear that they are going to infinity, corresponding to probabilities of either zero or one. So it is a convergence matter; Mplus is pretty strict. 
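(The point about logits going to infinity can be illustrated numerically. The logistic function saturates quickly, so a logit fixed at a large magnitude, around +/-15 in Mplus output, is essentially a probability of 1 or 0; an OR involving such a cell is then undefined, since OR = exp(logit difference) diverges.)

```python
import math

def inv_logit(x):
    """Logistic function: converts a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# The logistic curve saturates: moderate logits give interior
# probabilities, extreme logits give probabilities at the boundary.
for logit in [0.0, 2.0, 15.0, -15.0]:
    print(logit, inv_logit(logit))

# An OR against a boundary cell blows up: exp of a huge logit difference.
print(math.exp(15.0 - 0.0))
```

This is why the fixed logits are easy to interpret directly (a class simply never, or always, shows the response) even though the corresponding ORs cannot be reported.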

Jon Heron posted on Thursday, May 13, 2010  10:20 am



So the bottom line is, if I persevere with my model in both packages, I will eventually have two problems and not just one.


Jon Heron posted on Saturday, May 15, 2010  12:26 am



Sorry to drag this out, but it had occurred to me that all may not be lost. I had assumed that fixed logits meant game over, but this might just be telling me that I need to constrain a parameter myself. For instance, it seems quite reasonable that I might want to predict adolescent smoking behaviour from maternal smoking, and it also seems likely that non-smoking mothers might be unlikely to give rise to many early-onset heavy-smoking children. Constraining a parameter to zero in this instance sounds sensible and might be the only way to get a model to run without errors. 


I don't see fixed logits as a problem. They make the interpretation easy in that they correspond to probabilities of 0 and 1. You can constrain the logit yourself to get these 0's or 1's, but Mplus does it for you. The only issue is that some ORs can't be computed, but that can be explained in the writing. For similar issues, see the 2008 Drug and Alcohol Dependence article: Developmental epidemiological courses leading to antisocial personality disorder and violent and criminal behavior: Effects by young adulthood of a universal preventive intervention in first- and second-grade classrooms, by Hanno Petras, Sheppard G. Kellam, Hendricks Brown, Bengt O. Muthén, Nicholas S. Ialongo, and Jeanne M. Poduska. 

Jon Heron posted on Tuesday, May 18, 2010  1:29 am



thanks Bengt, I'll leave you in peace now Jon 


Dear Dr Muthén, I have longitudinal data, and I am not quite sure whether I can assume MAR. I wouldn't be surprised if people scoring higher on the variables of interest are more likely to drop out. To test whether it is 'safe' to assume MAR, I was advised to do a sensitivity analysis by increasing the imputed values by x, and then doing the analyses again. If the results did not differ much from the results obtained with the originally imputed values, I could assume MAR. I know you can change values using the DEFINE command; however, I am not quite sure how I would go about doing this for only the imputed values. The observed values for those variables should remain as they were. Thanks for your advice! Aurelie 
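(The delta-shift Aurelie describes can be sketched outside Mplus: add the constant only where a value was imputed, using the original missingness indicator, and leave observed values untouched. The function and variable names below are hypothetical, and the data are made up.)

```python
# Sketch of a delta-shift sensitivity check: shift imputed values by a
# constant, leave observed values as they were, then re-run the analysis
# and compare results across several deltas. Illustrative data only.

def shift_imputed(values, was_missing, delta):
    """Add delta to imputed entries only; observed entries are unchanged."""
    return [v + delta if m else v for v, m in zip(values, was_missing)]

observed = [2.0, None, 3.5, None]   # raw data; None marks missingness
imputed  = [2.0, 2.8, 3.5, 3.1]     # the same cases after imputation
was_missing = [v is None for v in observed]

print(shift_imputed(imputed, was_missing, 0.5))
```

In practice one would repeat the analysis over a grid of deltas (including negative values) and judge MAR plausible only if the substantive conclusions are stable.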


Not sure about that approach. Another direction is shown in the paper on our website: Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with nonignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33. The Mplus outputs used in this paper are posted on the website. 
