

Dear Dr. Muthen, can the current version of Mplus fit the Diggle-Kenward selection model and/or a shared-parameter model for nonignorable missing data? If so, is there a code example that you could show me? Thank you.


By Diggle-Kenward, do you mean missingness that is a function of the variable that has missingness? If so, yes. If not, please send me the article describing the model.

V X posted on Wednesday, October 10, 2007 - 3:02 am



Dr. Muthen, I am also interested in learning how to use Mplus to fit the Diggle-Kenward selection model and the shared-parameter model for nonignorable missing data (that is, data missing not at random). Could you provide some Mplus code examples? Thank you.


I think Diggle-Kenward consider missingness as a function of the (latent) response variables y, that is, what you would have observed if it weren't missing. You could use DATA MISSING to create binary missing data indicators and then regress those "u's" on the "y's" that have missingness by regular ON statements (u ON y). I am not familiar with the term "shared-parameter model".
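
To illustrate the mechanism outside Mplus, here is a minimal Python simulation sketch (it is not Mplus syntax and not the Diggle-Kenward estimator itself). It assumes a single standard-normal outcome y and a hypothetical logistic rule in which the probability of missingness rises with y; the function names and the slope value are made up for illustration. It builds the binary indicator u (1 = missing) and shows that the mean of the observed y values is pulled away from the mean of the full data.

```python
import math
import random
import statistics

MISSING = None  # stand-in for a missing-value flag such as 999


def make_indicator(values):
    """Return binary missing-data indicators: 1 if missing, 0 if observed."""
    return [1 if v is MISSING else 0 for v in values]


def simulate(n=20000, slope=1.5, seed=1):
    """Draw y ~ N(0, 1) and delete it with probability logistic(slope * y).

    Missingness depends on the value of y itself, which is the
    selection-type (nonignorable) mechanism. The slope is hypothetical.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-slope * y))
        full.append(y)
        observed.append(MISSING if rng.random() < p_missing else y)
    return full, observed


full, observed = simulate()
u = make_indicator(observed)
mean_full = statistics.mean(full)
mean_observed = statistics.mean([v for v in observed if v is not MISSING])
prop_missing = sum(u) / len(u)
print(round(mean_full, 2), round(mean_observed, 2), round(prop_missing, 2))
```

Because high values of y are more likely to be deleted in this sketch, the observed mean falls below the full-data mean, which is the kind of bias a selection model tries to correct by modeling u jointly with y.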


P.S. For more nonignorable missing data modeling, see also the Lecture 17 handout from my UCLA latent variable course at http://www.gseis.ucla.edu/faculty/muthen/Handouts.htm 

V X posted on Friday, October 12, 2007 - 1:00 pm



Dr. Muthen, the Lecture 17 notes for the Educ 231E class helped me a lot in understanding the "outcome based dependent selection model". Thank you. I currently have one question about the path diagram of the "growth mixture model with nonignorable missingness as a function of y". I know that a circle represents a latent variable. Could you explain what it means to put a circle on y in that diagram, in the context of a selection model?


For individuals who have missing data on y, the y variable is a latent variable. 


Dr. Muthen, you said "for individuals who have missing data on y, the y variable is a latent variable". Do you mean that by creating a latent variable CY, as in the figure on slide 6 of your Lecture 17, we can fit a nonignorable missingness model with missing Y?


Yes. But you can also do nonignorable missingness modeling without cy (which is categorical) using the model of slide 3. Neither is an easy thing to do. 


Dear Dr. Muthen, I was wondering if the syntax for the models that you discuss in the slides http://www.gseis.ucla.edu/faculty/muthen/Handouts.htm (Lecture 17) is available. Kind Regards, Liesbeth 


No, it is not. But a new missing data paper that discusses alternative models for nonignorable missing data will be posted shortly, and you can then request the Mplus setups for those analyses.

Tim Stump posted on Friday, April 20, 2012 - 3:01 pm



I have a cohort of adolescents with type II diabetes, with hemoglobin a1c collected at baseline (prior to high school graduation) and at the 3-, 6-, 9-, and 12-month time points. We know that our a1c outcome does not satisfy the MAR assumption because we could not get all chart review data from physicians' offices after the adolescents left home. Baseline a1c is never missing, but missingness increases over time. The cohort is relatively small, with 180 subjects, but we would like to explore some of the models outlined in "Growth Modeling With Nonignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial". Our goal is simply to model a1c over time and see whether the trajectory differs for a couple of binary covariates and whether missing a1c influences the trajectory. Would you have any suggestions as to which type of NMAR model would work better with our small sample?


Pattern-mixture modeling is probably the easiest to work with.
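
As a rough illustration of the pattern-mixture idea (again in Python rather than Mplus syntax), the sketch below groups cases by dropout pattern and then averages pattern-specific means weighted by the pattern proportions. The data values, the function names, and the choice to summarize only the baseline mean are all hypothetical and are not taken from the a1c study or the STAR*D paper.

```python
from collections import defaultdict
import statistics


def dropout_pattern(row):
    """Number of waves observed before the first missing value (None)."""
    count = 0
    for value in row:
        if value is None:
            break
        count += 1
    return count


def pattern_summary(rows):
    """Group cases by dropout pattern; return proportion and baseline mean per pattern."""
    groups = defaultdict(list)
    for row in rows:
        groups[dropout_pattern(row)].append(row)
    n = len(rows)
    return {
        pattern: {
            "proportion": len(members) / n,
            "baseline_mean": statistics.mean([r[0] for r in members]),
        }
        for pattern, members in groups.items()
    }


def mixture_mean(summary, key):
    """Pattern-specific means weighted by the pattern proportions."""
    return sum(s["proportion"] * s[key] for s in summary.values())


# Hypothetical a1c-like values at three waves; None marks a missing wave.
rows = [
    [7.0, 7.2, 7.4],
    [8.0, 8.4, None],
    [9.0, None, None],
    [7.5, 7.5, 7.7],
]
summary = pattern_summary(rows)
print(summary)
print(mixture_mean(summary, "baseline_mean"))
```

For a variable that is always observed, such as baseline, the weighted mixture simply reproduces the overall mean. For later waves, some pattern-specific parameters are not identified from the data alone, so a real pattern-mixture analysis has to add identifying restrictions across patterns.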


Dear Drs. Muthen, I am looking for a way to model nonignorable missing data in an LCA model. I have 3 variables, each representing the age at onset of one stage of a 3-stage process. As stages 2 and 3 are only possible if the previous one has been reached, missing data on stage 2 and/or stage 3 are nonignorable. The missing data structure looks like a monotone missing pattern, except that it is not dropout: the missing data indicate that the next stage was not reached, and I want to include this information in the model. The structure of the database is:

s1  s2  s3
 9  12  13
12  14  17
14  15   .
11  16   .
13   .   .
17   .   .

Do you have any advice on how to implement such a model in Mplus? The closest I found is the Diggle-Kenward selection model (Ex. 11.3), but there are no i, s, q components in what I model... Many thanks


So you have an LCA based on 3 continuous variables. Does s1 predict missingness on s2 and s3, with s1 always observed, so that MAR holds? Regarding nonignorable, are you saying that the values that would have been observed for s2 and s3 predict their missingness?


You are right on the MAR component. I might have erred regarding the nonignorable part. The continuous variables are ages at three stages: s1 = self-awareness; s2 = self-identification; s3 = disclosure. They follow a sequence that constrains the values so that s3>s2>s1. The data are cross-sectional. s1 is always observed (inclusion criterion). s2 can be observed or not (it is observed if stage s1 is completed and s2 has been reached). s3 can only be observed if s2 is observed AND s3 has been reached. I want to model trajectories that take into account both the occurrence (yes or no) of the stages and, if the next stage has been reached, the age at which that happened. Thanks for your help!


I think I would need to understand the setting better to help you and that goes beyond Mplus Discussion. You may want to ask on SEMNET. You want to make clear why LCA is of interest to you (why mixtures?) and why you want to model trajectories (trajectories of what?). Two comments: Selection modeling of missing data like Diggle-Kenward can be done without a growth model; survival modeling might be relevant given that you want to model age at which the events happened (perhaps multivariate survival; see the Masyn dissertation on our website).


I have a longitudinal dataset over 5 waves with 1600 cases. I think that the missing data is MNAR because missingness on IVs is related to the DVs in my model. How is it best to model missing data in Mplus when it is MNAR? Am I right in thinking that FIML is not appropriate? Many thanks 


I don't understand how DV values can influence missingness on IVs. Typically IVs influence DVs.


Yes, sorry that's what I meant 


IVs influencing DV missingness is covered by MAR and does not necessarily call for NMAR modeling.


OK, many thanks. Tabachnick & Fidell (2007) state that “MNAR is inferred if the (missing variable analysis) t tests show that missingness is related to the DV.” (p. 63) Would it be possible to ask for clarification, since this seems to contradict what you are saying? Would it also be possible to ask for clarification on one separate Mplus issue? In order to test measurement invariance for GCM, is it required to have the exact same measure at each wave? E.g., if one has a measure with some different items at each wave (for example, to make the measure age appropriate), am I right in thinking that standardised scores cannot be used and the only option is to use only those items that appear at each wave (wave counterpart items)? Many thanks again


MAR is proper if the missingness can be predicted by observed variables. NMAR is at hand if missingness is predicted by unobserved variables such as the value that would have been observed or other latent variables. Don't use standardized scores in growth modeling. You can deal with different items at different time points if you take a multiple indicator approach with measurement invariance for items that are in common. 
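
The MAR/NMAR distinction can be illustrated with a small Python simulation sketch (not Mplus syntax). It assumes a binary, always-observed predictor x and an outcome y = x + noise; under the hypothetical "MAR" rule the probability that y is missing depends only on x, while under the hypothetical "NMAR" rule y is missing whenever y itself exceeds a cutoff. All probabilities and cutoffs are made up. Within the x = 1 stratum, the gap between the true and observed mean of y stays near zero under MAR but not under NMAR.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=7):
    """Return (true mean of y, observed mean of y) within the x = 1 stratum.

    mechanism "MAR":  y is deleted with a probability that depends only on x,
                      which is always observed.
    mechanism "NMAR": y is deleted whenever y itself exceeds a cutoff.
    All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    true_y, observed_y = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MAR":
            missing = rng.random() < (0.6 if x == 1 else 0.2)
        else:
            missing = y > 1.5
        if x == 1:
            true_y.append(y)
            if not missing:
                observed_y.append(y)
    return statistics.mean(true_y), statistics.mean(observed_y)


gaps = {}
for mechanism in ("MAR", "NMAR"):
    true_mean, observed_mean = simulate(mechanism)
    gaps[mechanism] = true_mean - observed_mean
    print(mechanism, round(gaps[mechanism], 2))
```

Conditioning on the observed predictor is enough to remove the bias in the MAR case, which is why missingness predicted by observed variables does not require NMAR modeling.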

Harmen Zoet posted on Thursday, November 10, 2016 - 12:27 am



Hi, I want to conduct an LPA with treatment outcome scores as indicators (operationalized as difference scores (T1 - T2) of items on a questionnaire). However, I have missing data which might be nonignorable: it is plausible that I have more missing data at my last measurement point (T2) for those who do not respond to treatment. What is the best way to deal with my missing data? Should I first use multiple imputation for all items and then compute my difference scores before running my LPA? Or is it also possible (and better) to first compute my difference scores and then run my LPA using maximum likelihood estimation (FIML)? Or am I missing the point completely, and is it necessary to do this in a different way? Thanks in advance!


Use ML instead of multiple imputation whenever you can to deal with missing data. I am not sure why you need to work with difference scores; keeping the original scores might be better and uses all available information with ML. The estimated classes will tell you whether increases, decreases, or constancy form the classes.
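
A small Python sketch (not Mplus syntax) of why the original scores carry more information: a difference score computed before the analysis is missing whenever either T1 or T2 is missing, whereas with the original scores a case with only T1 observed can still contribute. The data and the function name are hypothetical.

```python
def usable_cases(t1, t2):
    """Count cases that contribute under each approach (None = missing)."""
    # A precomputed difference score is missing if either score is missing.
    difference_scores = [
        None if a is None or b is None else a - b for a, b in zip(t1, t2)
    ]
    with_difference = sum(d is not None for d in difference_scores)
    # With the original scores, a case contributes if at least one is observed.
    with_originals = sum(a is not None or b is not None for a, b in zip(t1, t2))
    return with_difference, with_originals


# Hypothetical pre/post item scores; two cases drop out before T2.
t1 = [5, 4, 6, 3, 5]
t2 = [2, None, 3, None, 4]
print(usable_cases(t1, t2))  # (3, 5)
```

In this toy example the two dropouts are lost when difference scores are formed first, but they remain in the analysis when the T1 and T2 scores are used as separate indicators.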

Harmen Zoet posted on Thursday, November 17, 2016 - 1:12 am



Dear Dr. Muthen, Thanks for your answer. I'm considering your advice, but cannot really figure it out. Would the following be correct:

Variable:
  Names are nummer T0B1FR (...) KIPPOST;
  USEVARIABLES are T0B1FR-T2D5IN KIPPOST;
  CLASSES = c (4);
  MISSING is all (999);
Analysis:
  Type = mixture;
  Estimator = ML;
  Starts = 100 10;
Output:
  TECH1 TECH10 TECH11 TECH14
Plot:
  Type = PLOT3
Savedata:
  File is GMM.dat;
  save = cprobabilities;

All indicator variables are the items from the questionnaire, both pre- and post-treatment (except for KIPPOST, which is a post-treatment total severity score). I specified starting values because automatic starting values lead to local maxima. The weird thing, however, is that I cannot view any plot of the estimated probabilities when I run the above. Otherwise, it is not possible to use ML while computing difference scores in Mplus (before running the LPA), right?


You should have a semicolon after PLOT3. Automatic starting values should be just fine if you use enough Starts. Check which run has the best loglikelihood. To save time, never use Starts together with Tech11 or Tech14; see Web Note 14. You can use difference scores if you like.
