Arbitrary missing X & Y in complex su...
 Kevin K. Makino posted on Monday, February 27, 2012 - 10:10 pm
I am a PhD student using the ECLS-K data for my dissertation analyses, and am hoping that all of you can help me to understand how to best model arbitrarily missing data in both the X and Y variables in my analyses.

My current understanding is that the complex sampling mechanism used by the ECLS-K violates the iid assumptions implicit in MI approaches, suggesting that MI may not be appropriate for my purposes. However, I have been having difficulty finding information on the applicability of other approaches to dependency structures in complex survey data. (I am also new to the world of SEM, and so am not familiar with the relative robustness of this method to various assumptions.) To what degree are the various missing data methods available in Mplus appropriate for modeling arbitrary missingness in both the X and Y variables in the setting of complex survey data, and is there a recommended approach for dealing with this situation?

Thank you very much in advance for your guidance!
 Bengt O. Muthen posted on Wednesday, February 29, 2012 - 8:29 am
Mplus handles the clustering aspect by being able to do multiple imputation (MI) using Type = Twolevel - see our User's Guide. You can also handle weights - see

where we say:

"Another advantage of the MPML estimator is that it can easily incorporate missing data under the standard assumption of missingness at random (MAR)."
 Kevin K. Makino posted on Monday, March 05, 2012 - 1:04 pm
Thank you very much for your guidance - and for the link to the reference document. On a related note, given that the paired jackknife replicate weights are recommended by the ECLS data managers as the most appropriate way to generate variance estimates, is there anything wrong with performing multiple imputations using the MPML approach, and then generating variance estimates for my analyses using the paired jackknife replicate weights?

Thank you again!
 Tihomir Asparouhov posted on Monday, March 05, 2012 - 3:51 pm
The most appropriate method for your situation would be to use a single-level model and specify the complex sampling features: the clustering variable, the weight variable, and the replicate weights, as well as the missing data value. This approach handles the missing data with full information maximum likelihood (FIML) and is currently your best option for complex survey data with missing data.
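
To illustrate what the replicate weights contribute to the standard errors, here is a minimal Python sketch of the paired jackknife (JK2) variance calculation for a weighted mean. It is not Mplus input and is not taken from the reply above. The four-observation data set, the two replicate weight columns, and the function names are all hypothetical, and the sketch assumes the usual JK2 rule in which the variance is the sum of squared deviations of the replicate estimates from the full-sample estimate.

```python
def weighted_mean(values, weights):
    """Weighted mean of values using the given survey weights."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def jk2_variance(values, full_weight, replicate_weights):
    """Return (full-sample estimate, JK2 variance estimate).

    Assumes the JK2 rule: variance = sum over replicates of
    (replicate estimate - full-sample estimate) ** 2.
    """
    theta_full = weighted_mean(values, full_weight)
    theta_reps = [weighted_mean(values, rw) for rw in replicate_weights]
    variance = sum((t - theta_full) ** 2 for t in theta_reps)
    return theta_full, variance


# Hypothetical toy data: two variance strata, each holding a pair of units.
y = [1.0, 2.0, 3.0, 4.0]
full_weight = [1.0, 1.0, 1.0, 1.0]
# In each replicate, one unit of a pair is dropped (weight 0) and its
# partner's weight is doubled; weights outside that pair are unchanged.
replicate_weights = [
    [2.0, 0.0, 1.0, 1.0],  # replicate for stratum 1
    [1.0, 1.0, 2.0, 0.0],  # replicate for stratum 2
]

estimate, variance = jk2_variance(y, full_weight, replicate_weights)
standard_error = variance ** 0.5
print(estimate, variance, standard_error)
```

The same recipe applies to any statistic: re-estimate it once per replicate weight and sum the squared deviations. With real ECLS-K data the replicate weight columns come from the data file rather than being constructed by hand.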


 Kevin K. Makino posted on Monday, March 05, 2012 - 9:02 pm
Thank you very much for the recommendation, and the guiding code!
 Kevin K. Makino posted on Tuesday, March 06, 2012 - 1:21 am
Professor Asparouhov, if I may ask, is your recommendation to use the FIML approach based on a general superiority of the FIML approach, or on the specifics of this instance where the jackknife replicate information cannot be taken into account by multiple imputation methods? (If the FIML approach is generally superior, do you have a reference you would recommend for this?)

Thank you again for your kind assistance!
 Tihomir Asparouhov posted on Tuesday, March 06, 2012 - 12:22 pm
The main issue is that multiple imputation will not accommodate the sampling weights, and that could lead to problems. On the other hand, FIML will correctly analyze data that are missing at random (MAR).
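
As a rough illustration of why ignoring the sampling weights can matter, the following Python sketch compares an unweighted and a weighted mean on a hypothetical sample in which one group was oversampled. The scores, the weights, and the function name are invented for illustration; the sketch shows only the general effect of dropping weights, not the behavior of any particular imputation routine.

```python
def mean(values, weights=None):
    """Mean of values; uses equal weights when none are supplied."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical sample: three cases from an oversampled group (each stands
# for 1 population unit) and one case from an undersampled group (stands
# for 9 population units).
scores = [10.0, 10.0, 10.0, 20.0]
sampling_weights = [1.0, 1.0, 1.0, 9.0]

unweighted = mean(scores)
weighted = mean(scores, sampling_weights)
print(unweighted, weighted)
```

An imputation model fitted without the weights is centered on something like the unweighted value, while the analysis target is the weighted one, so imputed values can be pulled toward the oversampled group.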
 Kevin K. Makino posted on Tuesday, March 06, 2012 - 7:21 pm
Thank you very much Professor Asparouhov - I realized that there is one more piece of the puzzle that may bear on the best option here; specifically, the issue is that my model includes not only continuous, but also categorical and dichotomous covariates.

In a post on April 21st of 2011 (in the "FIML vs MI" thread under Missing Data Modeling), Craig Enders described some of his potential concerns with using the FIML approach to missing data with dichotomous covariates - specifically that "employing a linear factor model to convert the X to a pseudo & might cause problems..." and that "the mean and the variance of the binary variable are linearly dependent when you use the linear factor model to handle the missing data." However, he also concluded that "I'm not sure that either of the problems are substantial..."

It seems that there may be a tradeoff here between modeling of the sampling, and modeling of the variables... In this particular instance, would you still recommend using FIML in Mplus to handle the missing data?
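
The second concern quoted above rests on a basic property of a 0/1 variable: once its mean p is known, its variance is fixed at p(1 - p). A linear (normal-theory) model treats the mean and variance as separate free parameters, which is the mismatch being described. The short Python sketch below checks the identity on a hypothetical binary covariate; the data and the function name are illustrative only.

```python
def binary_mean_and_variance(x):
    """Return the mean and the population variance (divide by n) of x."""
    n = len(x)
    p = sum(x) / n
    variance = sum((xi - p) ** 2 for xi in x) / n
    return p, variance


# Hypothetical dichotomous covariate coded 0/1.
x = [1, 0, 0, 1, 1]
p, variance = binary_mean_and_variance(x)

# For 0/1 data the variance is determined by the mean: p * (1 - p).
implied_variance = p * (1 - p)
print(p, variance, implied_variance)
```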