I am a PhD student using the ECLS-K data for my dissertation analyses, and I am hoping you can help me understand how best to handle arbitrarily missing data in both the X and Y variables in my analyses.
My current understanding is that the complex sampling mechanism used by the ECLS-K violates the iid assumptions implicit in MI approaches, suggesting that MI may not be appropriate for my purposes. However, I have had difficulty finding information on the applicability of other approaches to the dependency structures in complex survey data. (I am also new to the world of SEM, and so am not familiar with how robust this method is to violations of its assumptions.) To what degree are the various missing data methods available in Mplus appropriate for modeling arbitrary missingness in both the X and Y variables in the setting of complex survey data, and is there a recommended approach for dealing with this situation?
Thank you very much for your guidance - and for the link to the reference document. On a related note, given that the paired jackknife replicate weights are recommended by the ECLS data managers as the most appropriate way to generate variance estimates, is there anything wrong with performing multiple imputations using the MPML approach, and then generating variance estimates for my analyses using the paired jackknife replicate weights?
The most appropriate method for your situation would be to use a single-level model and specify the complex sampling features, such as the clustering variable, the weight variable, and the replicate weights, as well as the missing data value. This approach uses full information maximum likelihood (FIML) to handle the missing data and is currently your best option for complex survey data with missing values.
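For readers less familiar with replicate weights, here is a minimal Python sketch (not Mplus syntax) of how a paired jackknife variance estimate is typically formed. It assumes the common JK2 form, in which the variance is the sum of squared deviations of the replicate estimates from the full-sample estimate with a multiplier of 1. It uses a weighted mean as the estimator, and the function names and toy data are hypothetical. It does not handle missing data, which is the part FIML takes care of in the approach described above.

```python
def weighted_mean(y, w):
    """Weighted mean of y using weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def jk2_variance(y, full_weight, replicate_weights):
    """Return (estimate, variance) for a weighted mean.

    replicate_weights is a list of weight columns, one per replicate.
    Assumes the JK2 form: variance = sum_r (theta_r - theta)^2.
    """
    theta = weighted_mean(y, full_weight)
    theta_r = [weighted_mean(y, rw) for rw in replicate_weights]
    variance = sum((t - theta) ** 2 for t in theta_r)
    return theta, variance


# Toy data (hypothetical): two strata, each containing a pair of units.
y = [1.0, 3.0, 2.0, 4.0]
full_weight = [1.0, 1.0, 1.0, 1.0]
# Replicate r zeroes one unit in pair r and doubles its partner.
replicate_weights = [
    [0.0, 2.0, 1.0, 1.0],  # replicate 1 perturbs the first pair
    [1.0, 1.0, 0.0, 2.0],  # replicate 2 perturbs the second pair
]

estimate, variance = jk2_variance(y, full_weight, replicate_weights)
```

With real ECLS-K data, the replicate weight columns supplied with the file would take the place of the toy lists, and the model estimate of interest would replace the weighted mean.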
Professor Asparouhov, if I may ask, is your recommendation to use the FIML approach based on a general superiority of the FIML approach, or on the specifics of this instance where the jackknife replicate information cannot be taken into account by multiple imputation methods? (If the FIML approach is generally superior, do you have a reference you would recommend for this?)
Thank you very much Professor Asparouhov - I realized that there is one more piece of the puzzle that may bear on the best option here; specifically, the issue is that my model includes not only continuous, but also categorical and dichotomous covariates.
In a post on April 21st of 2011 (in the "FIML vs MI" thread under Missing Data Modeling), Craig Enders described some of his potential concerns with using the FIML approach to missing data with dichotomous covariates - specifically that "employing a linear factor model to convert the X to a pseudo & might cause problems..." and that "the mean and the variance of the binary variable are linearly dependent when you use the linear factor model to handle the missing data." However, he also concluded that "I'm not sure that either of the problems are substantial..."
It seems that there may be a tradeoff here between modeling of the sampling, and modeling of the variables... In this particular instance, would you still recommend using FIML in Mplus to handle the missing data?