I have questions regarding Mplus' handling of missing data. According to Asparouhov and Muthén's (2008) technical report, "Standard maximum likelihood" generates unbiased estimates if missing data are MCAR or MAR, "but will lead to biased estimates if the missing data is NMAR ... Facilitating the additional information that auxiliary variables provide can enable us to reduce or eliminate such bias." First, is the "[s]tandard maximum likelihood" FIML? Or is using auxiliary variables that predict missing data what FIML does? Second, how are the auxiliary variables selected and included in the estimation? Does Mplus do it, or am I supposed to do it? If the former is the case, how are auxiliary variables selected for inclusion? If the latter is the case, how should I specify the variables in my Mplus program? Finally, is FIML as good as multiple imputation (MI) in handling missing data? If so, would you please suggest some references that compare the two methods of missing data treatment? Thanks.
Standard maximum likelihood under MAR is what some people call FIML (although you won't find that term in the classic missing data book by Little & Rubin).
"FIML" does not add the auxiliary variables.
Auxiliary variables have to be chosen based on theory and previous experience. You make the choice; Mplus does not select them for you. See the Psychological Methods articles by Graham and Collins. See the chapter 11 examples in the Mplus User's Guide for how to specify auxiliary variables in Mplus.
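As a rough sketch of what such a specification can look like: the (m) setting of the AUXILIARY option marks variables as missing data correlates, so they inform the ML estimation without becoming part of the analysis model. The file name, the variable names (y1-y4, z1, z2), the missing-value code 999, and the growth model itself are hypothetical placeholders, not taken from the User's Guide or from the poster's data.

```
TITLE:     growth model with missing data correlates (sketch);
DATA:      FILE = mydata.dat;        ! hypothetical file name
VARIABLE:  NAMES = y1-y4 z1 z2;      ! hypothetical variable names
           USEVARIABLES = y1-y4;     ! analysis variables only
           MISSING = ALL (999);      ! assumed missing-value code
           AUXILIARY = (m) z1 z2;    ! z1, z2 as missing data correlates
ANALYSIS:  ESTIMATOR = ML;
MODEL:     i s | y1@0 y2@1 y3@2 y4@3;
```

Depending on the Mplus version, the setting may instead be written after the variable names (AUXILIARY = z1 z2 (m);), so it is worth checking the User's Guide for the version in use.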
Yes, "FIML" is as good as MI. MI can use more variables than the analysis variables for imputation, and ML can add such variables via the AUXILIARY option. See the references on missing data in the Mplus User's Guide.
Mplus can also do NMAR modeling; see
Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.
A good, applied missing data book is Enders (2010), referred to in the Mplus UG.
una posted on Wednesday, October 30, 2013 - 9:05 am
Dear Prof. Muthen, I am running a latent cross lagged model (two waves, on each wave a latent factor and a second-order latent factor). We use the MLR estimator. 498 respondents participated in follow-up 1, and 422 responded at follow-up 2. I did not specify ‘listwise = on’. However, when I run my analysis the results are only presented for 308 respondents, with the warning “Data set contains cases with missing on x-variables. These cases were not included in the analysis.” Is it not possible to run this analysis with Mplus on all respondents? Thank you very much in advance, Kind regards, Caroline
Missing data theory does not apply to observed exogenous variables. You would need to mention the variances of all observed exogenous variables in the MODEL command to avoid the deletion of these cases. When you do that, they are treated as dependent variables and distributional assumptions are made about them.
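A minimal sketch of this, assuming a simple regression rather than the poster's cross-lagged model: listing x1 and x2 on their own line in the MODEL command mentions their variances. The file name, the variable names, and the missing-value code -99 are hypothetical.

```
DATA:      FILE = mydata.dat;      ! hypothetical file name
VARIABLE:  NAMES = y x1 x2;        ! hypothetical variable names
           MISSING = ALL (-99);    ! assumed missing-value code
ANALYSIS:  ESTIMATOR = MLR;
MODEL:     y ON x1 x2;             ! x1, x2 are observed exogenous variables
           x1 x2;                  ! mention their variances to keep cases
                                   ! with missing values on x1 or x2
```

With the variances mentioned, cases with missing values on x1 or x2 stay in the analysis, at the cost of the distributional assumptions now made about x1 and x2.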