Mplus Discussion >> FIML vs. multiple imputation

FIML vs. multiple imputation

Sung Joon Jang posted on Thursday, October 04, 2012 - 6:37 am

I have questions regarding Mplus' handling of missing data. According to Asparouhov and Muthen's (2008) technical report, "Standard maximum likelihood" generates unbiased estimates if missing data are MCAR or MAR, "but will lead to biased estimates if the missing data is NMAR ... Facilitating the additional information that auxiliary variables provide can enable us to reduce or eliminate such bias." First, is the "[s]tandard maximum likelihood" FIML? Or is using auxiliary variables to predict missing data what FIML does? Second, how are the auxiliary variables selected and included in the estimation? Does Mplus do it, or am I supposed to do it? If the former, how are auxiliary variables selected for inclusion? If the latter, how should I specify the variables in my Mplus program? Finally, is FIML as good as multiple imputation (MI) in handling missing data? If so, would you please suggest some references that compare the two methods of missing data treatment? Thanks.

Bengt O. Muthen posted on Thursday, October 04, 2012 - 11:55 am

Standard maximum-likelihood under MAR is what some people call FIML (although you won't find that word in the classic missing data book by Little & Rubin).

"FIML" does not add the auxiliary variables.

Auxiliary variables have to be chosen based on theory and previous experience. You make the choice. See Psych Meth articles by Graham and Collins. See chapter 11 examples for how to specify aux variables in Mplus.

Yes, "FIML" is as good as MI. MI can use more variables than the analysis variables for imputation, and ML can add them via aux. See references in the Mplus UG on missing data.

Mplus can also do NMAR modeling; see

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

A good, applied missing data book is Enders (2010), referred to in the Mplus UG.
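
To make "standard maximum likelihood under MAR" concrete: each case contributes the log-density of whatever it has observed, and nothing is imputed. The sketch below evaluates that observed-data log-likelihood for a bivariate normal model. It is an illustration only and not Mplus' implementation; the function names, the use of `None` for a missing value, and the toy data are all assumptions made for this example.

```python
import math

def normal_logpdf(y, mean, var):
    """Log-density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def bivariate_logpdf(y1, y2, mu, cov):
    """Log-density of a bivariate normal with mean mu and 2x2 covariance cov."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = y1 - mu[0], y2 - mu[1]
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def observed_data_loglik(rows, mu, cov):
    """Sum each case's log-density over the variables it actually has."""
    total = 0.0
    for y1, y2 in rows:
        if y1 is not None and y2 is not None:
            total += bivariate_logpdf(y1, y2, mu, cov)      # complete case
        elif y1 is not None:
            total += normal_logpdf(y1, mu[0], cov[0][0])    # only y1 observed
        elif y2 is not None:
            total += normal_logpdf(y2, mu[1], cov[1][1])    # only y2 observed
        # a case missing on both variables contributes nothing
    return total

# Hypothetical toy data: None marks a missing value.
rows = [(1.0, 2.0), (0.5, None), (None, 1.5), (-0.2, 0.3)]
mu = (0.0, 1.0)
cov = ((1.0, 0.5), (0.5, 2.0))

print(observed_data_loglik(rows, mu, cov))
```

An estimator would maximize this function over `mu` and `cov`. Listwise deletion would instead keep only the two complete rows, discarding the information in the partially observed cases.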

una posted on Wednesday, October 30, 2013 - 9:05 am

Dear Prof. Muthen,
I am running a latent cross lagged model (two waves, on each wave a latent factor and a second-order latent factor). We use the MLR estimator. 498 respondents participated in follow-up 1, and 422 responded at follow-up 2. I did not specify ‘listwise = on’. However, when I run my analysis the results are only presented for 308 respondents, with the warning “Data set contains cases with missing on x-variables. These cases were not included in the analysis.” Is it not possible to run this analysis with Mplus on all respondents?
Thank you very much in advance,
Kind regards,
Caroline

Linda K. Muthen posted on Wednesday, October 30, 2013 - 10:18 am

Missing data theory does not apply to observed exogenous variables. You would need to mention the variances of all observed exogenous variables in the MODEL command to avoid the deletion of these cases. When you do that, they are treated as dependent variables and distributional assumptions are made about them.
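
One way to read the reply above is in terms of which likelihood is being maximized. The following is a sketch under assumed notation; the symbols $y_i$, $x_i$, $\theta$ and $\phi$ are introduced here for illustration and do not come from the thread or from Mplus documentation.

```latex
% Covariates treated as exogenous: the model is conditional on x_i,
% so a case can contribute only if its x_i is fully observed.
\ell_{\text{cond}}(\theta)
  = \sum_{i:\ x_i\ \text{complete}} \log f\!\left(y_i^{\text{obs}} \mid x_i;\ \theta\right)

% Variances of x mentioned in the MODEL command: x joins the modeled
% variables, so the observed parts of both y_i and x_i contribute.
\ell_{\text{joint}}(\theta, \phi)
  = \sum_{i} \log f\!\left(y_i^{\text{obs}},\ x_i^{\text{obs}};\ \theta, \phi\right)
```

The second form keeps cases with missing x-values, but it needs a distribution for x (governed here by the hypothetical parameters $\phi$), which is the distributional assumption mentioned in the reply.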

Hadar Nesher Shoshan posted on Thursday, May 28, 2020 - 5:17 am

Dear Prof. Muthen,
I have some general questions regarding my understanding of FIML in Mplus and how to determine which way of dealing with missing data is most appropriate.
I have seen datasets where mentioning the variances in the MODEL command (i.e., FIML) led to worse fit compared to using only valid cases, and also data where this was not the case.
When the fit becomes substantially worse by mentioning the variances, would this indicate MNAR? Or can this be due to large amounts of missing data? Or else, what would this indicate? Or does it indicate different things for different indexes?
Further, if results (path estimates) are similar when using only valid cases and when using "FIML", what would this indicate? I am searching for a guideline on how to determine whether using only valid cases, FIML, or even multiple imputation is the most appropriate procedure.

Hadar Nesher Shoshan posted on Thursday, May 28, 2020 - 5:20 am

Another related question: As I understand it, Mplus can use ML (or MLR) for estimation. Is it therefore not a bit misleading to call it FIML when the variances are additionally mentioned? With and without the variances mentioned, the estimation algorithm can be the same (i.e., maximum likelihood); isn't the only difference the data to which the algorithm is applied?

This brings me back to the idea that it matters whether data are missing at random or not. A related question for me would be whether it is possible to enter the "missingness" on the outcome variables in the model as additional categorical outcome variables (for instance, "1" for not missing and "0" for missing) to investigate whether missingness is random or dependent on some of the predictors already in the model.
I saw the recorded presentation at Johns Hopkins University (March 23, 2010) (https://www.youtube.com/watch?v=c5upOIW1su8&feature=youtu.be), and also read a bit about auxiliary variables, but I have the impression that auxiliary variables are simply covariates which are not part of the "theoretical model". Therefore, they would not do this job.
I would be very grateful for any help.

Bengt O. Muthen posted on Saturday, May 30, 2020 - 11:26 am

Mentioning the variances does not usually cause worse fit. You can send your output to Support if you are seeing this - there is probably another explanation. It does not indicate MNAR.

ML and MLR are "FIML" based on the DVs that are included in the model - when you mention variances, those variables become DVs in the eyes of Mplus.

Auxiliary variables are DVs that are not part of the theoretical model but can still help when dealing with missing data.

Yes, you can model missingness - this is shown in our paper on our website:

Muthén, B., Asparouhov, T., Hunter, A. & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16, 17-33.

You may also be interested in our book Regression and Mediation Analysis Using Mplus in which Chapter 10 gives a thorough discussion of the issues you raise.
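
As a small illustration of the missingness-indicator idea raised earlier in the thread, the sketch below codes an outcome's missingness as 1 = observed and 0 = missing, then compares a predictor's mean across the two groups. It is a simple descriptive check, not the non-ignorable dropout modeling of the cited paper. The function names, the `None` coding, and the toy data are assumptions made for this example.

```python
from statistics import mean

def missingness_indicator(values):
    """Return 1 where a value is observed and 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in values]

def mean_by_missingness(predictor, outcome):
    """Mean of the predictor among cases with observed vs. missing outcome."""
    r = missingness_indicator(outcome)
    observed = [x for x, flag in zip(predictor, r) if flag == 1]
    missing = [x for x, flag in zip(predictor, r) if flag == 0]
    return mean(observed), mean(missing)

# Hypothetical toy data: x is fully observed, y has missing values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.9, 2.1, None, 3.8, None, None]

r = missingness_indicator(y)
obs_mean, mis_mean = mean_by_missingness(x, y)
print(r, obs_mean, mis_mean)
```

A clear difference between the two means suggests that missingness on y depends on x, which is evidence against MCAR. A check like this cannot distinguish MAR from MNAR, because that distinction depends on the unobserved values themselves.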