Mplus Discussion >> Does Mplus impute missing data/ role of EM?

Calvin Croy posted on Wednesday, October 05, 2005 - 12:19 pm

Could you please answer the following 3 questions?

I am running Mplus version 3.13 for a CFA analysis and specifying Type = Missing H1.

1. After reading the User's Guide and examining posts on this discussion site, it appears to me that Mplus does NOT actually impute ("fill in") missing data values when Type = Missing H1 is specified. Is this correct?

2. Based on what I've read, it sounds like Mplus uses the EM algorithm to estimate means, variances, and covariances (the sufficient statistics) using all the factor indicator variables. Then the estimated means, variances, and covariances resulting from EM are used to derive the model parameter estimates. Is this right? If not, could you please clarify?

3. If one specifies Type = Imputation to read multiple datasets created by some multiple imputation process outside of Mplus (Example 12.13), how is EM used, if at all?

Thanks so much for answering posted questions! While I'm sure this is quite time consuming, your comments are valued greatly and keep us users on the right path.

bmuthen posted on Saturday, October 08, 2005 - 2:15 pm

1. That's right. Standard ML estimation is used instead. The missing values can however be produced.

2. Not quite. EM is used in the ML estimation of the unrestricted mean vector and covariance matrix (the H1 model) - but these results are only used to be able to compute a chi-square test of the H0 model against H1. For the H0 parameter estimation, all available raw data are used in the computations using ML. EM is not used in the H0 computations.

3. EM is not used here. Type = Imputation is intended for use when another program has been first used to impute missing data (e.g. freeware such as NORM). These multiple data sets are then sent to Mplus and analyzed by Mplus, followed by an Mplus summary of the parameter estimates and the computation of the SEs.

Just to add to this - one can use EM to generate imputed data. In the EM algorithm the missing data are computed for each person. This type of output is however not currently available in Mplus.
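
For readers who want the formulas behind answers 2 and 3, here is a sketch in standard textbook form. It is not taken from the Mplus documentation, and all symbols (y_i, p_i, mu_i, Sigma_i, m, Q_j, U_j) are notation introduced here for illustration. The first expression is the casewise normal-theory log-likelihood that "all available raw data" ML maximizes for the H0 model. The second set is the usual combining rules for multiply imputed data sets (Rubin's rules), which is presumably what the "summary of the parameter estimates and the computation of the SEs" refers to.

```latex
% y_i contains only the p_i variables observed for case i;
% \mu_i(\theta) and \Sigma_i(\theta) are the matching elements of the
% H0 model-implied mean vector and covariance matrix.
\ell(\theta) = -\frac{1}{2}\sum_{i=1}^{N}\Big[\, p_i\log(2\pi)
  + \log\lvert\Sigma_i(\theta)\rvert
  + \big(y_i-\mu_i(\theta)\big)^{\top}\Sigma_i(\theta)^{-1}\big(y_i-\mu_i(\theta)\big) \Big]

% Combining m imputed-data analyses: \hat{Q}_j is a parameter estimate and
% U_j its squared standard error in imputed data set j.
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m}U_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\big(\hat{Q}_j-\bar{Q}\big)^{2}, \qquad
\mathrm{SE}(\bar{Q}) = \sqrt{\bar{U} + \Big(1+\frac{1}{m}\Big)B}
```

Each case contributes only through the variables it actually has, so no values need to be filled in for the first expression to be evaluated.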

Calvin Croy posted on Thursday, October 13, 2005 - 11:51 am

Thank you so much for your clarification. It really helped!

ting hlin posted on Thursday, November 20, 2008 - 3:35 am

Is it true that Mplus 5.1 still does not have the facility to carry out multiple imputation (it can only process imputed data)?

Linda K. Muthen posted on Thursday, November 20, 2008 - 7:13 am

Yes, this is true.

Nicolas Müller posted on Wednesday, March 16, 2011 - 10:57 am

Dear Dr. Muthen,

I'm using MPlus 6, and I'd like to understand how missing data on the outcomes are treated in latent growth models when using ML/MLR estimators.

1) I think I've read that missing data are considered MAR. Does it mean that, in a longitudinal context, missingness for one indicator in time can be conditional on the values of the same indicator at other time points?

2) Is there a technical article where the algorithm used in MPlus for MAR treatment of missing data within a ML estimation is explained? Or a general reference?

Thank you.

Bengt O. Muthen posted on Wednesday, March 16, 2011 - 4:40 pm

1. Yes, missing data on an indicator at a certain time point is allowed to be influenced by the indicator value at other time points.

2. The Little & Rubin (2002) missing data book describes this and is a good general reference. A more applied book is Enders (2010).

Nicolas Müller posted on Thursday, March 17, 2011 - 1:26 am

Thank you for these answers.

I also tried to estimate a multi-level model on my growth data, thus using a long format where each line is an individual observation and ANALYSIS=TWOLEVEL.

Does it mean that missing data in a twolevel model for growth are not considered MAR but MCAR because each line where the dependent variable is missing is removed?

If I get very close estimates when using a latent growth curve analysis or a multilevel analysis, can I say the missing data are MCAR (because treating them as MAR doesn't change the estimates)?

Linda K. Muthen posted on Thursday, March 17, 2011 - 8:53 am

The results should be identical if you set the models up correctly. You may be missing the fact that the residual variances need to be held equal over time for the Mplus model to be the same as the multilevel model. Both assume MAR.
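
The equality constraint mentioned in this reply can be sketched in Mplus input as follows. The outcome names y1-y4, the four time points, and the linear time scores are hypothetical placeholders, not taken from the poster's model.

```mplus
MODEL:
  ! linear growth model in wide format (hypothetical outcomes y1-y4)
  i s | y1@0 y2@1 y3@2 y4@3;
  ! the shared label (1) holds the residual variances equal over time,
  ! matching the single within-level residual variance of the
  ! multilevel (long format) specification
  y1-y4 (1);
```

Without the `(1)` label, the wide-format model estimates a separate residual variance at each time point and so is a less restricted model than the two-level one.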

Nicolas Müller posted on Friday, March 18, 2011 - 8:04 am

Ok, your answer made me realize there was a glitch in my data transformation routine.
With the correct datasets, the estimates are now completely identical, using the latent growth or the multilevel specification.

Thank you very much.

Nicolas Müller posted on Monday, March 28, 2011 - 2:11 am

There is still something I don't get.
I'm fitting the same model with a latent growth curve or a multilevel specification, and I get exactly the same results.

I don't understand how this is possible, knowing that some of the time-varying covariates have been imputed, regardless of whether the individual was still followed or not.

In the multilevel spec, with a long data format, it has no consequences, because the lines where the individual was not in the study are deleted.
But in the wide format used by the latent growth specification, these imputations are still in the data, so I thought they should be impacting the estimation... which is not the case.

I hope I made myself clear. Do you have an explanation for this behaviour?
I could send you my models if necessary.

Linda K. Muthen posted on Monday, March 28, 2011 - 10:16 am

In Mplus the model is estimated conditioned on the covariates. Cases with missing data on observed exogenous variables are deleted from the analysis as they are in HLM. With dependent variables, values are not imputed. All available information is used as in HLM.

Soz posted on Wednesday, August 10, 2011 - 7:07 am

Dear Linda and Bengt,

I am running a SEM with 5 imputed data sets for missing values within PASW. I also created the implist.dat where the five datasets are identified.
I used the following specification:

FILE IS Implist.dat;

VARIABLE:
NAMES ARE Imputation_ VPCode country RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2
Plan_M6 CHBdiet9 ;

USEVARIABLES ARE RiskP_M3
OE_M6 Inten_M5 SE_M6 Plan_M2
Plan_M6 CHBdiet9;

ANALYSIS:
TYPE is Imputation;
ESTIMATOR IS ML;
ITERATIONS = 10000;
CONVERGENCE = 0.00005;

Model:
Inten_M5 on RiskP_M3 OE_M6 SE_M6 CHBdiet9 ;
Plan_M2 on Inten_M5 ;
Plan_M6 on Plan_M2;

OUTPUT: tech1; standardized;

Nevertheless, I got this error message:

*** ERROR in ANALYSIS command
Unrecognized setting for TYPE option:
IMPUTATION

What did I do wrong? How can I fix the problem? Many thanks in advance.

Linda K. Muthen posted on Wednesday, August 10, 2011 - 8:28 am

TYPE=IMPUTATION; goes in the DATA command not the ANALYSIS command.
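
A minimal sketch of the input from the question with this fix applied. The variable names and settings are copied from the post. The `DATA:` heading is an assumption, since the posted input shows only the `FILE IS` line, and the file named there is assumed to list the names of the five imputed data files, one per line.

```mplus
DATA:
  FILE IS Implist.dat;     ! file listing the imputed data sets
  TYPE = IMPUTATION;       ! moved here from the ANALYSIS command

VARIABLE:
  NAMES ARE Imputation_ VPCode country RiskP_M3 OE_M6 Inten_M5
    SE_M6 Plan_M2 Plan_M6 CHBdiet9;
  USEVARIABLES ARE RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2
    Plan_M6 CHBdiet9;

ANALYSIS:
  ESTIMATOR IS ML;
  ITERATIONS = 10000;
  CONVERGENCE = 0.00005;

MODEL:
  Inten_M5 ON RiskP_M3 OE_M6 SE_M6 CHBdiet9;
  Plan_M2 ON Inten_M5;
  Plan_M6 ON Plan_M2;

OUTPUT: tech1; standardized;
```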

C posted on Tuesday, April 03, 2012 - 1:45 pm

Hi Dr. Muthen,
I am attempting to create some imputed datasets. I have done this before more than once. However, at this time, I keep getting the following message with no other explanation:

"INPUT READING TERMINATED NORMALLY

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION."

I cannot figure out what the issue is. Any insight into what this message means and how to rectify it?

Linda K. Muthen posted on Tuesday, April 03, 2012 - 4:14 pm

Please send the output and your license number to support@statmodel.com

EFried posted on Tuesday, June 19, 2012 - 1:11 pm

I have a question about #6 of the technote7:

"A basic identifiability requirement for the imputation model is that for (A) each variable in the imputation the number of observations should be at least as many as the number of (B) variables in the imputation model."

I don't understand the difference between the variables I marked with (A) and (B). Does (A) refer to the variables in the usevariables list, and (B) to the variables in the "impute" list?

EFried posted on Tuesday, June 19, 2012 - 1:15 pm

(To clarify, in my example, I have 6 variables I use to help impute missing values on one variable, and wonder how much % may be missing to identify the imputation. N=800)

usevariables=x1 x2 x3 x4 x5 x6;

DATA IMPUTATION:
Impute = x0;
NDATASETS = 10;
SAVE = impute_M*.dat;

ANALYSIS:
type = basic;

Tihomir Asparouhov posted on Tuesday, June 19, 2012 - 2:36 pm

In your example X0 should have at least 7 non-missing values since there are 7 parameters in the imputation model.

EFried posted on Tuesday, June 19, 2012 - 3:24 pm

So in theory, 793 values out of 800 could be missing? That doesn't sound like something I could report in a paper ;)

We have a large number of missing values on one crucial covariate, which MPLUS cannot handle in a multilevel analysis if we just add it as
x1;
into the model (non-convergence).

And since MPLUS does listwise deletion for missing time-varying covariates in Multilevel Models (even if only 1 measurement point is missing - why is that?), we were thinking of imputing. Imputation converges, but we are not sure if imputation with 40% missing is feasible.

Thank you for your insight

Linda K. Muthen posted on Wednesday, June 20, 2012 - 10:45 am

Please send the output that shows the problem and your license number to support@statmodel.com.

Keke Hiller posted on Monday, November 19, 2012 - 1:54 am

Dear Dr. Muthen,
I am using MPlus Vers6 & survey data of three waves in a panel design, having solely the dependent variable at time point 3.
My sample population is 207 at time point 1 (T1), then reduced to 176 individuals at time point 2 (T2) and finally reduced to 137 at time point 3 (T3).
The calculations, however, are performed with N=176. Thus, I assume the missing data at T3 is imputed, right?

Is there a critical proportion of missing data where imputation is problematic? Do you know any paper that discusses this topic?

Thank you in advance!

Linda K. Muthen posted on Monday, November 19, 2012 - 10:37 am

In Mplus, the default is to estimate the model using all available information. It does not impute values in this case. See the Little and Rubin and Enders books in the user's guide's reference list.

Keke Hiller posted on Monday, November 19, 2012 - 11:20 am

Thank you for the fast reply. I will look the topic up in the book you mentioned.

Nevertheless, I am wondering: if there is no imputation done in this case, why does MPlus give me N=176 as the sample size (and not 137)?

How would I report my sample size in my paper?

Thank you so much for your help!

Linda K. Muthen posted on Monday, November 19, 2012 - 12:06 pm

All available information from the full sample is used in the model estimation so the sample size is the full sample. You report what is given in the output.

Pia Kreijkes posted on Wednesday, January 16, 2019 - 4:45 am

Hi,

In October 2005 you wrote "one can use EM to generate imputed data. In the EM algorithm the missing data are computed for each person. This type of output is however not currently available in Mplus."

Is this still the case or can I now implement EM to impute missing values and also save the imputed data file?

Thanks

Bengt O. Muthen posted on Wednesday, January 16, 2019 - 2:08 pm

See our Multiple Imputation examples in the Version 8 User's Guide on our website, Chapter 11, starting with ex11.5.

Pia Kreijkes posted on Wednesday, January 16, 2019 - 2:21 pm

Thank you for this. Is the Bayesian approach equivalent to expectation maximisation for imputing missing values? So I could simply impute a single data set instead of multiple and would get imputed values similar to those that single imputation using the EM algorithm would give me?

Bengt O. Muthen posted on Wednesday, January 16, 2019 - 3:27 pm

The question is if your goal is to get values for the missing data or parameter estimates for a model where the data has missing values.

Bengt O. Muthen posted on Wednesday, January 16, 2019 - 3:31 pm

If you want the latter, the former is not necessary. ML under MAR ("FIML") or Bayes can handle the latter. EM is an ML algorithm that can be used for FIML. One model is the covariance matrix - EM is often used to estimate that with missing data. But if you want to use the covariance matrix to estimate parameters for a model describing the covariance matrix - such as a factor model - you don't need to first estimate the covariance matrix.

Pia Kreijkes posted on Thursday, January 17, 2019 - 12:40 am

I would like to get values for the missing data. I considered many different methods for handling missing data such as FIML but concluded that it will be best for me to first fill in my missing values and then start my analyses from there. Multiple Imputation would not work as I have analyses along the way that won't be able to deal with multiple data sets, and combining results of many single data sets would be a big hassle. I have only about 2 percent of responses missing, so I feel a single imputation method would not be a big problem. So imputing a single data set using BAYES in the MI framework of MPLUS would be OK, except that one would have to be concerned about underestimating the error, I assume? That should be similar to single imputation with EM?

Bengt O. Muthen posted on Thursday, January 17, 2019 - 4:15 pm

With only 2% missing, I would use FIML.

Pia Kreijkes posted on Friday, January 18, 2019 - 12:37 am

I need to use WLSMV as an estimator in my models, which is why FIML does not work for me. Hence, I'm looking for a way to impute a single data set in MPLUS, preferably using EM or a valid alternative. Can I do that with the BAYES estimator, setting the number of imputed data sets to 1? Thanks

Bengt O. Muthen posted on Friday, January 18, 2019 - 7:04 am

Bayes is great when ML needs too heavy numerical integration (I assume this is why you used WLSMV). You can do 1 imputation using Bayes. I would, however, instead recommend handling the missing data by just using Bayes directly to estimate your model instead of first imputing data. Bayes is "FIML-like" in that it also uses all available data and gives correct estimates under MAR.
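
For the "1 imputation using Bayes" option, a rough sketch modeled on the User's Guide multiple imputation examples might look like this. The file names, the item names u1-u6, and the missing-value code are hypothetical, and the exact behavior should be checked against the User's Guide for the Mplus version in use.

```mplus
DATA:
  FILE = mydata.dat;        ! hypothetical file name
VARIABLE:
  NAMES = u1-u6;            ! hypothetical ordinal items
  USEVARIABLES = u1-u6;
  MISSING = ALL (-999);     ! hypothetical missing-value code
DATA IMPUTATION:
  IMPUTE = u1-u6 (c);       ! (c) treats the variables as categorical
  NDATASETS = 1;            ! a single imputed data set
  SAVE = imp*.dat;          ! * is replaced by the data set number
ANALYSIS:
  TYPE = BASIC;             ! impute from an unrestricted model
```

As the poster anticipates, analyses run on a single imputed data set treat the filled-in values as if they were observed, so standard errors do not reflect the imputation uncertainty.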

Pia Kreijkes posted on Friday, January 18, 2019 - 12:10 pm

I used WLSMV because I have ordinal data which is quite skewed, and when I used MLR and specified the data as categorical for my models, no fit indices were provided. Thank you very much for your help and recommendations!

Bharath Shashanka Katkam posted on Thursday, June 27, 2019 - 6:34 am

Hello Mplus Team,

In the Two-level Regression, it is shown that both the Expectation-Maximization algorithm and Steepest Descent iterations are used.
It is understood that the "EM algorithm" can perform both "Ascent related iterations" and "Descent related iterations" to obtain the Local Optima.
Given that the "EM algorithm" already performs "Descent related iterations", what is the reason to incorporate "Steepest descent iterations" along with the EM algorithm?

Bengt O. Muthen posted on Thursday, June 27, 2019 - 5:01 pm

In initial iterations, Steepest Descent can in some cases get you towards the optimum faster than EM.

Bharath Shashanka Katkam posted on Friday, June 28, 2019 - 2:09 am

Thank you so much, Sir.

Bharath Shashanka Katkam posted on Friday, June 28, 2019 - 7:28 am

Hello Mplus Team,

In the Two-level Regression, the "Expectation-Maximization Algorithm" is used to analyze the "H1 Model", whereas the "MLR" is used to analyze the "H0 Model".
Can't we use the "EM algorithm", to analyze both the "H1 Model" & "H0 Model"?

Bengt O. Muthen posted on Friday, June 28, 2019 - 3:29 pm

The answer is yes. Note that ML/MLR is an estimator while EM is an algorithm among many to get ML estimates. Mplus uses different algorithms depending on the situation.