Multiple Imputation
 Graham Rifenbark posted on Tuesday, November 30, 2010 - 11:11 am
Hello, I am trying to use the Montecarlo command to impute 500 missing data files, 100 times each.

Is there a way to have Mplus impute the first data set and output those files to their own folder, do the same for the second missing data set with its own folder, and continue on for all 500 files?

In the end I would then like to analyze all 500 folders at once.

Any ideas?
 Linda K. Muthen posted on Tuesday, November 30, 2010 - 2:54 pm
The MONTECARLO command does not impute missing data files. It generates data according to population parameter values for certain specified sample sizes.

If you want to create imputed data sets, use the DATA IMPUTATION command. These data sets can be analyzed in the same run where they are created or saved for later analysis. You can specify the directory in which they are saved.
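
As a minimal sketch of the two-step workflow described above: the file names, the missing-value code, the directory, and the placeholder analysis model are illustrative assumptions, not taken from this thread. It also assumes the list file (the SAVE name with the asterisk replaced by "list") is written alongside the imputed files.

```mplus
! Step 1: create and save the imputed data sets
DATA:      FILE = mydata.dat;            ! hypothetical file name
VARIABLE:  NAMES = g1-g6;
           MISSING = ALL (999);          ! assumed missing-value code
ANALYSIS:  TYPE = BASIC;                 ! unrestricted (H1) imputation model
DATA IMPUTATION:
           IMPUTE = g1-g6;
           NDATASETS = 100;
           SAVE = C:\imputed\imp*.dat;   ! hypothetical directory

! Step 2 (a separate input file): analyze the saved data sets
DATA:      FILE = C:\imputed\implist.dat;
           TYPE = IMPUTATION;
VARIABLE:  NAMES = g1-g6;
MODEL:     f BY g1-g6;                   ! placeholder analysis model
```

Replacing TYPE = BASIC in step 1 with an analysis model would instead impute and analyze in the same run.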
 Graham Rifenbark posted on Wednesday, December 01, 2010 - 8:18 am

Thank you for responding. I think I was a little unclear in my question; sorry for that.

First I created a full data set using the MONTECARLO command, with specific model population settings.

I then imposed missingness using the MODEL MISSING statement, for 500 repetitions.

Now I would like to run both multiple imputation (100 imputations on each missing data set) and FIML on all 500 reps.

Most recently i have been using the following code:
Data Imputation:
impute = g1-g6;
save = C:\Users\grahamr\Desktop\Simulation Study\Multiset\rep10\imp*.dat;


If I use this code I will have to run it 500 times. Is there a better way?

Thanks again.
 Linda K. Muthen posted on Thursday, December 02, 2010 - 10:00 am
See Using Mplus Via R under How-To on the website. This may help.
 Graham Rifenbark posted on Friday, December 03, 2010 - 10:00 am

I have written code that creates the .inp files for Mplus that will do the imputations. In my R code I have written a loop that goes through all of my reps. However, there is a breakdown between Mplus and R: I keep getting errors in Mplus, not in R. Yet when I open the file in Mplus directly, it runs correctly. Have you ever come across this problem?

 Linda K. Muthen posted on Sunday, December 05, 2010 - 11:18 am
If each Mplus file works separately, then I think the problem must be in your R code.
 Michael Green posted on Monday, March 28, 2011 - 2:00 am

I have a question about using categorical variables in data imputation under an unrestricted H1 model (where all variables are treated as Y outcomes). In terms of predicting values for the categorical variables, are they treated as ordered categorical or nominal variables? And if they are treated as ordered categorical, is there a way of performing such multiple imputations with nominal variables?

Thanks, MG
 Linda K. Muthen posted on Monday, March 28, 2011 - 6:42 am
They are treated as ordered categorical variables. Imputation is not available for nominal variables.
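
A short sketch of how categorical variables are flagged for imputation, assuming the User's Guide convention that a (c) after a variable list in the IMPUTE option marks those variables as categorical. The variable names, file names, and missing-value code are hypothetical.

```mplus
DATA:      FILE = mydata.dat;             ! hypothetical file name
VARIABLE:  NAMES = y1-y3 u1-u2;
           MISSING = ALL (999);           ! assumed missing-value code
ANALYSIS:  TYPE = BASIC;                  ! unrestricted (H1) imputation model
DATA IMPUTATION:
           IMPUTE = y1-y3 u1-u2 (c);      ! (c): u1 and u2 imputed as ordered categorical
           SAVE = catimp*.dat;
```

A nominal variable with more than two categories is not covered by this setup; as noted above, its categories would be treated as ordered.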
 Stephanie posted on Wednesday, February 05, 2014 - 5:39 am
I have two questions regarding multiple imputation. My model uses the WLSMV estimator and I also included sample weights.

1. After running the imputation with 20 imputed datasets in a TYPE=IMPUTATION analysis, I am now interested in the correlations between all variables in the model. Therefore, I added ‘analysis: type=basic’ with no MODEL command. Besides ‘normal’ correlation values, I get implausible S.E. values (999.000 and 8179.653) for both dichotomous exogenous variables. I have checked the data files but cannot find any inconsistencies. Do you have any suggestions as to what might cause this problem?

2. After running the imputation I am also interested in total effects. So I used the 'model indirect' command but only got the error message ‘MODEL INDIRECT is not allowed with TYPE=IMPUTATION’.
Is there any other possibility to get them?

Thank you very much in advance.
 Linda K. Muthen posted on Wednesday, February 05, 2014 - 1:47 pm
1. Please send the output and your license number to

2. You would need to use MODEL CONSTRAINT in this case.
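
A minimal sketch of the MODEL CONSTRAINT approach for point 2, assuming a simple model in which x1 affects y both directly and through x2, with all variables treated as continuous. The variable names, parameter labels, and list file name are hypothetical, and the sketch does not address WLSMV or sampling weights.

```mplus
DATA:      FILE = implist.dat;     ! hypothetical list of imputed data sets
           TYPE = IMPUTATION;
VARIABLE:  NAMES = x1 x2 y;
MODEL:     x2 ON x1 (a);           ! x1 -> x2
           y  ON x2 (b);           ! x2 -> y
           y  ON x1 (c);           ! direct effect of x1 on y
MODEL CONSTRAINT:
           NEW(ind tot);
           ind = a*b;              ! indirect effect via x2
           tot = a*b + c;          ! total effect
```

The new parameters ind and tot are then reported with standard errors alongside the other estimates.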
 Stephanie posted on Wednesday, February 12, 2014 - 6:59 am
Thank you very much. I have just used MODEL CONSTRAINT and was able to obtain unstandardized total effects. Now I need to calculate the standardized effects. To do so I looked at Example 5.20 in the User's Guide, but I am not sure whether I adapted the example correctly. Therefore, may I ask:
1. Assume x1 influences y directly but also via x2. Is it correct to calculate the standardized coefficient for each path involved (formula: stdxy = b*(sd(x)/sd(y))) and then to combine these coefficients using the formula 'Total = indirect x1x2 * indirect x2y + direct x1y'?

Translated into the input this would be:

2. If that is correct, would I use STDY=b/SD(y) if the one path involves an independent binary variable?
3. As my model contains binary variables as well, I cannot calculate their variance. When using PARAMETERIZATION = THETA; I do not get any results. What could I do in this case?

I thank you very much for your support!
 Stephanie posted on Thursday, February 13, 2014 - 11:24 am
I have found a solution for the first and third question. Sorry if these questions were too simple. But one question remains:
When calculating standardized total or indirect effects, how can I deal with a path that includes a binary independent variable among other paths with continuous dependent and independent variables? Is it permissible to calculate STDY*STDYX*STDYX, for example?
 Bengt O. Muthen posted on Friday, February 14, 2014 - 11:44 am
Stephanie post I

1. I don't think that is correct. Instead, you want to compute the total effect from the unstandardized coefficients and then standardize that.

2. Yes, use STDY=b/SD(y) if the one path involves an independent binary variable.

3. If you have binary DVs and Theta parameterization, the residual variances are 1.
 Bengt O. Muthen posted on Friday, February 14, 2014 - 11:48 am
Stephanie post II

If the binary variable is the x variable in the mediation model, then the product indirect effect is OK. If the binary variable is either m or y, special considerations are needed.
 Stephanie posted on Monday, February 17, 2014 - 6:10 am
Thank you very much for your reply. I am not sure if I understood your answer correctly. You would first calculate the total effects with the unstandardized coefficients and then standardize that total effect obtained? I am not sure by which formula this could be done. Could you please give me some advice?

And if I first have to calculate the total effect from unstandardized coefficients and then standardize it, I am not sure to what extent the answer to my second post is relevant. Couldn’t I simply use the standardized coefficients obtained (STDYX, or STDY if the covariate is binary) and calculate the total effects as usual, as the product of the indirect paths plus the direct effect? But then, if I haven't misunderstood your second post, a problem might be that my model contains binary variables not only in x but also in m. Additionally, y is binary. Therefore, I am already using WLSMV. Which considerations have to be made in this case?
 Bengt O. Muthen posted on Monday, February 17, 2014 - 2:34 pm
Paragraph 1:

You use the regular standardization formula on the total unstandardized value, namely divide by the SD of the DV and multiply (unless the IV is binary) by the SD of the IV.

Paragraph 2:

Not sure where we miscommunicate here. When the indirect effect is computed as a product, you use the product of unstandardized values, not standardized values. You can then standardize that product using the answer for Paragraph 1.
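
A sketch of "compute the unstandardized total effect, then standardize it" in MODEL CONSTRAINT, in the spirit of User's Guide Example 5.20. It assumes a continuous x, m, and y with x -> m -> y plus a direct x -> y path. All names and labels are hypothetical, and it does not cover a binary m or y. The variance of y is derived from y = (a*b + c)*x + b*e_m + e_y.

```mplus
MODEL:     m ON x (a);
           y ON m (b);
           y ON x (c);
           x (vx);                                ! variance of x
           m (rm);                                ! residual variance of m
           y (ry);                                ! residual variance of y
MODEL CONSTRAINT:
           NEW(tot vary stdtot);
           tot    = a*b + c;                      ! unstandardized total effect
           vary   = tot**2*vx + b**2*rm + ry;     ! model-implied variance of y
           stdtot = tot*SQRT(vx)/SQRT(vary);      ! total * SD(x) / SD(y)
```

For a binary x, the SQRT(vx) factor would be dropped so that the effect is standardized with respect to y only.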

I am also saying that if the binary variable is either m or y, special considerations are needed. With WLSMV, such indirect effects refer to the continuous latent response variables, not the corresponding binary observed variables. For more information, see

Muthén, B. (2011). Applications of causally defined direct and indirect effects in mediation analysis using SEM in Mplus. A technical appendix, an Mplus input appendix, and the Mplus inputs, data, and outputs used in the paper accompany it.
 Stephanie posted on Tuesday, February 18, 2014 - 6:58 am
Thank you very much, Prof. Muthén, for your detailed answer. Maybe my confusion is with the terms total effect and indirect effect. At the moment I am solely interested in calculating total effects, so your answer regarding Paragraph 1 should be the relevant one for me (calculating total effects from unstandardized coefficients and then standardizing them). Is this also correct if I am using WLSMV?
 Bengt O. Muthen posted on Tuesday, February 18, 2014 - 6:08 pm
Yes, where the same qualification as before holds also for the total effect:

I am also saying that if the binary variable is either m or y, special considerations are needed. With WLSMV, such indirect effects refer to the continuous latent response variables, not the corresponding binary observed variables. For more information, see...
 Ashley posted on Friday, November 14, 2014 - 11:21 am
Are imputed values for one variable used to impute other values during multiple imputation? For example, would imputed values of variable x1 be used in imputing the values of variable x2 in the same imputation model?

Thank you in advance.
 Bengt O. Muthen posted on Friday, November 14, 2014 - 1:10 pm
 Bo Zhang posted on Wednesday, January 07, 2015 - 1:06 am
Hello Dr. Muthen,

I am trying to do a Bayesian multiple imputation. However, I always get the warning "THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE." I have tried it on several computers (all with enough memory) with both the 32-bit and the 64-bit version, but I still cannot get the imputed files. However, it runs normally if I delete the auxiliary variables. My syntax is below.

DATA: FILE IS "ProblemSolvingSEMParceled.dat";

NAMES are Gender Age Group v1-v15;
USEVARIABLES are Gender Age Group v1-v15;
Auxiliary are v1-v5(m);
Missing is ALL(999999);

Data imputation:

Impute = v11-v15;
Ndatasets = 10;
Save = missingimp*.dat;




What's the problem?

 Bengt O. Muthen posted on Wednesday, January 07, 2015 - 5:49 pm
Please send input, output and data to Support with your license number.
 Seamus Harvey posted on Tuesday, February 20, 2018 - 7:45 am
Hello. I wish to create 14 datasets using multiple imputation. If I specify FBITERATIONS = 35000, does this mean that an imputation will be made after every 2500th iteration (in other words, will each imputed dataset be created after every 2500 iterations)? Or is the THIN option needed?
 Tihomir Asparouhov posted on Tuesday, February 20, 2018 - 2:17 pm
No. Mplus generates imputed data sets after the imputation model converges. Setting FBITERATIONS = 35000; refers to the convergence process for the imputation model and essentially tells Mplus to run exactly 35000 MCMC iterations and assume that the model has converged.

The options that control the imputation are specified like this:

SAVE = a*.dat;
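
The rest of the options in the reply above did not survive, so the following is an illustration rather than a reconstruction. It assumes the goal is 14 data sets spaced 2500 iterations apart, with NDATASETS setting the number of data sets and THIN setting the spacing between the draws used for imputation. The variable names and the THIN value are assumptions.

```mplus
ANALYSIS:  TYPE = BASIC;
           FBITERATIONS = 35000;   ! fixed number of MCMC iterations for the imputation model
DATA IMPUTATION:
           IMPUTE = y1-y4;         ! hypothetical variable names
           NDATASETS = 14;         ! number of imputed data sets to save
           THIN = 2500;            ! iterations between imputed data sets (assumed value)
           SAVE = a*.dat;
```

Under this reading, FBITERATIONS governs only the convergence stage, while NDATASETS and THIN govern how the imputed data sets are drawn afterwards.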
 Seamus Harvey posted on Tuesday, February 20, 2018 - 3:52 pm
Thank you very much for your reply. Just one more question: Example 11.8 in the User's Guide demonstrates multiple imputation using a two-level factor model with categorical outcomes, followed by the estimation of a growth model. In this example, is multiple imputation completed using a multilevel analysis approach, or does this only apply to the growth model?
 Bengt O. Muthen posted on Tuesday, February 20, 2018 - 4:01 pm
The first step, multiple imputation, is done using a two-level factor model; the MODEL statement shows the %Within% and %Between% parts of the factor model. Page 461 of the V8 UG shows the second step, which is a two-level growth model. So the imputation and analysis steps are in sync in the sense that they are both two-level.
 Seamus Harvey posted on Tuesday, February 20, 2018 - 4:22 pm
Thank you. If the factor model wasn't specified in the first step, would multilevel multiple imputation still occur?
 Bengt O. Muthen posted on Tuesday, February 20, 2018 - 4:34 pm
Yes, if Type = Twolevel was specified in the Analysis command. But such imputation, which uses an unrestricted two-level model, can be difficult to carry out. The more restricted two-level factor model is therefore illustrated in ex 11.8.
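
A sketch of the unrestricted two-level imputation described above, assuming it is requested with TYPE = TWOLEVEL BASIC and a CLUSTER variable. The variable names, file names, and missing-value code are hypothetical.

```mplus
DATA:      FILE = mydata.dat;      ! hypothetical file name
VARIABLE:  NAMES = y1-y4 clus;     ! hypothetical variable names
           CLUSTER = clus;
           MISSING = ALL (999);    ! assumed missing-value code
ANALYSIS:  TYPE = TWOLEVEL BASIC;  ! unrestricted two-level imputation model
DATA IMPUTATION:
           IMPUTE = y1-y4;
           NDATASETS = 10;
           SAVE = twolevimp*.dat;
```

If this unrestricted model proves hard to estimate, the restricted route is to supply a two-level factor model in the MODEL command, as in ex 11.8.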
 Daniel Lee posted on Friday, August 16, 2019 - 12:16 pm
Hi Dr. Muthen,

If we are using the built-in multiple imputation method in Mplus, is it important to transform independent variables that are not normally distributed (e.g., income has a right skew), since during the imputation process these variables become dependent variables predicted by all other variables?

We have a few covariates that have a lot of missing values (25%-50%), and these covariates are also skewed. I am just wondering if we should check the distribution of each variable and transform those that are not normally distributed. I am also just conducting a linear regression model with an interaction term (2-way).

Thank you.
 Bengt O. Muthen posted on Friday, August 16, 2019 - 5:47 pm
I don't think you have to worry about transformations. But watch out for strong floor or ceiling effects.
 Michael Strambler posted on Friday, August 23, 2019 - 4:44 pm
I have a situation where I'd like to impute data in stages as in imputing T2 data then using those imputed datasets to impute T1 data. Is this possible? That is, can I do MI using multiply imputed datasets? Thank you.
 Bengt O. Muthen posted on Saturday, August 24, 2019 - 6:24 am
I don't see the advantage of such a 2-step imputation procedure over doing it in 1 step.
 Michael Strambler posted on Saturday, August 24, 2019 - 6:54 am
Just realized this may not be as clear as I intended. I have a dataset with T1 and T2 data represented and I'd like to first impute T2 data only using T2 data, then impute T1 data using the newly imputed T2 data plus the T1 data. Is this possible in Mplus and could you advise on how I might go about doing this? Thank you.
 Michael Strambler posted on Saturday, August 24, 2019 - 7:03 am
Thanks for the response. I sent the above post before realizing you had responded.

The rationale for doing it in two steps is that we have more data for informing imputation at T2; T1 has substantially less. Our thinking is that we'd get more accurate estimates by first imputing where we have the greatest amount of information and then using that to impute the rest. We'd like to look at results for one step versus two steps to see which performs better before proceeding to analysis. But maybe this rationale is flawed?
 Tihomir Asparouhov posted on Monday, August 26, 2019 - 10:04 am
A two stage imputation can be worse than a single stage imputation if in the first stage you use only the T2 data, when compared to the one stage imputation that would use all the data.

If the first stage is based on all the data there is still no advantage. The single stage imputation would do exactly what you are describing as a two stage - it would impute T2 then impute T1. In Mplus you can impute only some of the variables and organize a two stage imputation but it is not recommended. Each imputed value would be obtained from the posterior distribution conditional on all observed data. Either method would use the same posterior distribution so there would be no advantage.