Mplus Discussion >> Multiple imputation

Multiple imputation

Mplus Discussion > Missing Data Modeling >

Shige posted on Saturday, April 24, 2004 - 3:23 pm

Dear All,

I am trying to fit an SEM survival model in which some of the covariates in the measurement model have heavy missingness (20%). Multiple imputation seems to be the best choice in this case.

Based on my reading of the Mplus 3 user guide, Mplus does not have the facility to carry out multiple imputation, but it can process imputed data (example 12.13). In that case, can anybody share their experience about which multiple imputation software to use with Mplus? I know there is a large body of literature on multiple imputation, and I am a little lost...

Thanks!

Linda K. Muthen posted on Saturday, April 24, 2004 - 3:47 pm

Joe Schafer's NORM program is probably one way to get imputed data. I believe that it is freeware. Schafer is at Penn State in the Statistics Department.

Shige posted on Sunday, April 25, 2004 - 2:12 am

Thanks Linda, that seems to be a good place to start. I also found that Gary King has a program called "amelia" that does similar things, and it seems to be able to handle non-normal data pretty well (which NORM is not designed to handle).

Anonymous posted on Sunday, April 25, 2004 - 6:36 pm

btw, consider also modeling your missing data variable. If your variable is categorical, you can model it with a mixture model; you can also do this with ordered/unordered categorical variables without a mixture.

Shige posted on Monday, April 26, 2004 - 10:26 am

Dear Anonymous,

Can you point me to an example? Thanks!

Shige

Anonymous posted on Monday, April 26, 2004 - 9:54 pm

the mixture approach is described in the CACE papers
http://statmodel.com/mplus/examples/penn.html#jo
http://statmodel.com/references.html#catlatent
and ex7.24

the approach without mixture is basically a regular path analysis, add type=missing and integration=montecarlo to example ex3.12

btw, both of these methods should produce the same results as multiple imputations

Shige posted on Monday, May 03, 2004 - 11:18 pm

Thanks, it's very helpful.

Anonymous posted on Monday, May 24, 2004 - 6:07 pm

Hello,

In my data, there are 7% missing data for one variable, 1%-3% for 4 variables, and about 9% for 5 covariates. All these variables would be included in my MIMIC model. Based on 3 of those 5 covariates with a 9% missing rate, I would extract 3 subpopulations (the 3 subpopulations may have lower missing rates for other variables). All analyses would be carried out on these 3 subpopulations.

Then, can I ignore the missing data or do I need to run imputation to replace them? If I need to impute, which is better -- impute before subpopulation extraction or after?

Thank you.

Linda K. Muthen posted on Tuesday, May 25, 2004 - 8:19 am

Are your dependent variables continuous or categorical?

Anonymous posted on Tuesday, May 25, 2004 - 9:25 am

My dependent variables are 3 levels of ordered responses (1-3).
Thank you.

Anonymous posted on Tuesday, May 25, 2004 - 2:07 pm

Linda,
My dependent variables are 3 levels of ordered responses (1-3). In that case, do I need to impute missing data?
Thank you.

bmuthen posted on Tuesday, May 25, 2004 - 8:05 pm

You might want to do multiple imputations to handle the missing data on the covariates and then do modeling of the categorical outcomes taking missing data on the outcomes into account.

Alan C. Acock posted on Sunday, August 29, 2004 - 3:19 pm

On page 308 of the User's Guide to Version 3.0 it says "Multiple data sets generated using multiple imputation (Schafer, 1997) can be analyzed using a special feature of Mplus."

What is this special feature?

Alan Acock

Linda K. Muthen posted on Sunday, August 29, 2004 - 4:03 pm

See Example 12.13 and the IMPUTATION option of the DATA command.

Chris Richardson posted on Friday, November 05, 2004 - 11:37 pm

Hi Linda/Bengt,
I'm using Mplus V3 to conduct several EFAs and CFAs on 2 datasets. The variables are all ordinal (4-point Likert) and have from 3 to 10% missing data. When running the EFAs while treating the data as categorical, I included the line TYPE = MISSING. Question 1) What method does Mplus employ to deal with the missing data in this situation?

As for the CFAs, I have been using NORM to create multiply imputed data sets, which I then use in Mplus via the IMPUTATION option. This works fine. Question 2) Does this approach seem reasonable, or is there an easier way to deal with the missing data without using NORM?

Thanks for your time - cheers
chris

bmuthen posted on Saturday, November 06, 2004 - 5:33 am

With EFA and categorical variables, least-squares estimation is used and missing data is simply handled by what amounts to pairwise present data.

With CFA and categorical you may also use maximum-likelihood - at least if you don't have too many factors so the numerical integration is feasible - and then the usual approach of ML under MAR assumptions is used. But using the Imputation approach you mention should be fine.

Chris Richardson posted on Sunday, November 07, 2004 - 5:23 pm

Thanks Bengt!

cheers
chris

Anonymous posted on Friday, December 31, 2004 - 7:23 am

Hi. Is there a conflict between multiple imputation analysis and categorical variables declared in the VARIABLES section of the code?

A model with categorical items runs without error in each of the individual imputed datasets, but none converge when using the TYPE=IMPUTATION command to run them all at once.

Thanks!

Shige Song posted on Friday, December 31, 2004 - 8:24 am

Also, Stata has a set of user contributed routines to generate multiply imputed data set. Try "findit imputation" in the command prompt.

Linda K. Muthen posted on Friday, December 31, 2004 - 10:32 am

There should be no conflict between multiple imputation and categorical variables. To look into this I would need two imputed data sets, the two outputs that show that these data sets worked individually, and the output that shows the problem you encountered with multiple imputation.

Anonymous posted on Tuesday, March 15, 2005 - 7:20 am

Hello,

I am trying to do a multilevel CFA with imputed data. I imputed the data with NORM and created an ASCII file containing the names of the 5 data sets as you described in the Mplus User's Guide. I specified a model with 3 factors both on the within and on the between level.

Now, I've got 2 questions:
1. The output mentions that the number of replications "requested" is 5, whereas the number of replications "completed" is 1 or 3 (depending on the specific model). What does this mean? And why is the number of replications completed not also 5?
2. The program tells me that the output option "standardized" is not available for Montecarlo. Is that right? How can I get standardized parameter estimates (factor loadings) with imputed data?

I am looking forward to your reply. Thank you very much in advance!

Linda K. Muthen posted on Tuesday, March 15, 2005 - 8:49 am

For some reason, the model did not converge for all five of your data sets. You could run each data set separately to see if that gives you more information about why there were convergence problems. Standardized estimates are not available with Monte Carlo which is what our imputation uses. You would need to compute the standardized estimates by hand.

Patrick Malone posted on Thursday, April 14, 2005 - 1:22 pm

Good afternoon.

I'm working with imputed data, on a project that I started in the v2 days, when I combined estimates in an external program. Rubin's rules, which I assume Mplus is using to get the SEs of parameter estimates, also usually give a degrees of freedom by which to evaluate the Est/SE on the t distribution. Is there a way to get that information from Mplus?

Thanks,
Pat

BMuthen posted on Friday, April 15, 2005 - 1:53 am

We don't provide that currently, but I would think that the t distribution is well enough approximated by a normal distribution in most cases.
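
For readers who want the degrees of freedom discussed above, here is a minimal Python sketch of Rubin's rules for a single parameter. It assumes the standard textbook form of the rules; the thread does not show how Mplus computes its pooled standard errors internally. The function name and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    w = mean([se ** 2 for se in std_errors])     # within-imputation variance
    b = variance(estimates)                      # between-imputation variance
    total = w + (1 + 1 / m) * b                  # total variance
    r = (1 + 1 / m) * b / w                      # relative increase in variance
    # Large df means the normal approximation to the t distribution is close.
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf
    return {"estimate": q_bar, "se": math.sqrt(total), "df": df}


# Hypothetical estimates and standard errors from 5 imputed data sets.
pooled = pool_rubin([0.50, 0.54, 0.46, 0.52, 0.48], [0.10] * 5)
print(pooled)
```

When the between-imputation variance is small relative to the within-imputation variance, the df is large and Est/SE can be referred to a normal distribution, which is consistent with the reply above.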

Patrick Malone posted on Friday, April 15, 2005 - 4:42 am

Thanks, Bengt. Just as an addendum, I've also heard secondhand that Paul Allison recommends using the df as an index of the adequacy of the number of imputations, so this would be quite useful information in a future release.

Thanks,
Pat

Linda K. Muthen posted on Saturday, April 16, 2005 - 4:32 am

If you send me an email suggesting this to support@statmodel.com, I will add it to our list of future additions when I return.

Andrew Percy posted on Tuesday, April 26, 2005 - 3:30 pm

Hi

I am trying to run a basic 5-class LCA model (four nominal indicators, each with 3 categories) with imputed data sets. When the model is run on a single data set the output is fine. But when the model is run on imputed data I get an output warning (copied below) and no model results. Do I need to use different output commands for imputed data?

Thanks for your help.

OUTPUT:

TECH1 TECH8;

*** WARNING in Output command
TECH1 option is the default for MONTECARLO.
*** WARNING in Output command
SAMPSTAT option is not available when outcomes are censored, unordered
categorical (nominal), or count variables. Request for SAMPSTAT is
ignored.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Linda K. Muthen posted on Tuesday, April 26, 2005 - 3:47 pm

Analyzing imputed data sets uses external Monte Carlo so that is why you get the warning about TECH1. You should get the other warning even if you are running only one data set. It sounds like you need to send input/output, data, and your license number to support@statmodel.com.

Peter Martin posted on Friday, October 14, 2005 - 1:57 am

Hello there,

Is it possible to set up a multi-group path analysis with imputed data in Mplus?

I have two groups. The data are contained in 5 imputed data sets for each group - so there are 10 datasets altogether.

I have tried the "individual data, different data sets" method of specifying the groups (as described in Ch. 13 of the User Guide), listing two files that each contain the names of five imputed data sets. That didn't work, though - Mplus returned the error message: "There are fewer NOBSERVATION entries than groups in the analysis." (My sample sizes are above 5000 in each of the imputed data sets.)

Should I combine the two groups so that there are only 5 datasets in total (each containing data from both groups - this would be the "individual data, one data set" method) - or is there another way?

As always, I'm grateful for this brilliant discussion site,

Peter

Peter Martin posted on Friday, October 14, 2005 - 2:37 am

... actually, I've another question about multiple imputation and path analysis (not necessarily multigroup this time).

Can you get R-squares for the dependent variables, like you would for a path analysis without imputation?

Linda K. Muthen posted on Friday, October 14, 2005 - 9:49 am

For multiple group analysis with imputation, the data for both groups needs to be in one data set with a grouping variable included.

We don't give r-square with multiple imputation because the output is based on our Monte Carlo output. You would need to compute this by hand.
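
On the multiple-group point, a minimal Python sketch of how per-group imputed files might be stacked into one file per imputation with a grouping column appended. It assumes free-format, whitespace-delimited data and that each group was imputed separately; all file names (group1_imp1.dat, both_imp1.dat, implist.dat) are hypothetical.

```python
from pathlib import Path


def stack_groups(group_files, out_file):
    """Stack one imputed file per group into a single file.

    A group code (1, 2, ...) is appended as the last column of each row.
    """
    rows = []
    for code, path in enumerate(group_files, start=1):
        for row in Path(path).read_text().splitlines():
            if row.strip():                      # skip blank lines
                rows.append(f"{row.rstrip()} {code}")
    Path(out_file).write_text("\n".join(rows) + "\n")
    return len(rows)


def combine_imputations(folder, n_imputations, n_groups):
    """Build the combined files plus a list file naming them."""
    folder = Path(folder)
    combined = []
    for k in range(1, n_imputations + 1):
        parts = [folder / f"group{g}_imp{k}.dat"
                 for g in range(1, n_groups + 1)]
        out = folder / f"both_imp{k}.dat"
        stack_groups(parts, out)
        combined.append(out.name)
    # One file name per line, to be referenced by the FILE option.
    (folder / "implist.dat").write_text("\n".join(combined) + "\n")
    return combined
```

The appended last column would then need to be named in the NAMES list and declared as the grouping variable in the Mplus input.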

fati posted on Friday, October 14, 2005 - 11:47 am

I am doing an LCA with missing data, but I am not sure I understand it well.

1- If I have a missing data pattern, can I create a file with type=missing basic, define a pattern variable, and use the resulting file in a second-step analysis? Is that correct? And if I have other missing values, can I define them in the second analysis with type=mixture missing and missing=all (999)? Is that OK?

2- What is the MCAR test, and how can I obtain it with mixture modeling?

3- I have a question about the preceding message posted by Anonymous on Monday, April 26, 2004 - 9:54 pm. He suggests that I can do the path analysis without mixture, with type=missing and integration=montecarlo, and also fit a model with type=mixture missing, and that I should get the same result. Which results should be the same, and what is the reason for doing this?

4- Another question: what is the maximum percentage of missing data that is acceptable for doing a multiple imputation?

thank you very much for your response,

bmuthen posted on Saturday, October 15, 2005 - 6:03 am

1. If you have missing data you should do two things in one and the same run. First, you should define what your missing data symbol is by using MISSING = all (999), say, in the VARIABLE command. Second, you should use TYPE = MISSING in the ANALYSIS command which gives you the so called MAR approach of ML estimation.

2. MCAR testing is testing that the data are missing completely at random. MCAR is not a necessary condition given that you can use the less restrictive MAR assumption, so you should have very specific reasons for wanting to know about MCAR. I would want a strong majority of the information to come from the data, not the model.

3. I don't know what your model is (the old post doesn't tell me that). Perhaps you refer to the fact that TYPE = MIXTURE MISSING with a single class gives the same results as TYPE = MISSING.

4. That's difficult to say and is also related to how non-random the missingness is. You should read Joe Schafer's book (see the Mplus web site) where he describes ways to quantify effects of degrees of missingness in terms of the uncertainty it brings to the estimation. The more missingness you have, the more your results rely on your model instead of your data, which is not good.

fati posted on Thursday, October 20, 2005 - 6:59 am

1. I know that I must do this for missing data, but my question is what to do if I have a missing data pattern (see my last message).
2. OK.
3. My model is an LCA with 25 categorical variables. The message I was referring to is the one posted by Anonymous on Sunday, April 25, 2004 - 6:36 pm, which suggests modeling the missing data variable with a mixture model or, for ordered/unordered categorical variables, without a mixture. It was a reply to the question posted by Shige on Saturday, April 24, 2004 - 3:23 pm about multiple imputation for covariates with 20% missing data in an SEM survival model.

thank you

bmuthen posted on Thursday, October 20, 2005 - 6:56 pm

That earlier post was referring to missing data in covariates, not in the outcomes. You don't need multiple imputations for outcomes; missingness on outcomes is taken care of by ML under MAR, that is, Type = Missing.

Anonymous posted on Wednesday, November 02, 2005 - 5:59 pm

Hi.

I would like to know how the summary of the chi-square statistics is calculated in "TYPE IS IMPUTATION". Is the "mean" simply the unweighted "mean" of the chi-square statistics?

Thanks!

Linda K. Muthen posted on Wednesday, November 02, 2005 - 6:35 pm

We don't give a chi-square for multiple imputation because we are not clear on the theory for this.

Anonymous posted on Wednesday, November 02, 2005 - 7:39 pm

So, I shouldn't interpret the "mean" of the chi-square, right? Below is the summary from running the multiple imputation analysis with 10 imputations.

TESTS OF MODEL FIT

Number of Free Parameters                9

Chi-Square Test of Model Fit

    Degrees of freedom                  18

    Mean                            52.313
    Std Dev                          5.759
    Number of successful computations   10

         Proportions             Percentiles
      Expected   Observed     Expected   Observed
        0.990      1.000        7.015     42.415
        0.980      1.000        7.906     42.415
        0.950      1.000        9.390     42.415
        0.900      1.000       10.865     42.415
        0.800      1.000       12.857     42.415
        0.700      1.000       14.440     49.892
        0.500      1.000       17.338     51.864
        0.300      1.000       20.601     54.372
        0.200      1.000       22.760     54.558
        0.100      1.000       25.989     56.637
        0.050      1.000       28.869     56.637
        0.020      1.000       32.346     56.637
        0.010      1.000       34.805     56.637

Linda K. Muthen posted on Thursday, November 03, 2005 - 6:01 am

Are you saying TYPE=IMPUTATION; or TYPE=MONTECARLO; in your DATA command?

Linda K. Muthen posted on Thursday, November 03, 2005 - 4:01 pm

I finally found a multiple imputation example and you are correct that we do print the mean of chi-square. It is simply an unweighted mean. It has not been adjusted in any way because we are not clear on the theory for this.

S. Oesterle posted on Friday, January 20, 2006 - 3:20 pm

I am using TYPE=IMPUTATION to analyze 20 data sets that I have created via multiple imputation in NORM. I am estimating a path model with observed variables only. The output in Mplus does not give me standardized estimates. I know that you said in an earlier response that "Standardized estimates are not available with Monte Carlo which is what our imputation uses. You would need to compute the standardized estimates by hand."

How do I calculate the standardized estimates, particularly when my dependent variables are binary? I was going to use the following formula: take the ratio of the standard deviation of x to that of y and multiply it by the unstandardized estimate. However, the sample statistics printed in the output do not include variances for categorical variables and, besides, are only printed for the first data set. Is there any way to get the sample statistics averaged across all data sets, just like the fit statistics and the regression estimates? Or is there any other way to calculate the standardized estimate?

bmuthen posted on Friday, January 20, 2006 - 5:53 pm

For a binary dependent variable u, Mplus uses a standardization of the slope that divides by the standard deviation of u*, a latent response variable underlying u (the u* variance is also used in the R-square for binary outcomes of McKelvey & Zavoina, 1975, which is referred to in some textbooks). In a regular probit regression of u on x, the variance of u* given x, i.e. the residual variance, is fixed at 1. With logit regression, the residual variance is fixed at pi**2/3. So the standard deviation of u* is the sqrt of the sum of the variance in u* explained by x plus this residual variance.
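
The hand calculation described above can be sketched in Python as follows. This is a minimal illustration for a single covariate x, taking the logit residual variance to be pi**2/3 (the variance of the standard logistic distribution). The function names and input values are hypothetical, and with several covariates the explained variance would need the full covariance matrix of the predictors.

```python
import math

# Residual variance of the latent response u* given x, by link function.
RESIDUAL_VARIANCE = {"probit": 1.0, "logit": math.pi ** 2 / 3}


def ustar_variance(b, var_x, link="probit"):
    """Variance of u*: part explained by x plus the fixed residual part."""
    return b ** 2 * var_x + RESIDUAL_VARIANCE[link]


def standardized_slope(b, var_x, link="probit"):
    """Slope standardized with respect to both x and u*."""
    return b * math.sqrt(var_x) / math.sqrt(ustar_variance(b, var_x, link))


def r_square(b, var_x, link="probit"):
    """Share of the u* variance explained by x (McKelvey-Zavoina style)."""
    return b ** 2 * var_x / ustar_variance(b, var_x, link)


# Hypothetical pooled slope and covariate variance.
print(standardized_slope(0.8, 2.5, link="logit"))
print(r_square(0.8, 2.5, link="logit"))
```

With imputed data, one option is to apply this to the pooled slope and the variance of x averaged over the imputed data sets. That choice is an assumption here, not something stated in the thread.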

S. Oesterle posted on Monday, January 23, 2006 - 4:04 pm

I am estimating a multiple group analysis (2 groups) for a path model using TYPE=IMPUTATION with 20 imputed datasets. In the model where all parameters are estimated freely across the 2 groups, I do not get any error messages and the estimation terminated normally. However, when I look at the coefficients for the second group, most (but not all) estimates and standard errors are zero. The estimates for the first group look ok. When I estimate the model separately for the 2 groups, I get correct results. What could be going on here?

Linda K. Muthen posted on Monday, January 23, 2006 - 4:39 pm

If you are using a version earlier than Version 3.12, you should download the most recent version of Mplus. If you still have the problem, send the input, two data sets, output, and your license number to support@statmodel.com.

S. Oesterle posted on Monday, January 23, 2006 - 5:06 pm

Installing version 3.13 did not solve the problem. I will send you my files. Thanks!

Scott Grey posted on Wednesday, February 01, 2006 - 3:11 pm

Hello!
I have been attempting to conduct a multilevel growth curve analysis ("TYPE IS TWOLEVEL") with missing data using the multiple imputation feature, as there are a number of covariates with missing data in our dataset. Mplus appears to replicate the analysis in the DOS window, but when the DOS window closes there is no output in the GUI window. An output file is generated, but it always ends at "Input data file(s)". The program has no problem running other imputation analyses such as "TYPE IS COMPLEX". Here is the code:

DATA:
FILE IS "C:\Documents and Settings\insthealthsa4\
My Documents\DARE\Imputation\AUGUSTINE_m\AUG_imp.txt";
TYPE IS IMP;

VARIABLE:
NAMES ARE crsswlk hs9dist region1 region2 region3
region4 Urban stressms rhsfree MSfrelun MSredlun
MSwhite MSBlack MSLatino MSAsian MSother treatms
att7rev att9rev utq41a utq41b utq41c utq41d n2q51a
n2q51b n2q51c n2q51d upq40c n1q44c_7 n1q44c_8 AGE
sex family catuse1 catuse2 catuse3 catuse4 catuse5
catuse6 upq37a upq37b upq37c upq37d upq37e upq37f
upq37g upq37h n2q46a n2q46b n2q46c n2q46d n2q46e
n1q7g n1q7j n2q7g n2q7j eq10g eq10j tq10c tq10f
q10c q10f smoket1 drinkt1 pot1 smoket2 drinkt2
pot2 smoket3 drinkt3 pot3 smoket4 drinkt4 pot4
smoket5 drinkt5 pot5 latino black asian am_ind
oth_race clu1 clu2 clu3 clu4 clu5 clu6 clu7 clu8
clu9 clu10 clu11 clu12 clu13 clu14 clu15 clu16
clu17 clu18 clu19;

USEVARIABLES ARE urban rhsfree n2q51a n2q51b n2q51c
n2q51d upq40c n1q44c_7 n1q44c_8 age sex
latino black oth_race
anysmoke anydrink anypot lowhi hilow hihi
mattpol7 mattpol8 mattpol9;

WITHIN IS upq40c n1q44c_7 n1q44c_8 age sex
anysmoke anydrink anypot lowhi hilow hihi
latino black oth_race mattpol7 mattpol8 mattpol9;

BETWEEN IS rhsfree urban;

CLUSTER IS hs9dist;

DEFINE:
mattpol7 = (tq10c+tq10f)/2;
mattpol8 = (eq10g+eq10j)/2;
mattpol9 = (n2q7g+n2q7j)/2;
devian7 = (upq37a+upq37b+upq37d+upq37g)/4;
devian9 = (n2q46e+n2q46a+n2q46b+n2q46c)/4;

lowhi = 0;
hilow = 0;
hihi = 0;
anysmoke = 0;
anydrink = 0;
anypot = 0;
IF (devian7 LE 1 AND devian9 GT 1) THEN lowhi = 1;
IF (devian7 GT 1 AND devian9 LE 1) THEN hilow = 1;
IF (devian7 GT 1 AND devian9 GT 1) THEN hihi = 1;
IF (smoket1 GT 0 OR smoket2 GT 0 OR smoket3 GT 0 OR
smoket4 GT 0 OR smoket5 GT 0) THEN anysmoke = 1;
IF (drinkt1 GT 0 OR drinkt2 GT 0 OR drinkt3 GT 0 OR
drinkt4 GT 0 OR drinkt5 GT 0) THEN anydrink = 1;
IF (pot1 GT 0 OR pot2 GT 0 OR pot3 GT 0 OR
pot4 GT 0 OR pot5 GT 0) THEN anypot = 1;

ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR = ML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;

MODEL:
%WITHIN%
attinstb BY n2q51a n2q51b n2q51c n2q51d;
i s | mattpol7@0 mattpol8@1 mattpol9@2;
mattpol7 ON upq40c;
mattpol8 ON n1q44c_7;
mattpol9 ON n1q44c_8;
attinstb ON i s age sex lowhi hilow hihi
anysmoke anydrink anypot latino
black oth_race;

%BETWEEN%
attinstw BY n2q51a n2q51b n2q51c n2q51d;
n2q51a-n2q51d@0;
attinstw ON rhsfree urban;

OUTPUT: TECH1 TECH8;

THANKS FOR YOUR HELP!!

Linda K. Muthen posted on Wednesday, February 01, 2006 - 3:41 pm

Please send your input, data, and license number to support@statmodel.com. Looking at the input alone cannot tell me what happened.

Peter Martin posted on Thursday, May 18, 2006 - 8:25 am

Hello,

I am estimating a latent class model with type=imputation. There are 5 latent class indicators (Y), 4 of which are categorical, while 1 is nominal. I have also got covariates (X) that relate to the latent variable (C), and some direct effects from covariates to some of the Ys.

The estimation runs fine; but the output reports values of .000 for all estimates associated with the nominal Y - that is, both for the means of the nominal Y associated with each latent class, and for the direct effect of one of the X-variables on the nominal Y. In contrast, estimates associated with categorical Y are given and make sense.

The same problem does not occur when I run the model on just one data set (that is, without "type=imputation"). Neither does the problem occur when I specify my nominal Y as categorical, and use "type=imputation".

What could it be that goes wrong when estimating with imputed datasets and a nominal latent class indicator?

Linda K. Muthen posted on Thursday, May 18, 2006 - 9:47 am

This sounds like a problem that has been fixed in Version 4.1. If you download Version 4.1, you should be fine. If not, send your input, data, output, and license number to support@statmodel.com.

Peter Martin posted on Tuesday, May 23, 2006 - 1:54 am

Yes, using version 4.1 resolved the problem. Thank you, Linda!

Thomas Rodebaugh posted on Thursday, June 15, 2006 - 9:34 am

when using multiple imputation and regressing latent variables upon other latent variables, is it sufficient to set all the latent variables' variances to 1 to get standardized values for these regression coefficients? or is it necessary to hand calculate these, too?

i feel like this should be a simple question, but i have a very hazy grasp of how standardization works, exactly.

thanks in advance for any help.

Linda K. Muthen posted on Thursday, June 15, 2006 - 11:02 am

If you fix the metric of the factors by fixing the factor variances to one instead of a factor loading, you would receive estimates equivalent to the Std standardization of Mplus. The two standardizations used in Mplus are described in Chapter 11 of the Mplus User's Guide where the general output is described.

Thomas Rodebaugh posted on Thursday, June 15, 2006 - 3:03 pm

thanks for that reply. now i'm running into another issue.

i'm using multiple imputation and specifying the MLM estimator because of some evidence that the multivariate distribution is not normal. now i would like to make some model constraints and test these via chi-square difference tests.

of course, when using MLM one cannot simply subtract the chi-squares--one needs the scaling correction factors.

however. . . the output when using MI does not provide scaling correction factors (that i can find). i tried using the difftest option that works for WLSMV, and it informed me that this only works for WLSMV. so. . . is there something i am missing that would allow me to test model constraints using MI and the MLM estimator?

thanks,

tom

Thomas Rodebaugh posted on Friday, June 16, 2006 - 6:34 am

. . . i realized after i wrote this that i could, of course, calculate the difference tests for each of the 5 MI data sets separately. is that the only way to go about this?

Linda K. Muthen posted on Friday, June 16, 2006 - 9:45 am

With multiple imputation, we give the average of the fit statistics like chi-square. I don't think there is any theory on how to actually calculate chi-square for multiple imputation. Because of this, I don't know how you would do difference testing in this situation.

chennel huang posted on Monday, June 19, 2006 - 1:00 am

hello, i'm a student in TW.

When I try to run MI in NORM, 11 of 18 variables are ordinal-scale variables. How do I examine these variables and decide how to transform them to fit the assumption of normality? Sorry if my question is in the wrong place.

chennel huang posted on Monday, June 19, 2006 - 2:00 am

Following up on the previous question: for the ordinal variables, should I choose the "logit transformation" with a limited range to keep the transformed values reasonable, and choose rounding "to the nearest observed value"?

Linda K. Muthen posted on Monday, June 19, 2006 - 8:06 am

I'm not familiar with NORM. I would not transform ordinal variables because the numbers assigned to the categories do not represent numerical values.

chennel huang posted on Wednesday, June 21, 2006 - 5:19 am

After using NORM to make 5 datasets, I use Mplus version 3.0 to read these datasets. However, the output is "***ERROR(Err#: 29)Error opening file: mi.dat".
My syntax is as follows.

data:
file is mi.dat;
type is imputation;
variable:
names are cl edu inc gen bmi year at1-at7 ba1-ba4;
usevariable are at1-at7;
analysis:
type is general basic;

These datasets are named mi1, mi2, mi3, mi4, and mi5. They are saved in the same folder as the syntax.

Thanks for your help.

Linda K. Muthen posted on Wednesday, June 21, 2006 - 6:28 am

The error means the file cannot be found. Check if the extension dat was added twice to the data set. Otherwise, send the input, data sets, and your license number to support@statmodel.com.
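
Since TYPE = IMPUTATION reads a file that contains the names of the imputed data sets (as described earlier in this thread), a small Python sketch like the following could create that list file and check that every named data set exists. The one-name-per-line layout and the file names mi1.dat to mi5.dat are assumptions for illustration.

```python
from pathlib import Path


def write_imputation_list(folder, data_files, list_name="mi.dat"):
    """Write a list file naming the imputed data sets, one per line.

    Raises FileNotFoundError if any named data set is not in the folder,
    which also catches doubled extensions such as 'mi1.dat.dat'.
    """
    folder = Path(folder)
    missing = [name for name in data_files
               if not (folder / name).is_file()]
    if missing:
        present = sorted(p.name for p in folder.iterdir())
        raise FileNotFoundError(
            f"not found: {missing}; folder contains: {present}")
    list_path = folder / list_name
    list_path.write_text("\n".join(data_files) + "\n")
    return list_path


# Hypothetical usage: five data sets saved next to the Mplus input file.
# write_imputation_list(".", [f"mi{k}.dat" for k in range(1, 6)])
```

The list file should sit in the same folder as the input file, or the FILE option should give its full path.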

Yung Chung posted on Thursday, June 29, 2006 - 2:30 am

Hi, this is with specific reference to Thomas Rodebaugh's post on Thursday, June 15, 2006 - 3:03 pm:

There is a SAS macro that calculates the "weighted" average of chi-squares obtained over imputed data sets (each statistic overstates the strength of the evidence against the null hypothesis because it ignores missing-data uncertainty). The syntax can be found on Paul Allison's website.

Hope this helps
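
For illustration, one published rule for combining chi-square statistics across imputations is the D2 statistic of Li, Meng, Raghunathan and Rubin (1991). The Python sketch below implements that rule. Whether the SAS macro mentioned above uses exactly this rule is not stated in the thread, so treat the correspondence as an assumption. The function name and example values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_chi_squares(chi_squares, k):
    """Combine m chi-square statistics, each with k df, into D2.

    Returns (d2, df2); D2 is referred to an F distribution with
    k numerator and df2 denominator degrees of freedom.
    """
    m = len(chi_squares)
    roots = [math.sqrt(d) for d in chi_squares]
    # Relative increase in variance, from the spread of sqrt(chi-square).
    r = (1 + 1 / m) * variance(roots)
    d2 = (mean(chi_squares) / k - (m + 1) / (m - 1) * r) / (1 + r)
    df2 = (k ** (-3 / m) * (m - 1) * (1 + 1 / r) ** 2
           if r > 0 else math.inf)
    return d2, df2


# Hypothetical chi-square values from 3 imputed data sets, 1 df each.
print(pool_chi_squares([4.0, 9.0, 16.0], k=1))
```

The pooled value is smaller than the simple mean of the chi-squares whenever the statistics vary across imputations, which reflects the missing-data uncertainty that the unweighted mean ignores.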

Susan Scott posted on Friday, October 13, 2006 - 11:41 am

Hi,

I would like to know where I can find information on how TYPE=IMPUTATION analyses are run.

When I do analyses on multiply-imputed datasets in SAS, the model is run on each of the datasets and then the estimates are combined (using PROC MIANALYZE). However, I am finding that when I have an SEM model that has converged with TYPE=IMPUTATION and I try to run the same model on the individual datasets, I often get the message that the model does not converge. I would like to understand why this is happening.

Thank you,
Susan Scott

Linda K. Muthen posted on Friday, October 13, 2006 - 2:42 pm

I cannot answer this question without seeing the input, data sets, output, and your license number at support@statmodel.com. Please send the output with TYPE=IMPUTATION; where all data sets converged and also an output where using an individual data set did not converge.

Susan Scott posted on Friday, October 20, 2006 - 2:08 pm

I have e-mailed everything. I'm not sure if it hasn't registered in my e-mail program because I sent it from here, or if it was not sent for some reason. If you do not receive anything, please let me know.

Thank you,
Susan Scott

Linda K. Muthen posted on Friday, October 20, 2006 - 4:21 pm

I have not yet received anything.

Rick Sawatzky posted on Wednesday, October 25, 2006 - 2:30 pm

Hi Linda and Bengt,

I created five imputation datasets to be used for a CFA based on MI. The model converges fine for each dataset individually, but when I combine the datasets in the analysis using the TYPE = IMPUTATION command and a separate input I get the following message:
Number of replications
Requested 5
Completed 1
Then when I change the order of the datasets in the input file I obtain 4 successful replications. I am unclear about what the cause of this discrepancy might be. Do you have any suggestions? I pasted the syntax for a much simplified model in which I find the same problem.

Thank you,

Rick Sawatzky.

SYNTAX
DATA: file is mi.txt;
TYPE = IMPUTATION;
ANALYSIS: ESTIMATOR = WLSMV;
VARIABLE: NAMES ARE y1-y7;
USEVARIABLES ARE y1-y7;
CATEGORICAL ARE y1-y7;
MODEL: f BY y1-y7;

Rick Sawatzky posted on Wednesday, October 25, 2006 - 2:38 pm

Just to clarify my previous posting, the CFA of the imputed data files runs fine when I do not specify categorical data (i.e., the problem only occurs when the variables are specified as categorical).

Linda K. Muthen posted on Wednesday, October 25, 2006 - 3:16 pm

Please send the input, data sets, output, and your license number to support@statmodel.com. We need this information to understand what is happening.

Rick Sawatzky posted on Thursday, October 26, 2006 - 4:22 pm

Linda,

Thanks so much. The above problem is now solved. However, when I use the WLSMV estimator I find that the output file does not provide a mean chi-square statistic. I assume that this might be because the estimated degrees of freedom for WLSMV might not be identical for the different MI data sets. Is this explanation correct, or is there another reason why the mean chi-square is not provided? (It is provided when I use WLSM estimation for the same model.)
Thanks again,

Rick.

Linda K. Muthen posted on Thursday, October 26, 2006 - 4:32 pm

The mean chi-square would not be meaningful for WLSMV for the same reason the chi-square values cannot be used for difference testing -- only the p-value is meaningful.

Bruce A. Cooper posted on Tuesday, March 13, 2007 - 12:50 pm

Hi -
I want to use TYPE=IMPUTATION to do (say) a two-step hierarchical regression analysis and test the difference between the two models.

(1) Can I use the difference between the mean -2Loglikelihood statistics for the two models to test for the improvement in fit from the second set of variables added in the larger model?

(2) There is no test for model fit (all values are 0). I assume I can use the mean -2LL and df to test each model?

(3) How can I get the corrected df and p-values for the t-statistics reported by the program?

(4) How can I get the relative efficiency and fraction of missing information for the intercept and predictors?

Thanks,
bac

Linda K. Muthen posted on Saturday, March 17, 2007 - 9:21 am

1 and 2. No. The difference between two average loglikelihoods is not distributed chi-square.

3. The standard errors are correct so the ratio gives a correct z-score.

4. This is not provided. You would have to see the Schafer text to see how to do this.
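For anyone who wants the quantities from points 3 and 4 anyway, the standard combining rules for multiple imputation (Rubin's rules, covered in the Schafer text) can be applied by hand to the per-data-set estimates and standard errors. The following is a minimal Python sketch of those rules, not something Mplus produces; the function name `pool_rubin` and the example numbers are made up for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need two matching lists with at least two imputations")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance of estimates across imputations
    total = within + (1 + 1 / m) * between         # total variance

    if between == 0:
        # Identical estimates in every data set: no information lost to missingness.
        return {"estimate": q_bar, "se": math.sqrt(total),
                "df": math.inf, "fmi": 0.0, "rel_eff": 1.0}

    r = (1 + 1 / m) * between / within             # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2                # degrees of freedom for the t reference
    fmi = (r + 2 / (df + 3)) / (r + 1)             # fraction of missing information
    rel_eff = 1 / (1 + fmi / m)                    # efficiency relative to infinitely many imputations
    return {"estimate": q_bar, "se": math.sqrt(total),
            "df": df, "fmi": fmi, "rel_eff": rel_eff}


# Hypothetical estimates and standard errors from five imputed data sets.
pooled = pool_rubin([1.2, 1.5, 1.1, 1.4, 1.3], [0.40, 0.42, 0.39, 0.41, 0.40])
for key, value in pooled.items():
    print(f"{key}: {value:.4f}")
```

The inputs would have to be copied from separate runs on each imputed data set, since the pooled output alone does not show the between-imputation variance.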

Bruce A. Cooper posted on Sunday, March 18, 2007 - 9:26 am

Thanks, Linda -

I must have a basic misunderstanding about the Deviance Chi-squared, or wonder why it would not be relevant to 1&2 above. In other maximum likelihood-based models, -2LL is distributed as Chi-squared with the df = number of parameters in the model. So, it seems like the "mean -2LL" would also be distributed Chi-squared. Also, the difference between the -2LL for one model compared to the -2LL for a nested model is called the Deviance, and is distributed as Chi-squared with the df = the difference in the # of parameters estimated by the two models. This allows the test for the difference between hierarchical models with, for example, logistic regression and multilevel regression. I don't understand why the same would not be true in the case of hierarchical linear regression, estimated with maximum likelihood in this case. Could you help me with a reference so I could learn why the "mean Deviance" would not also be distributed Chi-squared?
Thanks,
bac

Linda K. Muthen posted on Monday, March 19, 2007 - 10:21 am

We have never seen an article discussing whether the loglikelihood averaged over imputed data sets is distributed as chi-square. It may be, and whether it is may also be a function of the number of imputed data sets. If you know of a reference that supports this, please let us know.

Bruce A. Cooper posted on Monday, March 19, 2007 - 5:59 pm

Thanks, Linda -

Here are some references that you may find useful.

I haven't gotten the Statistics in Medicine reference yet, but Don Rubin referred to it and two others in the "IMPUTE" thread "IMPUTE: Re: "Averaging" chi-square values (fwd)" as providing information about averaging Chi-squared values from SEM models on imputed data sets. There are some other notes in the IMPUTE threads re averaging R-squared values, but you already report R-squared for the imputation analysis. It would be nice to have the DF for the t-tests and an option for testing the Deviance, too!

Thread: http://www.mail-archive.com/impute@utdallas.edu/msg00158.html

References:
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F-reference distribution. Journal of the American Statistical Association, 86(416), 1065-1073.

Rubin, D. B., & Meng, X. L. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103-111.

Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care databases: an overview and some applications. Stat Med, 10(4), 585-598. (PMID: 2057657)

Linda K. Muthen posted on Monday, March 19, 2007 - 6:47 pm

Thanks for the references. We'll take a look at them.

Bengt O. Muthen posted on Friday, March 23, 2007 - 2:07 pm

Looks like some of these references are useful in terms of model testing in future versions of Mplus.

Anonymous posted on Friday, May 04, 2007 - 12:16 pm

Good afternoon, I have a quick question: How does Mplus identify the datasets used in the imputation process? Is there a way to specify the names of the data sets or does the program search for some identifier? Thanks.

Linda K. Muthen posted on Friday, May 04, 2007 - 12:32 pm

There is no identifier. The data set names are listed in a file and the file is accessed. See Example 12.13.

Bruce A. Cooper posted on Sunday, May 06, 2007 - 4:26 pm

Hello -

I have found what might be a bug in your MI procedure, or at least a documentation problem. I have run a linear regression using 10 MI data sets produced in SAS. In one analysis, I used the names in the SAS file -- 7 of which have more than 8 characters. Mplus warns that the 7 names contain more than 8 characters and that only the 1st 8 will be used in the output, and gives the offending names. When I run the same inp file after correcting the names to be only 8 characters, I get different results. There are no other differences in the two analyses. The run with the short names corresponds closely to the SAS output. Here are the reduced outputs:

Run with 7 long varnames:
STAIX1 ON
CCMC 1.635 1.866
CCFC -3.035 1.408
CCMXCCF 0.340 1.505

Run with short names:
STAIX1 ON
CCMC -1.236 2.644
CCFC -2.632 2.836
CCMXCCF 0.077 1.439

Thanks,
bac

Linda K. Muthen posted on Sunday, May 06, 2007 - 6:06 pm

I would need to see the input, data sets, output, and license number at support@statmodel.com. If you are not using Version 4.21, I suggest you run the analyses with Version 4.21 as a first step.

Anonymous posted on Monday, May 07, 2007 - 5:17 am

In reference to the response Linda K. Muthen posted on Friday, May 04, 2007 - 12:32 pm, I am seeking further clarification of how Mplus delineates the datasets. Thank you for your referral to the Mplus User's Guide. In example 12.13, the following is stated: "Each record of the file must contain one data set name. For example, if five data sets are being analyzed, the contents of impute.dat would be:
data1.dat
data2.dat
data3.dat
data4.dat
data5.dat
where data1.dat, data2.dat, data3.dat, data4.dat, and data5.dat are the names of the five data sets created using multiple imputation."

After reviewing this example, my questions are:
Are the imputed datasets to be analyzed identified by their possession of the suffix .dat (or any other Ascii file format, .txt, .csv etc.)? Does the program actively search for a variable containing this data? If so, is it correct to say that to analyze imputed data, one must create a character variable that distinguishes each data set, when preparing the data for analysis using MPLUS? I'm sorry to be a bother, but I'd like to understand how Mplus makes this distinction. Thanks again for your aid.

Linda K. Muthen posted on Monday, May 07, 2007 - 7:11 am

The program looks for the file impute.dat. It then reads the data sets in the order found in the list of data set names. There is no required extension for the names of the data sets.

Anonymous posted on Monday, May 07, 2007 - 12:29 pm

Okay. I think I understand now. The file command tells the program to search for other files whose names are listed in the impute file. The analysis is then based on results combined across the data sets. I was assuming the data sets were all combined into one larger file.

Linda K. Muthen posted on Monday, May 07, 2007 - 5:16 pm

For Bruce Cooper:

The order of the variable names is different in the two inputs:

arsmamso arsmamao msosc maosc msxma ccmc ccfc ccmxccf

arsmamsos arsmamaos msosc maosc ccmc ccfc msxma ccmxccf
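A mismatch like this is easy to miss by eye. As a rough aid, the two NAMES lists can be compared position by position after cutting each name to its first 8 characters (the truncation Mplus warned about above). This is a small Python sketch, not an Mplus feature; the helper names are invented for illustration, and the two lists are the ones quoted in this post.

```python
def truncate(names, width=8):
    """Keep only the first `width` characters of each variable name."""
    return [name[:width] for name in names]


def order_mismatches(names_a, names_b, width=8):
    """Return (position, name_a, name_b) for every position where the lists differ."""
    a, b = truncate(names_a, width), truncate(names_b, width)
    if len(a) != len(b):
        raise ValueError("the two NAMES lists have different lengths")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The two variable orders quoted above.
first_input = "arsmamso arsmamao msosc maosc msxma ccmc ccfc ccmxccf".split()
second_input = "arsmamsos arsmamaos msosc maosc ccmc ccfc msxma ccmxccf".split()

for position, name_a, name_b in order_mismatches(first_input, second_input):
    print(f"position {position}: {name_a} vs {name_b}")
```

Because Mplus assigns names to columns by position, any reported mismatch means the same column is being read under two different variable names.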

E. Christopher Lloyd posted on Tuesday, May 22, 2007 - 12:38 pm

I have several questions regarding the output when type=imputation is used.

1. If a replication results in warnings (such as a warning about a singular matrix), is that replication's results still included in the output?

2. I cannot seem to find what the expected and observed "proportions" and "percentiles" columns mean in the tests of model fit section. Can you refer me to a page in the User's Guide or briefly explain it? I need enough detail to be able to address its meaning if a Committee member should ask.

Thank you!

Chris Lloyd

Linda K. Muthen posted on Tuesday, May 22, 2007 - 1:10 pm

1. Only runs that did not converge are not included in the results.

2. These are explained in Chapter 11 under Monte Carlo Output.

Kantahyanee Murray posted on Wednesday, September 19, 2007 - 11:45 am

Hello,
I am using Mplus to perform SEM on a MI dataset for my dissertation. Any new information about how to obtain a chi-square representing the combined datasets?

In another post, you made reference to the p-value of a chi-square being valid. Does this mean that the average p-value is valid for determining statistical significance?

Thank you.
K.Murray

Linda K. Muthen posted on Thursday, September 20, 2007 - 11:15 am

I don't think the average p-value would be valid.

Kantahyanee Murray posted on Wednesday, September 26, 2007 - 4:20 pm

Thank you.

Alison Riddle posted on Friday, March 14, 2008 - 9:57 am

Hi Linda and Bengt,

I have 10 imputed data sets created in Amelia with which I would like to do a complex EFA (weighted and clustered) with categorical data and a linear regression using the results of the EFA in a CFA framework and then regressing them on to categorical outcomes and covariates. I am using Mplus v.5.

Can I use IMPUTATION with COMPLEX? Are there any other issues that I should be concerned about?

Thanks for your assistance.

Cheers,
Alison

Alison Riddle posted on Friday, March 14, 2008 - 10:00 am

Hello again,

I just checked the User's Guide and it appears that I cannot use IMPUTATION with EFA - is that correct?

Cheers,
Alison

Linda K. Muthen posted on Friday, March 14, 2008 - 10:22 am

You can use IMPUTATION with COMPLEX EFA with the PROMAX or VARIMAX rotations but not with the other rotations.

Allison Holmes Tarkow posted on Sunday, April 27, 2008 - 6:51 pm

I'm using multiple imputation and am doing a latent growth curve model. I would like to do a multiple group analysis.
In a posting above it was stated that it is possible to use multiple imputation for multiple group analyses.

To clarify, is this true when the multiple imputation is performed on the whole sample as opposed to separate imputation analyses for the groups of interests (e.g., boys and girls)?

This is probably simple, but I'm having trouble wrapping my head around how it would work to compare 2 groups (that you are testing to be non-equivalent) if the complete dataset is constructed based on the assumption that the missing data patterns are generated by one sample.

Thanks for your help!

Linda K. Muthen posted on Monday, April 28, 2008 - 9:04 am

I would think you would base the imputation on the full sample unless missing data patterns vary for males and females. I would suggest seeing what the imputation literature says however as I have no evidence to support this opinion.

Chun-Ju Chen posted on Sunday, June 01, 2008 - 6:46 am

Hi,
Can Mplus run an EFA when TYPE=IMPUTATION is used? Or does it only work for analyses like example 12.13?
(Our variables are ordinal.)

Sorry for disturbing you again. Our institution wants to update our Mplus, and before that we have to make sure that EFA works with TYPE=IMPUTATION.
Thanks for your help.

Linda K. Muthen posted on Monday, June 02, 2008 - 9:46 am

TYPE=EFA and the IMPUTATION option cannot be used in combination.

Daniel Oberski posted on Monday, June 09, 2008 - 7:11 am

Dear Drs Muthen

I have 20 multiply imputed correlation matrices, but not the imputed cases from which they were computed.

Can I use TYPE = IMPUTATION to estimate a model in Mplus from these 20 correlation matrices? Or does this option only work with separate cases?

Thank you very much in advance for your help,

Daniel

Linda K. Muthen posted on Monday, June 09, 2008 - 8:10 am

TYPE=IMPUTATION requires raw data. It cannot be used with summary data.

Andrea Dalton posted on Tuesday, July 22, 2008 - 3:16 pm

Hi,

My question is related to the post " Anonymous posted on Tuesday, March 15, 2005 - 7:20 am" and reply.

I am running an analysis using imputed data sets, and the output indicates that the number of replications "completed" is only 2, when I had 5 data sets originally.

I ran each set separately and there were no errors (i.e., "model estimation terminated normally"). So, why is it that I don't get all 5 sets included in the analysis?

Linda K. Muthen posted on Tuesday, July 22, 2008 - 3:17 pm

Which version of Mplus are you using?

Andrea Dalton posted on Tuesday, July 22, 2008 - 3:22 pm

Not the newest update - I think it's version 4 (bought it almost a year ago).

Linda K. Muthen posted on Tuesday, July 22, 2008 - 4:46 pm

You should download Version 5.1.

Andrea Dalton posted on Wednesday, July 23, 2008 - 10:49 am

That worked! Thanks.

aprile benner posted on Thursday, August 07, 2008 - 6:43 am

Good morning -

I am conducting a path analysis with 10 imputed datasets. Is there a way to run Model Constraint with Type = Imputation? I have tried and am not getting an error (but am also only getting truncated output).

Thanks,

aprile

Linda K. Muthen posted on Thursday, August 07, 2008 - 7:00 am

MODEL CONSTRAINT is available with the IMPUTATION option. I was wrong about that. Please send your input, data, output, and license number to support@statmodel.com.

Donna Ansara posted on Wednesday, November 05, 2008 - 4:00 pm

Hello,

I am running latent class regression analysis using Type=imputation and am able to run this perfectly fine. I am interested in presenting confidence intervals for the regression coefficients for the covariates, and Mplus does not seem to provide this output when I specify the cinterval option. Would it be appropriate to calculate them in the usual manner using the standard errors reported for the regression coefficients (i.e., estimate +/- 1.96SE)? Thank you for your assistance.

Linda K. Muthen posted on Thursday, November 06, 2008 - 8:41 am

This would be correct.
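The hand calculation confirmed here is short enough to script. Below is a minimal Python sketch of the normal-theory interval (estimate +/- 1.96 SE) together with the matching two-sided p-value for the estimate/SE ratio; the function names and the example numbers are hypothetical and nothing here is read from Mplus output.

```python
import math


def wald_interval(estimate, std_error, z_crit=1.96):
    """Normal-theory confidence interval: estimate +/- z_crit * SE."""
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width


def two_sided_p(estimate, std_error):
    """Two-sided p-value for z = estimate / SE under a standard normal reference."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical pooled coefficient and standard error copied from an output file.
low, high = wald_interval(0.50, 0.10)
print(f"95% CI: [{low:.3f}, {high:.3f}], p = {two_sided_p(0.50, 0.10):.4g}")
```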

bob calie posted on Tuesday, May 12, 2009 - 8:21 am

Hi All,

I'm trying to impute missing data for a binary variable (say, gender: girl/boy). Since the data were collected from multiple schools and there are apparently distinct proportions of gender across schools, it seems that a 'stratified' imputation is more appropriate. Any ideas?

Thanks very much in advance.

Linda K. Muthen posted on Tuesday, May 12, 2009 - 9:45 am

I don't know much about imputing data. I think you should pose this question to the developer of the software you would be using to impute the data.

bob calie posted on Wednesday, May 13, 2009 - 1:24 pm

But Mplus can deal with missing data. What I was asking is if it's possible to impute missing with different probability across different clusters. This is not a research question and I was asking the developer of Mplus. Maybe I posted it in a wrong place or I shouldn't have used Mplus. Thanks anyway.

Michael Spaeth posted on Thursday, May 14, 2009 - 12:25 am

If I were you I would ask Joseph Schafer or colleagues. He developed the freeware "Norm" which is a multiple imputation program. Mplus does not impute missing data. It handles missing data via a maximum likelihood approach (FIML).

bob calie posted on Thursday, May 14, 2009 - 8:58 am

Thanks, Mike!

Holmes Finch posted on Friday, June 05, 2009 - 5:23 am

Hi,

I am doing some simulations involving multiple imputation. I have imputed data for 100 replications using SAS and created 10 output datafiles for each replication. I would now like to use MPlus to conduct a LGCM for each of the 10 imputations for each of the 100 replications. I see how easy it is to read one set of 10 imputations in using the TYPE=IMPUTATION command, but I'm unsure of how to do this for my set of 100 replications, each with 10 imputations. Does this make sense? Thanks for any suggestions.

Holmes

Linda K. Muthen posted on Friday, June 05, 2009 - 9:19 am

There is no way in Mplus to combine the ten imputation outputs. You would need to write a program to extract what you need from the output and combine it.
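One part of such a program that is easy to automate is the bookkeeping: each replication needs its own list file naming its 10 imputed data sets, in the layout of Example 12.13. The Python sketch below only generates those list files; the file-name pattern `rep{rep}_imp{imp}.dat` and the function names are assumptions for illustration, and a separate Mplus input per replication, plus the extraction of results from each output, would still have to be handled separately.

```python
from pathlib import Path


def list_file_lines(replication, n_imputations, pattern="rep{rep}_imp{imp}.dat"):
    """Names of the imputed data sets belonging to one replication, one per record."""
    return [pattern.format(rep=replication, imp=imp)
            for imp in range(1, n_imputations + 1)]


def write_list_files(out_dir, n_replications, n_imputations):
    """Write one list file per replication and return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for rep in range(1, n_replications + 1):
        path = out / f"implist_rep{rep}.dat"  # hypothetical list-file name
        path.write_text("\n".join(list_file_lines(rep, n_imputations)) + "\n")
        written.append(path)
    return written


# Contents of the list file for replication 3 with 10 imputations.
print("\n".join(list_file_lines(3, 10)))
```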

Holmes Finch posted on Friday, June 05, 2009 - 9:28 am

Thanks, Linda, for the info.

Holmes

Dorothee Durpoix posted on Sunday, June 07, 2009 - 2:13 pm

Dear Linda,

I've analyzed 13 imputed datasets, of which 6 are completed, the output said. However, when I run them individually, only 3 actually converge. A few earlier posts asked the same question, but there was no indication of what may have been the problem(s).
Could you please enlighten me of what is happening?

Cheers.

Dorothee Durpoix posted on Sunday, June 07, 2009 - 2:17 pm

Just to complete my post: I'm using the 5.2 version.

Cheers.

Linda K. Muthen posted on Sunday, June 07, 2009 - 3:05 pm

If you add TECH9 to the output command, it should show the problem. If this does not help, please send the problem and your license number to support@statmodel.com.

Anonymouse posted on Tuesday, August 25, 2009 - 11:20 am

Hello,

I am testing a path model which is composed of a series of quantitative variables predicting two binary dependent variables. I am using multiple imputations to handle missing values on the x variables. When I try to use the MODEL INDIRECT command, I get a message indicating that Mplus cannot perform MODEL INDIRECT for multiple imputations.

Is there any way to work around this? It seems that I have to use multiple imputations for the missing x values because otherwise listwise deletion is used...

Linda K. Muthen posted on Tuesday, August 25, 2009 - 11:35 am

You can use MODEL CONSTRAINT to define the indirect effects.

Maren Winkler posted on Monday, September 21, 2009 - 2:36 am

Hi,

I have data from 1187 subjects on 135 variables. There is missing data on one variable (appr. 11 %) which is the only variable that is not categorical. I've done Multiple Imputation with NORM, getting 20 datafiles for further analyses.

I have used the WLSMV-estimator for my SEM. Mplus suggests to use NOCHISQUARE and NOSERROR to reduce computation time. I've done that but get the following messages: "NOCHISQUARE option is not available with multiple imputation.
Request for NOCHISQUARE is ignored.
NOSERROR option is not available with multiple imputation.
Request for NOSERROR is ignored."

Why then does Mplus suggest this option if I can't use it?

Moreover, I have a question concerning the output file when using multiple imputation:
For the tests of model fit, there are not only mean and SD for CFI, TLI etc., but also expected and observed proportions and percentiles - what do these results tell me?

Thanks for your help!

Linda K. Muthen posted on Tuesday, September 22, 2009 - 10:26 am

The option is recommended in general but can't be used with imputation.

See page 330 of the user's guide for a description of the expected and observed proportions and percentiles.

Maren Winkler posted on Monday, September 28, 2009 - 12:52 am

Thanks for your advice.

However, my output is a bit different from the example you give.

I'm running SEM with multiple imputation (5 computations). For the chi square test I only get the following output:

"TESTS OF MODEL FIT

Number of Free Parameters 297

Chi-Square Test of Model Fit

Number of successful computations 5

Proportions Percentiles
Expected Observed Expected Observed
..."

So I actually don't have Mean, Std Dev for chi-square. Is there a command needed to ask Mplus for chi square?

Moreover, for the other fit indices (CFI, TLI, RMSEA,...), the Std Dev is zero and hence, percentiles expected and observed are always the same. How do I read the proportions expected and observed then?

Thank you very much for your help!

Linda K. Muthen posted on Monday, September 28, 2009 - 6:58 am

I would need to see your full output to understand what you are seeing. Please send it and your license number to support@statmodel.com.

Moh Yin Chang posted on Wednesday, September 30, 2009 - 3:21 pm

Hi,

I read your example 12.13 but am still unsure how to set up the input data file. I tried to stack the imputed files in one data set and apparently it doesn't work. May I know what the dataset should look like for type=imputation?
Thanks

Linda K. Muthen posted on Wednesday, September 30, 2009 - 3:59 pm

Each imputed data set should be in a separate file. The file specified using the FILE option should contain the names of the datasets. Please reread the example and also see page 424 of the user's guide.
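If the imputation software has written all imputations stacked into one file, the split can be scripted. This is a minimal Python sketch under stated assumptions: the stacked file is whitespace-delimited, the imputation number sits in the first column, and the output names `data1.dat`, `data2.dat`, ... and `impute.dat` follow Example 12.13. The function names are hypothetical. The indicator column is dropped, so it should not appear in the NAMES list.

```python
from collections import defaultdict
from pathlib import Path


def split_stacked(rows, imp_index=0):
    """Group whitespace-delimited rows by the imputation number in column `imp_index`."""
    groups = defaultdict(list)
    for row in rows:
        fields = row.split()
        if not fields:
            continue                      # skip blank lines
        imp = fields.pop(imp_index)       # remove the indicator from the record
        groups[imp].append(" ".join(fields))
    return dict(groups)


def write_imputation_files(groups, out_dir, stem="data", list_name="impute.dat"):
    """Write one file per imputation plus the list file naming them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for imp in sorted(groups, key=int):   # assumes integer imputation numbers
        name = f"{stem}{imp}.dat"
        (out / name).write_text("\n".join(groups[imp]) + "\n")
        names.append(name)
    (out / list_name).write_text("\n".join(names) + "\n")
    return names


# Tiny made-up stacked data set: imputation number, then two variables.
stacked = ["1 0.5 2", "2 0.7 3", "1 0.1 4", "2 0.3 4"]
print(split_stacked(stacked))
```

Some programs also export the original incomplete data as imputation 0; if so, that group should be left out before writing the files.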

Moh Yin Chang posted on Thursday, October 01, 2009 - 9:21 am

Is there a way to perform model test with multiply-imputed data?

Linda K. Muthen posted on Thursday, October 01, 2009 - 4:42 pm

MODEL TEST can be used with the multiple imputation. Please send your problem and your license number to support if you are having a problem.

Charles B. Fleming posted on Wednesday, October 21, 2009 - 1:17 pm

I am running logit models with 40 imputations using the ML estimator and would like to see if model fit improves when I add a block of variables. In other words, I would like to assess the significance of the change in the loglikelihood relative to the change in number of estimated parameters. My question is whether the values for the loglikelihoods given when using multiple imputations can be used in a straightforward way (i.e., computing twice the difference in the loglikelihoods for the nested models) or do I need to apply some sort of correction as is described in your technical report: "Chi-square statistics with multiple imputation". I am not clear if the output I am getting (using 5.21) already contains the correction to the mean loglikelihood value described in that report.

Tihomir Asparouhov posted on Wednesday, October 21, 2009 - 1:59 pm

The current Mplus version provides log-likelihood testing with imputation only for the SEM model with continuous variables (that would be the test of fit). As far as I know there is no simple way to construct likelihood correction factors that can be used easily to do general LRT tests for arbitrary nested models, i.e., even for the simple SEM with continuous variables you can only get the test of fit at this time. I would say LRT with imputation is still a tricky topic.

On the other hand Wald test is not - use model test to conduct a test for multiple parameters. In addition the SE (and the T value/test which is the same as the univariate Wald test) that are already in the output can be used to see if fit improves, i.e., to see if the predictors are significant.

Paola posted on Friday, February 12, 2010 - 6:34 am

I have 1000 replications, each replication contains 5 imputed datasets,
is it possible to do a random intercept model on all 1000 replic with both type= Montecarlo and type=Imputed?
If so, how?

Linda K. Muthen posted on Friday, February 12, 2010 - 9:33 am

I think you want to combine TYPE=IMPUTATION and TYPE=MONTECARLO. This cannot be done with Mplus.

Dylan K posted on Monday, February 22, 2010 - 7:32 am

Dear MPlus team,

I'm a complete novice with MPlus. I'm using it to hopefully produce a MIMIC LCA model. I performed Multiple Imputation using STATA as I had missing covariates. I'm okay with reading the imputed file into MPlus but where I'm getting stuck is in specifying an indicator for the imputed datasets within the single file. When I run the input file without this I get reams of output along the lines of:

*** ERROR in Data command
An error occurred while opening the file specified for the FILE option.
File: C:\Documents and Settings\user\Desktop\Dylan\SRA\OrigData\mi\***111415.

Hope you can help prevent me pulling my hair out any further!

Thanks and best wishes,

Dylan

Linda K. Muthen posted on Monday, February 22, 2010 - 7:38 am

The data sets for multiple imputation must be in separate files.

Dylan K posted on Monday, February 22, 2010 - 8:08 am

Hi,

Thanks for getting back to me so quickly. I'm still getting confused. I've split the imputed datasets into different files, each beginning with data. (e.g. data1.dat).

I've called the file that these are stored in data as well. My input file reads:
DATA:
File is
"C:\Documents and Settings\user\Desktop\Dylan\SRA\data.dat";
TYPE=IMPUTATION;

I'm getting the following messages
*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: C:\Documents and Settings\user\Desktop\Dylan\SRA\data.dat

I get the same whether I name the parent file just data or put the .dat extension on. I've also tried to put the data in csv but no luck.

Does it matter if my syntax file is stored in another file?
Can you see where I'm going wrong?

(Apologies if this is a v basic question!)

Thanks in advance,

Dylan

Linda K. Muthen posted on Monday, February 22, 2010 - 8:38 am

See Example 12.13 in the user's guide. If this does not help, please send your output file and license number to support@statmodel.com.

Kelly P posted on Tuesday, March 02, 2010 - 7:35 am

Hi,

I am currently considering using multiple imputation due to missing data problems I am encountering with my dataset. However, the data has a large number of sibling pairs, for which I use the cluster command, and the model I am running has indirect effects. Would these options be available if I was using type=imputation? From the above posts it looks like there would be ways to compute the indirect effects, but I am also concerned about the cluster option.

Thanks!

Linda K. Muthen posted on Tuesday, March 02, 2010 - 2:25 pm

The CLUSTER option is available for TYPE=IMPUTATION;

Kelly P posted on Wednesday, March 03, 2010 - 4:35 am

Thanks Linda! Are there examples available anywhere showing how to use the CONSTRAINT command to compute indirect effects?

Linda K. Muthen posted on Wednesday, March 03, 2010 - 5:41 am

No. An indirect effect is the product of the regression coefficients.
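To make the product rule concrete, here is a small Python sketch that multiplies two path coefficients and attaches a first-order delta-method (Sobel) standard error. It assumes the two coefficient estimates are uncorrelated, which is a simplification; Mplus reports its own standard error for a new parameter defined in MODEL CONSTRAINT, and that should be preferred. The function name and the numbers are hypothetical.

```python
import math


def indirect_effect(a, se_a, b, se_b):
    """Indirect effect a*b with a first-order delta-method standard error.

    a: coefficient for x -> mediator, b: coefficient for mediator -> y.
    Assumes the estimates of a and b are uncorrelated.
    """
    estimate = a * b
    std_error = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return estimate, std_error


# Hypothetical pooled coefficients and standard errors.
estimate, std_error = indirect_effect(0.5, 0.1, 0.4, 0.2)
print(f"indirect effect = {estimate:.3f}, SE = {std_error:.3f}")
```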

miriam gebauer posted on Tuesday, March 30, 2010 - 12:42 pm

hello,
reading the posts above led me to the assumption that indirect effects and/or interaction modelling with latent variables shouldn't be done with multiply imputed data, because it violates the basic assumptions of multiple imputation (a linear relationship between all variables within the imputed datasets). Do I understand this right?
thanks for your help!
miriam

Linda K. Muthen posted on Wednesday, March 31, 2010 - 10:44 am

Indirect effects are linear so your concern would not apply to them. For interactions, you may want to include the interactions in the set of variables used for imputation.

Antti Kärnä posted on Monday, May 03, 2010 - 7:10 am

Hi,
is there a way to include a certain variable in the variable names list and still not to use it for imputation? Now in the User's guide it is stated: "Because the variable z is included in the NAMES list, it is also used to impute missing data for y1, y2, y3, y4, x1 and x2"(p.348). It is obvious that not all variables are useful for imputation, for example IDs.

Bengt O. Muthen posted on Monday, May 03, 2010 - 7:38 am

In UG ex 11.5 you don't have a USEV statement, which means that the USEV variables are the same as the NAMES variables.

If you add a USEV statement - that excludes say ID of the NAMES list of variables - the variables on the USEV list are the ones that will be used for imputation.

craig wiernik posted on Monday, June 21, 2010 - 10:10 am

Hi all,
I have a general modeling question. I've used Stata to prep my data for use in mplus. I've also needed to handle some missing data on my dependent variables. So, using ICE in stata, I created a bunch of datasets in which I've imputed values for a select number of variables, leaving everything else alone. What this means is that when Mplus sees the data, I have a "complete" set of data for my dependent variables, but may still have missing values on the independent, covariates, and controls.

I thought Mplus handled missing data that was not on the dependent variable, but I'm finding that analysis on my imputed datasets using the "Type = IMPUTATION" command still loses many cases, largely due to the covariates and control variables.

Am I doing something wrong in Mplus? Or, do I want to create datasets in which I've imputed everything, and just send Mplus complete data?

thank you!
craig

Linda K. Muthen posted on Monday, June 21, 2010 - 11:22 am

Missing data theory is for dependent variables only. If you don't want observations with missing on the covariates to be excluded, you need to impute for the full data set. You can do this in Version 6 of Mplus using the DATA IMPUTATION command. See Example 11.5 in the Version 6 user's guide.

craig wiernik posted on Monday, June 21, 2010 - 11:57 am

Hi Prof. Muthen,
We only have v5, so I'll impute everything I want to in Stata, and then use the files in Mplus.
Thank you very much for the prompt reply! :-)

craig

John Mallett posted on Thursday, July 29, 2010 - 4:56 am

I am looking to use multiply imputed data sets to run a multiple regression model with a continuous outcome variable.

I have missingness on both my predictors and outcome variable, so I am wondering if it is necessary to omit the outcome variable from the imputation model when creating the MI data sets ?

Is there a reference you can recommend that deals with this?

Thanks

Jon Heron posted on Thursday, July 29, 2010 - 7:19 am

HI John,

I know a reference that says the opposite if that helps

Missing Data Analysis: Making It Work in the Real World
John W. Graham
Annual Review of Psychology, Vol. 60: 549-576

In the section on dispelling the myths

"The fear is that including the DV in the
imputation model might lead to bias in estimating
the important relationships (e.g., the regression
coefficient of a program variable predicting
the DV). However, the opposite actually happens.
When the DV is included in the model, all
relevant parameter estimates are unbiased, but
excluding the DV from the imputation model
for the IVs and covariates can be shown to produce
biased estimates. The problem with leaving
the DV out of the imputation model is this:
When any variable is omitted from the model,
imputation is carried out under the assumption
that the correlation is r = 0 between the omitted
variable and variables included in the imputation
model. Thus, when the DV is omitted,
the correlations between it and the IVs (and
covariates) included in the model are all suppressed
(i.e., biased) toward 0."

John Mallett posted on Thursday, July 29, 2010 - 7:59 am

Thank you Jon for your suggestions and reference. Much appreciated.

I was thinking about this question in the context of a planned missingness design (3-form - Graham, Hofer, and MacKinnon (1996), where the DV construct and the predictors (if measured by multiple items) are systematically reduced in different versions of the form and the missingness subsequently imputed using all available information from all 3 forms.

I hope this makes sense?
John

Jon Heron posted on Thursday, July 29, 2010 - 8:21 am

Hmm, never come across that before. Is the missing data by design treated as MCAR then? depending on how they divide their sample I suppose.

John Mallett posted on Thursday, July 29, 2010 - 10:06 am

"Hmm, never come across that before. Is the missing data by design treated as MCAR then? depending on how they divide their sample I suppose?"

Yes Jon

David Bard posted on Monday, August 09, 2010 - 10:57 pm

Can you clarify the variable output from a twolevel MI procedure in version 6.0? It looks like variables with the same original names represent the observed and imputed values, variables appended by asterisks are thresholds or within-level latent response values, and variables prefaced by 'B_' represent random posterior draws from the between level (for variables modeled at both levels), but I couldn't find this documented (if it is documented, could you direct me to that segment of the manual in case other questions arise).

Is it possible to output latent response variable values for between-level-only variables? I'm not seeing a B_ variable for any of my between-only variables.

Also, I tried to save my subject ID variable as an auxiliary variable. A column for it appears in each imputation file, but each value is stored as 10 asterisks. Is there a limit on the size of these auxiliary variables? My Ids are 7 digits.

Thanks.

Linda K. Muthen posted on Tuesday, August 10, 2010 - 12:18 pm

The IDVARIABLE option of the VARIABLE command should be used to identify the id variable not the AUXILIARY option. Please send the full output as an attachment and your license number to support@statmodel.com so I can see what is being saved.

Sharon Ghazarian posted on Monday, August 23, 2010 - 11:03 am

Is there a limit to the number of variables that can be imputed at one time?

Linda K. Muthen posted on Tuesday, August 24, 2010 - 8:17 am

There is no limit, but with a large number of variables the number of parameters in the imputation model may be large. Use only the analysis variables and missing data correlates to impute data. Don't use all variables in a data set for example.

Sharon Ghazarian posted on Tuesday, August 31, 2010 - 9:56 am

Thanks Linda.

Another question - is there a way to put in variable-specific minimum and maximum constraints for multiple imputation? For example, multiple imputation often results in extreme values on some variables, so constraints are necessary to tell the program that imputed values should only fall between 1 and 4 (as an example). Is there any place to do this in Mplus right now?

Bengt O. Muthen posted on Tuesday, August 31, 2010 - 4:32 pm

There is currently not a way to do this for continuous outcomes.

Bengt O. Muthen posted on Tuesday, August 31, 2010 - 5:00 pm

I was just reminded that you do have the option

VALUES =

If the number of values that are present in the data is a relatively small number (1, 2, 3, 4) you just list those.

Otherwise you can use

1.0 1.1 1.2 .... 3.9 4.0

to get a rounding of the imputed value to the first decimal.
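
As an illustration only (Python, not Mplus syntax): the idea of restricting a continuous imputed draw to a listed set of values can be sketched as snapping each draw to the nearest value on the list. The function name `snap_to_allowed` and the example draws are hypothetical, and the thread does not say how Mplus implements VALUES internally, so the nearest-value rule here is an assumption.

```python
def snap_to_allowed(value, allowed):
    """Return the allowed value closest to `value` (ties go to the smaller one)."""
    return min(sorted(allowed), key=lambda a: abs(a - value))


# Grid 1.0, 1.1, ..., 4.0, mirroring the list of values in the post above.
grid = [round(1.0 + 0.1 * i, 1) for i in range(31)]

# Hypothetical unconstrained draws; two of them fall outside the 1-4 range.
draws = [0.37, 2.449, 3.96, 5.2]
snapped = [snap_to_allowed(d, grid) for d in draws]
print(snapped)
```

Under this rule, out-of-range draws end up at the nearest endpoint of the list, which is the behaviour the question about minimum and maximum constraints was after.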

Sharon Ghazarian posted on Wednesday, September 01, 2010 - 8:43 am

Great - thank you!

Maria Clara Barata posted on Tuesday, September 07, 2010 - 8:33 am

Hi!
I am trying to use the new multiple imputation facility in Mplus, but all I am getting are fatal error messages.

It seems to be reading the data in correctly, so I was wondering if there is a way to get a more detailed error message so that I can troubleshoot. I am using output: TECH8 as per example 11.5.

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION.

Thanks

Bengt O. Muthen posted on Tuesday, September 07, 2010 - 11:59 am

Please send your input, output, data and license number to support@statmodel.com.

Maria Clara Barata posted on Tuesday, September 07, 2010 - 1:57 pm

Thanks. Just did.
Clara

Tom Booth posted on Monday, October 11, 2010 - 5:16 am

Hello,

I am trying to run multiple imputations on a set of mixed categorical and continuous variables (n=972). I am using the default H1 imputation (sequential regression). From reading Asparouhov & Muthen (2010, 15th July) this seemed most appropriate.

I am getting a warning that reads;

*** FATAL ERROR
THERE IS NOT ENOUGH SPACE TO RUN MPLUS ON THE CURRENT INPUT FILE....

I have no other programs running, and have installed the 32-bit version for which the machine I am running it on has plenty of capacity.

I am unsure what about the analysis I am running is causing this issue.

Regards,

Tom

Linda K. Muthen posted on Monday, October 11, 2010 - 6:54 am

Please send your output file and license number to support@statmodel.com.

Nicholas Bishop posted on Monday, October 11, 2010 - 12:38 pm

Hello,
I am receiving the same error message as Clara described above: *** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION. I am receiving the imputed data sets correctly but the imputed list file contains zero data. What is needed to correctly produce the list for the imputed data sets? Thanks for your help.

Nicholas

Linda K. Muthen posted on Monday, October 11, 2010 - 2:13 pm

The list file should be generated automatically. Please send your output file and license number to support@statmodel.com.

Dana Wood posted on Thursday, October 21, 2010 - 12:51 pm

Hello,

I am trying to use the MODEL TEST command with multiply imputed datasets. The model runs fine with the multiply imputed datasets, but when I add in the request for MODEL TEST, I don't get any output. The black MS-DOS screen appears for a brief flash and then nothing happens.

Linda K. Muthen posted on Thursday, October 21, 2010 - 2:27 pm

If you are not using Version 6.1, try that. If you are, please send the input, data, and your license number to support@statmodel.com.

Alexandre Morin posted on Friday, October 22, 2010 - 1:29 am

Hi,
I have a couple of questions related to examples 11.5 and 11.6 of the manual.
(1) Example 11.5:
You mention on page 348 that all variables that are part of the NAMES list are used to impute data (on the variables listed under IMPUTE). Is there any way NOT to use some variables that are part of the list in the imputation?
(2) Example 11.6: Will this be the same in plausible values imputations? Will all the variables listed in the NAMES list be used in generating the plausible values or only those included in the MODEL section? If so, is there again a way to not use some variables?
(3) Is it possible to generate multiple imputation data sets (5, 10, 20, etc.) that include both imputed values for missing data on observed variables and plausible values in the same data sets?
(4) How can we include additional variables in the saved multiple imputation and/or plausible values data set (let's say the z variables from example 11.5)? I do not necessarily want to impute these data or to use them in the imputation algorithm, just to have them saved in the created data sets so as to be able to use them in subsequent analysis. Will the simple AUXILIARY option (without e-m-r) work?

Thank you !

Bengt O. Muthen posted on Friday, October 22, 2010 - 10:14 am

I am glad you asked so that this can be clarified.

(1) and (4):

UG ex 11.5 is not as clear as it could be on this point. A user would typically work with not only the NAMES and IMPUTE lists of variables, but also a USEVARIABLES list and an AUXILIARY list. The NAMES list simply reads the variables in the original data set. The USEVARIABLE list is a smaller subset of variables from the NAMES list, just as in an ordinary analysis. The NAMES list variables are the variables used to create the imputations. In UG ex11.5, the USEVARIABLE list is absent and therefore defaults to the NAMES list. Typically you also want to save into the imputed data set other variables that are not to be used in the imputation and to do that you put those variables on the AUXILIARY list.

(2) Same thing.

(3) The SAVE = data set contains what you are asking for. The PLAUSIBLE = data set gives summary statistics for plausible values.

Bengt O. Muthen posted on Friday, October 22, 2010 - 10:32 am

Correction - I should have said:

UG ex 11.5 is not as clear as it could be on this point. A user would
typically work with not only the NAMES and IMPUTE lists of variables,
but also a USEVARIABLES list and an AUXILIARY list. The NAMES list
simply reads the variables in the original data set. The USEVARIABLES
list is a smaller subset of variables from the NAMES list, just as in
an ordinary analysis. The USEVARIABLES list variables are the variables used
to create the imputations. In UG ex11.5, the USEVARIABLES list is
absent and therefore defaults to the NAMES list. Typically you also
want to save into the imputed data set other variables that are not to
be used in the imputation and to do that you put those variables on
the AUXILIARY list.

Alexandre Morin posted on Friday, October 22, 2010 - 11:11 am

Thank you very much!
It is indeed clearer.

Alexandre Morin posted on Saturday, October 23, 2010 - 10:43 am

Hi again,
Does the AUXILIARY (m) function work in the generation of plausible values (i.e., plausible values are generated from a model, but can we let variables NOT in the model influence the generation of plausible values)?
The question is based on the result you report in the "plausible value" paper that, when plausible values are to be used in a secondary analyses, all of the variables to be used in this secondary analysis need to be part of the PVs generation...
I am generating PVs from a complex ESEM-within-CFA model. I will use them in a secondary analysis with an additional variable. Yet, when I add this variable to the ESEM-within-CFA model and allow it to correlate with the factors, the model crashes. Or do I just need to get the variable in the model by estimating its variance without allowing it to correlate with the factors?

Bengt O. Muthen posted on Saturday, October 23, 2010 - 5:42 pm

Aux(m) is only intended for ML estimation, not the Bayesian estimation used with plausible values.

Nicholas Bishop posted on Monday, November 15, 2010 - 11:55 am

Hello,
I am currently using Mplus version 6.1 to perform multiple imputation. I am receiving the following warning message when I run my imputation model:

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION.

THE PSI MATRIX IS NOT POSITIVE DEFINITE.

THE PROBLEM OCCURRED IN CHAIN 2.

All variables included in the impute list contain missing data, and I am using the PROCESSORS = 2 command to reduce computing time. I have also specified categorical and continuous variables. Can you suggest changes I can make to get the imputation working? Thanks for your help.

Nick

Bengt O. Muthen posted on Monday, November 15, 2010 - 3:16 pm

There are two things you want to consider. One is the clarification of the UG imputation ex 11.5:

UG ex 11.5 is not as clear as it could be on this point. A user would
typically work with not only the NAMES and IMPUTE lists of variables,
but also a USEVARIABLES list and an AUXILIARY list. The NAMES list
simply reads the variables in the original data set. The USEVARIABLES
list is a smaller subset of variables from the NAMES list, just as in
an ordinary analysis. The USEVARIABLES list variables are the variables used
to create the imputations. In UG ex11.5, the USEVARIABLES list is
absent and therefore defaults to the NAMES list. Typically you also
want to save into the imputed data set other variables that are not to
be used in the imputation and to do that you put those variables on
the AUXILIARY list.

The other is the list of 14 suggestions in Section 4 of the Asparouhov-Muthen (2010) imputation paper on our website.

Sofie Henschel posted on Wednesday, February 02, 2011 - 3:41 am

Dear Prof. Muthén,
I'm trying to run a SEM with an imputed data set. Mplus reads in all 5 data files, but is using none, hence the number of free parameters is zero and the test of model fit is not executed. TECH9 indicates that there is no convergence for each replication. However, if I run the 5 data sets separately, imputations 1, 4, and 5 show a good fit while imputations 2 and 3 do not converge.
I am wondering why imputation 1,4, and 5 are not used with type=imputation although these models show good fit. Furthermore, is there anything I might investigate to see why imputation 2 and 3 do not converge? I am especially surprised by that since I have 11 variables in my data sets but only 3 were imputed, so the other 8 variables are the same for each of the imputation data sets.

Thank you very much for your support!

Linda K. Muthen posted on Wednesday, February 02, 2011 - 6:46 am

If you are not using Version 6.1, you should do so. If you have this problem with Version 6.1, please send your input, data sets, output, and your license number to support@statmodel.com.

Alain Girard posted on Wednesday, March 02, 2011 - 7:06 am

Hi
I have multiply imputed data sets and I want to perform a likelihood ratio test to compare 2 nested models. I read the technical appendix: "Chi-Square with Multiple Imputation".

Is there a way in Mplus to compute this test?

Thanks
Alain Girard
University of Montreal

Linda K. Muthen posted on Wednesday, March 02, 2011 - 9:14 am

This can be done with the ML estimator.

Kätlin Peets posted on Monday, March 14, 2011 - 3:08 pm

Hi,

Is it possible that the order of the variables that are used to impute missing data has an effect on imputation?

Tihomir Asparouhov posted on Monday, March 14, 2011 - 3:58 pm

Yes. Multiple imputation uses random number generation as a part of the MCMC estimation of the imputation model. When the variables are reordered different random bits will be used for different variables. This however should have minimal impact on any proper use of the imputed data sets.

Kätlin Peets posted on Wednesday, March 16, 2011 - 7:22 am

I have another problem. When I try to impute data, I get the following error message:

FATAL ERROR
THE NUMBER OF CLUSTERS PLUS THE PRIOR DEGREES OF FREEDOM OF PSI
MUST BE GREATER THAN THE NUMBER OF LATENT VARIABLES.
USING MORE INFORMATIVE PRIOR FOR PSI CAN RESOLVE THIS PROBLEM

Usually, I have solved it by decreasing the number of variables used for imputation. What should I do?

Tihomir Asparouhov posted on Wednesday, March 16, 2011 - 9:49 am

This happens because the number of variables in the imputation is more than the number of clusters in data. You can either remove some of the variables in the imputation model or you can perform an H0 imputation. With an H0 imputation you can use a factor analysis model on the second level for imputation purposes or you can use an unrestricted model with a different prior for the variance covariance matrix.

Take a look at section 3.3 in http://statmodel.com/download/Imputations7.pdf

Tihomir Asparouhov posted on Wednesday, March 16, 2011 - 9:52 am

Also I think you are using Mplus version 6. If you use version 6.1 you will not have that problem.

Kätlin Peets posted on Wednesday, March 16, 2011 - 5:54 pm

Thank you. I now downloaded Mplus 6.1, and the problem was solved.

I still have another question. I center my data before imputation (as I also form interaction terms before imputation). However, I am not sure which mean value to use when I center my variables that have missing values. Should I subtract the mean values that are computed on the basis of the cases/clusters that have no missing values?

My second question concerns variances of the parameter estimates. I understand that these are squared standard errors. But sometimes the variance estimates (in the TECH3 output) are not equal to the squared standard errors in my model output (e.g., SE = .104, variance estimate = .005). Is that due to rounding error or something else? What should I do in this case? I need these variance estimates for calculating simple slopes.

Linda K. Muthen posted on Thursday, March 17, 2011 - 8:49 am

You should center after imputation.

Those values sound quite different even for rounding. Please send the output and your license number to support@statmodel.com so we can take a look at it.

Kätlin Peets posted on Thursday, March 17, 2011 - 9:58 am

I have understood that it is advised to include all the necessary interaction terms in the imputation phase.

If I were to center after imputation, how can I create interaction terms between observed variables? It seems that I cannot use "define" command.

Linda K. Muthen posted on Thursday, March 17, 2011 - 10:59 am

You can save the imputed files. DEFINE works with TYPE=IMPUTATION.
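
As an illustration only (Python, not Mplus syntax): the "center after imputation, then form the product" order can be sketched by applying the same transformation to each imputed data set separately. The helper `center_and_interact`, the variable names, and the toy values are hypothetical. Centering at each data set's own mean is an assumption about what is wanted here.

```python
def center_and_interact(rows, x_key, z_key):
    """Center two variables at their means in this data set and add their product."""
    n = len(rows)
    x_mean = sum(r[x_key] for r in rows) / n
    z_mean = sum(r[z_key] for r in rows) / n
    out = []
    for r in rows:
        xc = r[x_key] - x_mean
        zc = r[z_key] - z_mean
        # Keep the original columns and add centered versions plus the product term.
        out.append({**r, "xc": xc, "zc": zc, "xz": xc * zc})
    return out


# Two toy "imputed" data sets that differ only in the last (imputed) case.
imputed_sets = [
    [{"x": 1.0, "z": 2.0}, {"x": 3.0, "z": 4.0}, {"x": 5.0, "z": 9.0}],
    [{"x": 1.0, "z": 2.0}, {"x": 3.0, "z": 4.0}, {"x": 2.0, "z": 6.0}],
]
processed = [center_and_interact(rows, "x", "z") for rows in imputed_sets]
print(processed[0][0]["xz"])
```

Because the product is computed after imputation, it is always the exact product of its (centered) source variables in every data set.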

David Bard posted on Monday, March 21, 2011 - 11:41 pm

I'm having a hard time grasping how the default H1 sequential model is parameterized and estimated when there is a mixture of categorical and continuous imputation variables. The output seems to suggest that both a WLSMV and a Bayes estimator are being used at various points in time. Is the model first estimated with WLSMV and then somehow transitioned to a Bayesian analysis? When I try to create an H0 model that mimics sequential regression with a WLSMV estimator, I'm asked to use the Theta parameterization, but the default H1 model output claims to use Delta. When I try to use a Bayesian estimator for this H0 model, I am unable to reach convergence. Is it even possible to write the default H1 seq reg model as an M+ H0 model?

Thanks!

Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 10:51 am

First let me say that unless you are using the older version Mplus 6, the default imputation model is not sequential. Starting with version 6.1 the default imputation model is COVARIANCE.

I think your confusion about what is happening stems from the fact that you have a model (and if you don't specify an estimator you are essentially using the default WLSMV estimator) and a data imputation statement. In this case, Mplus assumes that you want the WLSMV estimator for your estimation, but you want to deal with the missing data via multiple imputations. Therefore Mplus will perform Bayes estimation first to impute the missing data, then analyze the imputed data using the WLSMV estimator.

To simplify the methodology I would suggest that as a first step you perform the imputation and estimation separately. To get only imputation specify type=basic in the analysis command and remove the model statement. This will just generate imputed data, which you can later analyze as in the example on page 348 in the User's Guide.

Now if you are interested in H0 imputations, follow example 11.7, i.e., you have to specify estimator=Bayes, an imputation model, and the data imputation statement. To mimic the sequential regression imputation as an H0 model imputation the first thing to do is to specify the command
MEDIATOR = OBSERVED;
in the analysis command.

Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 10:55 am

Regarding parametrizations, any Bayes or Imputation estimation is based on the Theta parameterization. On the other hand, with the WLSMV estimator generally both the Theta and the Delta parameterizations are available and can be used, however, for some models only the Theta parameterization is available and the sequential model sometimes is such a model.

David Bard posted on Tuesday, March 22, 2011 - 4:28 pm

You are right, I have not yet upgraded to 6.1, but will do so shortly. The WLSMV is listed as an Estimator in my file with or without a model statement (under 'Summary of Analysis' section of the output), but sounds like this simply reflects the default estimator were I to have included a model. Thanks for clarifying.

I do want the seq reg imputation in this instance. Any advice on getting my H0 version of this off the ground. Do I need to include fairly accurate starting values? I think van Buuren and Ibrahim have commented that the sequence of variable regressions can matter. Can you share the M+ default for setting up these seq regression equations when type=basic? Are the data restructured first to appear roughly like Monotone missingness?

Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 4:56 pm

David

I am not very clear why you want the imputation with the sequential method. Version 6.1 has a better method already - the covariance model. Second I am not sure why you are not using the H1 imputation which is already preset for you in terms of optimal performance, just use type=basic, estimator=Bayes and add the data imputation statement, add model=sequential. Mplus does not reorder the variables, we use the order specified in the usevar command. We have not seen examples where the order of the variables is important.

Finally if you want to do an H0 imputation and you specified the MEDIATOR = OBSERVED; as well as the model and you are experiencing convergence problems I would suggest that you send it to support@statmodel.com

Tihomir

Kätlin Peets posted on Monday, March 28, 2011 - 5:57 am

I would like to use observed classroom-level means in my analyses. However, some individuals (in some classrooms) have missing values, and thus classroom means would be calculated on the basis of those individuals who don't have missing values (these individual scores are imputed later on though).
Is that problematic? The problem is that I would like to create interaction terms between classroom-level means and include them in the model when imputing the rest of the data.

Linda K. Muthen posted on Monday, March 28, 2011 - 10:20 am

This should not be a problem.

Michael Green posted on Wednesday, April 06, 2011 - 1:56 am

Can the SAVEDATA command be used to control the output file format for the imputed files from DATA IMPUTATION?

Thanks, MG

Kätlin Peets posted on Wednesday, April 06, 2011 - 7:54 am

I am conducting simple slope analyses (to follow up my interactions). I have imputed my data (20 data sets). I need to use covariances and variances of my parameter estimates. Do I need to hand calculate the averages of variances-covariances across 20 data sets (from output 3), or is there an easier solution (I guess I could use squared standard errors from the model output to get the variances of the parameter estimates)?

Linda K. Muthen posted on Wednesday, April 06, 2011 - 12:10 pm

Michael:

The format of imputed data sets cannot be changed. If the original data is in fixed format, the data saving for imputation will use the format of the original data. But if it is free format, then it uses the default of F10.3.
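
As an illustration only (Python, not Mplus syntax): assuming F10.3 follows the usual Fortran convention (each value in a 10-character field with 3 decimals, and a field of asterisks when the number does not fit), the saved records can be written and read back as sketched below. The function names are hypothetical. Under that assumption a 7-digit ID needs 11 characters, which would be consistent with the asterisks reported earlier in this thread, although that link is an inference and was not confirmed by the Mplus team.

```python
def format_f10_3(value, width=10, decimals=3):
    """Format a number Fortran-style as F<width>.<decimals>; overflow becomes asterisks."""
    text = f"{value:{width}.{decimals}f}"
    return text if len(text) <= width else "*" * width


def parse_f10_3(record, width=10):
    """Split a fixed-width record into consecutive fields and convert each to float."""
    return [float(record[i:i + width]) for i in range(0, len(record), width)]


print(repr(format_f10_3(2.5)))       # '     2.500'
print(repr(format_f10_3(1234567)))   # '**********'

record = format_f10_3(1.5) + format_f10_3(-0.25)
print(parse_f10_3(record))           # [1.5, -0.25]
```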

Linda K. Muthen posted on Wednesday, April 06, 2011 - 12:12 pm

Katlin:

TECH3 is available with TYPE=IMPUTATION.

Kätlin Peets posted on Wednesday, April 06, 2011 - 12:25 pm

Yes, I do use Tech3, but I get 20 variance-covariance matrices (as I have 20 imputed data sets). So, for instance, when I need a covariance value between the intercept and moderator, do I need to calculate the average covariance across the data sets? I assume this is what I need to do as there isn't such a "summary" matrix across the data sets. Or am I wrong?

Linda K. Muthen posted on Wednesday, April 06, 2011 - 1:49 pm

Averaging over TECH3 is not correct. You can square the standard error of the variance parameter but that does not get you everything you need. Perhaps you can use MODEL CONSTRAINT to do what you want.

Kätlin Peets posted on Wednesday, April 06, 2011 - 2:09 pm

Could you be more specific (I would really appreciate your help)? How would I get covariance estimates when using MODEL CONSTRAINT?

Linda K. Muthen posted on Wednesday, April 06, 2011 - 2:12 pm

I'm not saying you would get the covariance estimates from MODEL CONSTRAINT. Perhaps you can define whatever it is you want a standard error for in MODEL CONSTRAINT. You would then obtain a standard error. Other than that, I have no suggestions. See MODEL CONSTRAINT in the user's guide for further information.
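
As an illustration only (Python, not Mplus syntax): one way to see why a plain average of the per-imputation covariance matrices falls short is Rubin's combining rules, where the total covariance is the average within-imputation covariance plus (1 + 1/m) times the between-imputation covariance of the estimates. The sketch below applies those rules and then computes a simple slope b_x + z * b_xz with its standard error. All function names and numbers are hypothetical, and whether Mplus computes its pooled results in exactly this way is not stated in this exchange.

```python
import math


def pool_rubin(estimates, covariances):
    """Pool m estimate vectors and their covariance matrices with Rubin's rules."""
    m = len(estimates)
    p = len(estimates[0])
    pooled = [sum(est[j] for est in estimates) / m for j in range(p)]
    # Within-imputation part: the plain average of the m covariance matrices.
    within = [[sum(cov[j][k] for cov in covariances) / m for k in range(p)]
              for j in range(p)]
    # Between-imputation part: sample covariance of the estimates across imputations.
    between = [[sum((est[j] - pooled[j]) * (est[k] - pooled[k]) for est in estimates)
                / (m - 1) for k in range(p)] for j in range(p)]
    total = [[within[j][k] + (1 + 1 / m) * between[j][k] for k in range(p)]
             for j in range(p)]
    return pooled, total


def simple_slope(pooled, total, i_x, i_xz, z):
    """Slope of x at moderator value z, with its standard error."""
    slope = pooled[i_x] + z * pooled[i_xz]
    var = total[i_x][i_x] + 2 * z * total[i_x][i_xz] + z * z * total[i_xz][i_xz]
    return slope, math.sqrt(var)


# Hypothetical results from m = 3 imputations: [b_x, b_xz] and their covariance.
estimates = [[0.50, 0.20], [0.60, 0.10], [0.40, 0.30]]
covariances = [[[0.010, 0.002], [0.002, 0.020]]] * 3

pooled, total = pool_rubin(estimates, covariances)
slope, se = simple_slope(pooled, total, i_x=0, i_xz=1, z=1.0)
print(pooled, slope, se)
```

In this toy case the averaged matrix alone would give a covariance of 0.002 between the two coefficients, while the pooled matrix gives a negative value because the estimates move in opposite directions across imputations.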

Yijie Wang posted on Friday, April 22, 2011 - 9:18 am

Hello,

I'm doing a multiple imputation and want to use the generated data for further analyses. Is there a way for mplus to combine all the imputed datasets and yield an averaged dataset? Thank you!

Linda K. Muthen posted on Friday, April 22, 2011 - 10:55 am

No. We produce the individual data sets only.

Michael Green posted on Wednesday, May 18, 2011 - 2:44 am

Hi,

I understand that interaction terms should be included in an imputation model.

When using the unrestricted H1 imputation option, should the interaction terms themselves be imputed along with the variables from which they are derived (which would lead to interaction terms which are not the exact product their source variables), or should the interaction terms be used only as predictors in the imputation, and then re-calculated from the imputed data during analysis?

Best, MG

Linda K. Muthen posted on Wednesday, May 18, 2011 - 9:48 am

We would not include the interaction term in the imputation of the data. We would use it only in the subsequent analysis.

Peren Ozturan posted on Friday, May 27, 2011 - 5:48 pm

Hi,
Do you recommend running latent factor interaction models on multiply imputed data?
Thanks.

Bengt O. Muthen posted on Saturday, May 28, 2011 - 5:45 am

Do you mean creating factor scores via plausible values and then creating interactions? Or do you mean imputing missing values on observed variables and then doing XWITH? The former is an interesting idea that should be explored. The latter is straightforward.

Peren Ozturan posted on Saturday, May 28, 2011 - 9:07 am

I was actually asking about the latter but was confused about the post dated March 30, 2010 - 12:42 pm and Linda's related answer.
1. My structural model is composed of latent factor interactions. Should I have included XWITH while running my imputation model? Or is it fine to run the imputation by modelling main effects and then specify XWITH while running the structural model on imputed data?
2. Is there a way to go around the two-step approach regarding the use of multiple imputation (i.e. first impute data, then estimate structural model)? Can't we do them simultaneously?
3. Is multiple group analysis on imputed data straightforward, as well? If so, then I am able to test whether grouping improves model fit by comparing models' loglikelihoods, right? I am asking this because in output, we get message "the loglikelihood cannot be used directly for chi-square testing with imputed data" but post dated March 19, 2007 - 5:59 pm states they could be used.

Bengt O. Muthen posted on Saturday, May 28, 2011 - 10:06 am

1. The imputation model does not have to be correct relative to the analysis model, but how large the deviation can be depends on the situation. So with l.v. interactions, your analysis model contains them, and the imputation may or may not use them. My current thinking is that the imputation is probably good enough without using them. I am not sure if our H1 (unrestricted) imputation has difficulty converging when having both the l.v.'s and their interactions.

2. Yes, you can do it in one run by specifying estimator = ML/WLSMV (but not Bayes).

3. Yes, multiple-group imputation can be done. We provide a chi-square test suitable for multiple imputed data - see the technical appendix Chi-Square Statistics with Multiple Imputation. I am not sure this can handle chi-square difference testing, however.

A new version of the Topic 9 handout including an expanded discussion of multiple imputation taught at UConn last week will be posted next week.

Sofie Henschel posted on Monday, June 06, 2011 - 4:06 am

Dear Linda and Bengt,
I'm trying to run a multigroup SEM with imputed data. However, in order to get the model running in all imputed data sets, I need to restrict the residual variances of one of my variables to be equal. Running the model separately in each imputation data set does not require this restriction. I am now wondering why I need restrictions in the overall model when I don't need to restrict the residual variances in each of the single data sets. Is it right that, when using imputed data, Mplus runs each data set separately and then combines the results using Rubin's formula? And if so, why do I need special restrictions with the imputed data set?
Thanks in advance!
Sofie

Linda K. Muthen posted on Monday, June 06, 2011 - 7:09 am

This does not make sense. If you are not using Version 6.11, try that. If you are and still have the problem, please send the files and your license number to support@statmodel.com.

Eric Teman posted on Sunday, June 26, 2011 - 6:52 pm

I have read in several places that multiple imputation has no set rules in regard to pooling likelihood ratio chi-square values or adjunct fit indexes. Does Mplus have a special way of handling this?

Linda K. Muthen posted on Sunday, June 26, 2011 - 7:10 pm

The following technical appendix on the website describes our ML chi-square for multiple imputation:

Chi-Square Statistics with Multiple Imputation

For all other fit statistics we give the average over imputations.

Eric Teman posted on Monday, June 27, 2011 - 6:42 pm

Are you aware of any issues when taking the average over imputations for fit statistics? I'm just wondering whether Enders' concern is warranted about "no rules exist for combining fit indices from multiply imputed datasets"

Linda K. Muthen posted on Monday, June 27, 2011 - 8:08 pm

The averages are not correct. See the Technical Appendix to see the difference between the average and the correct chi-square.

Eric Teman posted on Monday, June 27, 2011 - 8:12 pm

Sorry, I was referring to the adjunct fit indexes. Are there any known problems/issues with those averages?

Linda K. Muthen posted on Tuesday, June 28, 2011 - 5:40 am

All of the averages for RMSEA etc. are simply averages. Only the ML chi-square is correct for imputation. Chi-square for weighted least squares is also given as an average and is not to be interpreted for model fit.

Juanjo Medina posted on Wednesday, June 29, 2011 - 12:07 pm

Hiya,

I'm trying to run a multiple imputation model but experiencing some problems. Mplus stops at iteration 12500 because of lack of memory. I'm running the program on 32-bit Windows, with a dual 2.3 GHz processor (using PROCESSORS = 2), 2.2 GB of RAM, and all non-essential processes (even antivirus) terminated. I found the Asparouhov & Muthen (2010, version 2) paper, where they recommend the use of the FBITER and THIN options (the latter is also suggested by my output). Yet, when I try to use the FBITER option in my 6.11 version of Mplus I'm told this option is unrecognised. Any help with this is welcome.

Linda K. Muthen posted on Wednesday, June 29, 2011 - 2:44 pm

Please send your files and license number to support@statmodel.com.

Eric Teman posted on Wednesday, June 29, 2011 - 4:16 pm

Are you aware of any published research indicating taking the averages of adjunct fit statistics across imputations is not correct?

Linda K. Muthen posted on Wednesday, June 29, 2011 - 4:49 pm

I don't know of anything specifically. You might look at Craig Enders' book and Joe Schafer's book. Both references should be in the Topic 9 course handout.

Tihomir Asparouhov posted on Thursday, June 30, 2011 - 8:42 am

I am not sure what adjunct fit statistics are, but in general the chi-square statistics should not be added directly. See

http://statmodel.com/download/MI7.pdf

for simulations and description of how Mplus does this. Also any approximate fit indices based on the correct chi-square statistic should be valid.

Brondeel Ruben posted on Thursday, September 15, 2011 - 3:14 am

Hi,

I would like to use the variance-covariance matrix of the coefficients to make some plots. I can export the matrices for each imputation (tech3). How can I summarize the 10 matrices into 1?
I understand that I could use the constrain command, but for all covariances, this means a whole lot of input.
Are the means over the 10 matrices a good approximation? or the median? It doesn't have to be 100% correct, since it's only for plots. It just can't be too variable to the used dataset.

Greetings,
Ruben.

Linda K. Muthen posted on Thursday, September 15, 2011 - 11:37 am

With TYPE=IMPUTATION, you will get a correct TECH3. This is what you should use.

Sierra Bainter posted on Thursday, September 22, 2011 - 10:52 am

Hi,

I am imputing a single categorical variable using a number of completely observed variables in my data set. Do I have to include completely observed categorical variables on the Categorical statement?

Linda K. Muthen posted on Thursday, September 22, 2011 - 2:08 pm

Any categorical variable on the IMPUTE list should have (c) after it. The CATEGORICAL option is for the analysis not the imputation of the data. All dependent variables in the analysis should be on the CATEGORICAL list.

Mauricio Garnier-Villarreal posted on Tuesday, November 22, 2011 - 8:56 pm

Hi,

I ran and saved multiple imputations with Mplus, but when I analyze the list of data sets (FILE IS TESIMPlist.dat; TYPE = IMPUTATION;), Mplus doesn't estimate the fit indices and reports a negative variance in every data set. When I analyze each data set by itself, it works fine and doesn't report the negative variance.

Why is Mplus not working properly with the list?

thanks

Mauricio

Linda K. Muthen posted on Wednesday, November 23, 2011 - 1:39 pm

If you are not using Version 6.12, do. If you are, please send the relevant files and your license number to support@statmodel.com.

Eric Teman posted on Tuesday, January 17, 2012 - 1:36 pm

During the analysis phase of multiple imputation, is it possible for Mplus to save the averaged parameter estimates (and the corrected chi-square) as a data file? When I use SAVEDATA: RESULTS ARE results.dat, I get the un-averaged parameter estimates for the NDATASETS, which means no corrected chi-square is being saved.

Linda K. Muthen posted on Wednesday, January 18, 2012 - 3:52 pm

We do not save the averaged results. We save the results from each imputation. The average results are given in the results section.

Eric Teman posted on Friday, January 27, 2012 - 7:20 pm

When fixing the latent variances to one so that all factor loadings can be estimated, is it normal for WLSMV used with multiple imputation to produce negative factor loadings? Is this OK?

Linda K. Muthen posted on Friday, January 27, 2012 - 7:50 pm

Factor loadings can be positive or negative. They are regression coefficients.

Eric Teman posted on Friday, January 27, 2012 - 8:20 pm

Sorry, I should have been more clear. It is a simulation study where I have set the population values to be positive. But when I employ multiple imputation, the factor loadings are often negative when the latent variances are fixed to 1, but never negative when the latent variances are free. It seems a bit odd.

Bengt O. Muthen posted on Saturday, January 28, 2012 - 11:20 am

Perhaps what you see is that all the factor loadings for a certain factor change sign to negative. That is ok and simply means that your factor is reversed (say from knowledge to ignorance). That gives the same fit. You often see this sign reversal in EFA. It is harmless.

When you set the metric by fixing a loading to 1 you effectively decide on the sign.
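
The sign-reversal point can be checked numerically. The sketch below assumes a one-factor model with the factor variance fixed at 1, so the model-implied covariance is lambda*lambda' + diag(theta). The loadings and residual variances are made-up values, not taken from any poster's model.

```python
def implied_covariance(loadings, residual_variances):
    """Model-implied covariance for a one-factor model with factor variance 1:
    Sigma = lambda * lambda' + diag(theta)."""
    p = len(loadings)
    return [[loadings[i] * loadings[j]
             + (residual_variances[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


# Hypothetical population values, for illustration only.
loadings = [0.7, 0.8, 0.6]
residuals = [0.51, 0.36, 0.64]
flipped = [-value for value in loadings]

# Flipping the sign of every loading leaves each product of two loadings unchanged.
same = implied_covariance(loadings, residuals) == implied_covariance(flipped, residuals)
print(same)  # True
```

Because both solutions imply the same covariance matrix, they fit equally well, which is why averaging loadings across imputed data sets with mixed signs would need care.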

Isabella Lanza posted on Thursday, February 02, 2012 - 1:55 pm

I have a question about imputing interactions. Initially I thought I should just impute my main variable, and then aggregate my imputed datasets to calculate interactions based off the main variable of interest. However, reading over the literature (Von Hippel 2009 - transform then impute) and how Mplus derives results from multiple imputed datasets, I realized that I should include interactions in the imputation procedure. Ok this all makes sense, but I am having trouble with model convergence. I've increased the iterations and deleted variables from the USE VARIABLE command and it still hasn't solved the problem. Is there any way this problem could be related to the fact that I am asking for standardized interactions? Thanks for any input.

Linda K. Muthen posted on Friday, February 03, 2012 - 8:59 am

Try imputing without the interactions and see if that works.

Maarten Pinxten posted on Monday, February 06, 2012 - 4:42 am

Are variables included in the NAMES list automatically used to impute missing data or do I have to define them explicitly as auxiliary variables? Thank you very much!

Linda K. Muthen posted on Monday, February 06, 2012 - 1:53 pm

If you do not have a USEVARIABLES list, all variables on the NAMES list are used to impute data for the variables on the IMPUTE list.

Maarten Pinxten posted on Tuesday, February 07, 2012 - 1:14 am

Thanks for the reply! Using the VALUES command I specified the range of the imputed values for each variable (minimum and maximum) but an inspection of the imputed data sets shows that for some imputations the values exceed those restrictions. Any idea how this is possible? I would expect non-convergence if the mcmc algorithm can't find a value between the specified range after x number of iterations... Do I need to worry about this (percentage of missingness 17%)?

Linda K. Muthen posted on Tuesday, February 07, 2012 - 7:17 am

Please send the output, a data set that shows this, and your license number to support@statmodel.com.

Maarten Pinxten posted on Monday, February 20, 2012 - 1:35 am

An additional question: How can I let Mplus know that it should not use SCHOOLID as a covariate to impute the requested variables? How can I include my SCHOOLID in the imputed datasets?

When I use the 'Cluster' option in the NAMES command (with TYPE=COMPLEX), Mplus computes all the requested datasets but in the output it shows the error message 'all variables are uncorrelated with all other variables'.

When I just run the same MI-model without SCHOOLID included in the input file, my model just runs perfectly.

When I ran the same input data file but then with SCHOOLID included (in combination with the USEVARIABLES command in the input syntax), my SCHOOLID is not shown in the imputed datasets.

I guess there's a simple solution but I can't figure it out. Thank you very much!

Linda K. Muthen posted on Monday, February 20, 2012 - 8:26 am

Use the IDVARIABLE option of the VARIABLE command.

Andre Plamondon posted on Wednesday, February 22, 2012 - 10:43 am

When using the "values=" option with multiple imputation, is it possible to specify a range of values in which negative values are possible?

Linda K. Muthen posted on Wednesday, February 22, 2012 - 3:49 pm

We do not currently allow negative values but will do so in the next version. The workaround for this is to add a constant to your variable that makes all numbers positive, impute, and then subtract the constant.
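
A minimal sketch of that workaround, done outside Mplus before and after the imputation run. The helper names and the margin of 1.0 are hypothetical. The range given on the VALUES option would presumably need to be shifted by the same constant.

```python
def shift_to_positive(values, margin=1.0):
    """Add a constant so that every observed value is at least `margin`.

    Missing values are represented by None and are left as None.
    Returns (shifted_values, constant).
    """
    observed = [v for v in values if v is not None]
    constant = max(0.0, margin - min(observed))
    shifted = [None if v is None else v + constant for v in values]
    return shifted, constant


def shift_back(values, constant):
    """Subtract the constant again after the imputed values have been filled in."""
    return [v - constant for v in values]
```

For example, shift_to_positive([-3.0, None, 2.0]) returns ([1.0, None, 6.0], 4.0); after imputation, shift_back(imputed, 4.0) restores the original scale.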

Aurora Zhao posted on Thursday, March 08, 2012 - 5:44 pm

Hi Dr. Muthen,

I am new to handling missing data with multiple imputation. I am looking at Example 11.5. I am wondering how to calculate the missing data correlate "z" from the original data and save it into the data set to do MI. Thank you very much!

Linda K. Muthen posted on Friday, March 09, 2012 - 3:54 pm

Z is not a variable that you create. It is part of the dataset that is used to impute the variables on the IMPUTE list.

Owis Eilayyan posted on Wednesday, April 04, 2012 - 6:13 am

Hello,
I tried to do multiple imputation using the following input, but I couldn't find the saved output file. Could you please tell me where the output file is saved?

TITLE: this is an example of multiple imputation
for a set of variables with missing values
DATA: FILE IS C:\Users\Admin\Desktop\MGH\Owis\path3.dat;
VARIABLE: NAMES ARE smo em sym act emo env pf ef sf re bmi age fev gender act5 actc;
missing = .;
DATA IMPUTATION:
IMPUTE = smo (c) em (c) sym pf ef sf re bmi age fev gender (c) act5 (c) actc (c);
NDATASETS = 10;
SAVE = C:\Users\Admin\Desktop\MGH\Owis\essra.dat;
ANALYSIS: TYPE = BASIC;
OUTPUT: TECH8;

Linda K. Muthen posted on Wednesday, April 04, 2012 - 6:29 am

The saved data set name should be essra*.dat. The asterisk is replaced by the numbers of the data sets, for example, essra1.dat, essra2.dat, etc.
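
To see which file names a given pattern would expand to, a small sketch follows. It assumes only the replacement rule stated above (the asterisk becomes the data set number, starting at 1). The function name is hypothetical.

```python
def imputed_file_names(save_pattern, n_datasets):
    """Expand a SAVE pattern such as 'essra*.dat' into one name per data set."""
    if save_pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one asterisk")
    return [save_pattern.replace("*", str(i)) for i in range(1, n_datasets + 1)]


print(imputed_file_names("essra*.dat", 3))
# ['essra1.dat', 'essra2.dat', 'essra3.dat']
```

Checking that each of these names exists on disk before the analysis run is a quick way to catch a path or pattern mistake.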

Owis Eilayyan posted on Wednesday, April 04, 2012 - 6:34 am

I did that but i got this error message:

*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: C:\Users\Admin\Desktop\MGH\Owis\essra1.dat

Linda K. Muthen posted on Wednesday, April 04, 2012 - 8:14 am

Please send the relevant files and your license number to support@statmodel.com.

finnigan posted on Friday, April 13, 2012 - 10:37 am

Linda

I have a longitudinal data set where indicators have 20-30% missing data across three waves
Covariates have 1-5% missing data across three waves.

I will be conducting measurement invariance using CFA and then estimating a multiple indicator growth model.
I am pursuing two solutions, FIML and multiple imputation, to handle the 20-30% missingness.
For multiple imputation, should the model used to generate the data sets be the CFA or the growth model?

Thanks

Linda K. Muthen posted on Saturday, April 14, 2012 - 9:15 am

I would impute according to the H1 model.

Lindsay Bell posted on Tuesday, May 08, 2012 - 8:25 pm

Hi -

I have a few questions about multiple imputation with a multilevel model. First, I am finding high autocorrelations lasting for many iterations for the between-level parameters. Do you know if this is normal?

Second, is there a way to get Mplus to show me the autocorrelations for more than 30 lags?

Third, I don't quite understand where Mplus draws the imputed data sets. If I specify THIN=500 in the data imputation command, then is Mplus drawing the imputed values from every 500th iteration, beginning with the first iteration after burn-in (i.e., 1st data set has values from the 10,000th, 2nd from the 10,500th, and so on)?

Finally, in the imputation, I want Mplus to take into account the fact that certain values (i.e., family socioeconomic status) tend to be similar within schools. I have specified it like this:

CLUSTER=schoolid;

ANALYSIS:
TYPE = TWOLEVEL;

MODEL:
%WITHIN%
ses ON par_ed h_income;

%BETWEEN%
ses;

Does this accomplish my goal of accounting for the school-level clustering of values on that variable? If not, can you tell me how I can?

Thank you,
Lindsay

Tihomir Asparouhov posted on Wednesday, May 09, 2012 - 10:56 am

1. It is normal to have high autocorrelations in two-level imputation, especially when the number of clusters is not far from the number of variables (this leads to a nearly singular variance-covariance matrix on the between level). We would basically recommend an H0 model imputation with a one-factor analysis model on the between level. Take a look at sections 3.3 and 3.4 in

http://statmodel.com/download/Imputations7.pdf

2. You cannot get more than 30 autocorrelations. You have to use the thin command to discard MCMC draws - that would let you see how correlated more distant draws are. For example if using thin=50, the 50-th autocorrelation will become the first.
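
To make the thinning idea concrete outside Mplus, here is a minimal sketch using the usual sample autocorrelation estimator on a saved chain of draws. Whether Mplus computes its autocorrelations in exactly this way, and which draw it keeps first when thinning, are assumptions. Both function names are hypothetical.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence of draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    deviations = [x - mean for x in chain]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


def thin(chain, k):
    """Keep every k-th draw, starting with the first one."""
    return chain[::k]
```

With these helpers, autocorrelation(thin(chain, 50), 1) approximates the lag-50 autocorrelation of the original chain, which is the quantity that the 30-lag limit does not show directly.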

3. The thin option in the data imputation command works as you describe above.

4. First you should make sure that Mplus does what you think it does - look at slide 184 in
http://statmodel.com/download/Topic9-v52%20%5BCompatibility%20Mode%5D.pdf

By default, all variables in Mplus are present on both the within and between levels, which accounts for the similarity of SES within clusters. (The options that restrict that default are within= and between=.)

Lindsay Bell posted on Wednesday, May 09, 2012 - 12:35 pm

Thank you for your reply. Just to make sure I understand - if I specify

ANALYSIS:
TYPE = BASIC TWOLEVEL;

then the similarity of variables within clusters is accounted for, unless I list them as WITHIN.

If I want to specify an H1 model on the within level and an H0 model on the between level, do I just not specify any model for the within level? If variables are listed as part of the between-level H0 model, is their cluster-level similarity still accounted for?

Thank you,
Lindsay

Lindsay Bell posted on Wednesday, May 09, 2012 - 3:46 pm

Sorry, a couple more questions - how do I evaluate the model when using TYPE = TWOLEVEL BASIC? The program isn't giving me the Bayesian plots, so I don't know how to assess whether the estimates reached a stable pattern or if there is an issue with autocorrelation.

Also, the model is converging and not giving me any error messages even when there are more between-level variables than there are clusters. Can I be comfortable with the results?

Thank you very much,
Lindsay

istia posted on Thursday, May 10, 2012 - 9:06 am

What exactly is the role of the random chi-square value in a regression model, or in multiple imputation? Can anybody share some papers? Thanks in advance.

Tihomir Asparouhov posted on Thursday, May 10, 2012 - 11:51 am

Lindsay

If you have more variables than clusters you should be using the H0 imputation method (right most path in diagram on slide 184). Like this

TYPE = TWOLEVEL; estimator=bayes;

model:
%within%
y1-y100 with y1-y100;
%between%
y1-y100;

Add the data imputation command.

data imputation:
impute=y1-y100;
save=imputations*.dat;

Linda K. Muthen posted on Thursday, May 10, 2012 - 1:21 pm

Istia:

What is "random value chi square"?

Lindsay Bell posted on Thursday, May 10, 2012 - 4:25 pm

Tihomir - Thank you so much for your reply with the example syntax. That is very helpful. One follow-up question: does it make a difference that a few of the variables are observed at the between level and have no within-cluster variance? Would that change the imputation syntax at all?

Thank you,
Lindsay

istia posted on Thursday, May 10, 2012 - 9:11 pm

Linda - Sorry if I was wrong or misunderstood it. This is what I read here:
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_multiple_imputation_univariate_linear.htm

It says there will be a random value 'u' drawn from a chi-square distribution. I just don't understand what exactly that means.
How does using this chi-square value affect the regression estimates or the imputed values?

Tihomir Asparouhov posted on Friday, May 11, 2012 - 9:03 am

Lindsay

You should specify those variables on the between= list in the variable command.

Tihomir Asparouhov posted on Friday, May 11, 2012 - 9:04 am

istia

Take a look at

http://statmodel.com/download/MI7.pdf

Lindsay Bell posted on Friday, May 11, 2012 - 9:06 am

Ok, thank you, I will do that. Just to be clear, though, the model syntax:

model:
%within%
y1-y100 with y1-y100;
%between%
y1-y100;

will be exactly the same, even if variables y95-y100 are between? It just seems strange to me to have variables that are between variables as part of the within section in the model.

Thank you,
Lindsay

Bengt O. Muthen posted on Friday, May 11, 2012 - 9:35 am

Istia -

Regarding "random value 'u' of Chi Square", I think you should ignore that. That's just the way they describe the chi-square testing. The way we describe the chi-square testing is in the document Tihomir pointed to.

If you want to learn more about multiple imputation, I would recommend the 2010 book by Craig Enders. It refers to Mplus.

Lindsay Bell posted on Friday, May 11, 2012 - 9:51 am

As a follow-up, I just tried the syntax, and got the error message "between variables cannot be used on the within level."

Instead I tried:

model:
%within%
y1-y94 with y1-y94;
%between%
y1-y100 with y1-y100;

But this model is not converging, I'm guessing because there are too many parameters relative to the number of clusters. Perhaps instead I should try this within syntax and the between-level analysis model with reference to the between-level variances of other variables? i.e.,

model:
%within%
y1-y94 with y1-y94;

%between%
y1-y91;
y92 ON y95 y96 y97 y98 y99 y100;
y93 ON y95 y96 y97 y98 y99 y100;
y94 ON y95 y96 y97 y98 y99 y100;

I really appreciate your guidance.

Lindsay

Lindsay Bell posted on Friday, May 11, 2012 - 10:16 am

I just tried the syntax with the between-level analysis model specified:

model:
%within%
y1-y94 with y1-y94;

%between%
y1-y91;
y92 ON y95 y96 y97 y98 y99 y100;
y93 ON y95 y96 y97 y98 y99 y100;
y94 ON y95 y96 y97 y98 y99 y100;

and it converged very well. Everything looks good except that the between-level variance parameters for all the variables except y92, y93, and y94 have very high autocorrelations. Does this indicate a problem with the imputation model, or do I just need to increase the thinning until the autocorrelation drops to near zero?

Thank you again,
Lindsay

Tihomir Asparouhov posted on Friday, May 11, 2012 - 11:17 am

You don't need to have the autocorrelation drop to 0. Instead you can aim for the 30th autocorrelation to be below 0.2 or 0.3. Try thin=10 or even 50 or 100.

Laura Baams posted on Monday, May 21, 2012 - 1:16 pm

Hi,

We have a question about planned missingness and FIML. We have a large dataset (N = 1400) and have worked with the three form design for a large part of a large questionnaire. This means we have a lot of missings, but only MCAR.

We plan on using FIML to deal with the missings, but have noticed that Mplus does not give all fit statistics. We do not get the RMSEA, CFI, TLI, Chi.

We were told that currently Mplus has no way of reliably estimating these fit statistics, and therefore does not give them. Is this indeed the case? Do you know of any papers that discuss this issue? And is there a way around this issue without using MI?

Thanks for any tips or advice!

Mauricio Garnier-Villarreal posted on Monday, May 21, 2012 - 2:16 pm

Hi

I am running a multiple imputation in Mplus, but I have run into a problem: the data set is big (1119 variables, around 5000 subjects) and Mplus tells me that I cannot include more than 500 variables. I can get the imputation to run when I select some variables to be imputed with the USEVARIABLES command. But when I do this, am I excluding all the other variables from the imputation process?

How can I impute big data sets in Mplus?

thank you

Bengt O. Muthen posted on Monday, May 21, 2012 - 8:42 pm

Laura,

You don't have to use MI (multiple imputation) which it sounds like you are doing given that you don't get all fit indices (which haven't been statistically developed yet). With missing by design you might instead want to use multiple-group ML analysis, with groups corresponding to the three forms.

A good applied source for missing data handling is the C. Enders 2010 book.

Bengt O. Muthen posted on Monday, May 21, 2012 - 8:49 pm

Mauricio,

There are 3 lists of variables: The NAMES list which describes the data (it can contain more than 500 variables), the USEV list which are the variables that inform the imputations, and the IMPUTE list which says which subset of variables we want to have imputations from.

Typically, your USEV list is much shorter than your NAMES list. You don't need all the NAMES list variables to inform the multiple imputations, but usually a very short list of variables. The IMPUTE list should contain a shorter list of the variables for which you want to do a particular analysis. So 500 USEV variables would seem to be more than enough.

Lindsay Bell posted on Friday, June 01, 2012 - 7:45 am

Hi -

I am using 20 imputed data sets to do a two-level analysis. I've discovered that under certain circumstances, the program is not analyzing all of the data sets (the output says requested: 20, completed: 19).

One scenario in which this is happening is with a between-level dichotomous variable that was completely observed, and so is identical in every imputed data set. Can you help me figure out why that might be happening and what I can do to recover the 20th data set?

Thank you,
Lindsay

Linda K. Muthen posted on Friday, June 01, 2012 - 9:57 am

Add TECH9 to the OUTPUT command to see the reason.

Natalie Bohlmann posted on Wednesday, June 06, 2012 - 3:24 pm

Hello
I am having a similar issue to a previous post. I am using 10 imputed data sets in a single-level model. There are 177 cases in the data set, but the output says that the average number of observations is 141. The number of replications requested is 10, but only 8 completed. My output also does not provide sample statistics or a chi-square test result.

I added TECH9. It says that the model terminated normally but gives the warning: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX....PROBLEM INVOLVING PARAMETER 17.

Parameter 17 is the covariance of a variable with itself from the PSI matrix.

I examined each of the imputed files, all have complete data for all 177 cases. I've also confirmed that my implist.dat file lists all 10 data sets correctly named.

I am using version 6.0. I have a friend with the 6.11 update and asked her to run the model. The result is the same, except that her output has an additional message above the estimates for model fit: "THE CHI-SQUARE COULD NOT BE COMPUTED. THIS MAY BE DUE TO AN INSUFFICIENT NUMBER OF IMPUTATIONS OR A LARGE AMOUNT OF MISSING DATA."

Her output does include sample statistics, but states that they are only for 8 data sets.

Please Advise. I appreciate your time and help.

Thank you,
Natalie

Linda K. Muthen posted on Wednesday, June 06, 2012 - 4:22 pm

Please send the relevant files and your license number to support@statmodel.com.

Mauricio Garnier-Villarreal posted on Tuesday, June 12, 2012 - 6:31 am

Hi

I have a question on how Mplus combine the results from multiple imputed data sets.

Does Mplus combine both the unstandardized and the standardized results? Or does it combine only the unstandardized results and then standardize the combined results?

thank you

Linda K. Muthen posted on Tuesday, June 12, 2012 - 8:50 am

The parameter estimates are first averaged over the multiple imputations, and those averages are then standardized.

Steven De Laet posted on Tuesday, June 19, 2012 - 2:13 am

Hi,

We have a question about fit statistics (CFI, ...) when applying multiple imputation.
Because no formal pooling rules are currently available, Enders (2010) recommends that the 20 (or more) estimates of each fit index be used to create an empirical distribution.
I wonder how I'd best do that. Should I run my syntax on every imputed dataset and save each fit statistic manually or is there a more efficient way (perhaps a summary of the fit statistics of each dataset in one output)?

Thank you very much

Linda K. Muthen posted on Tuesday, June 19, 2012 - 1:58 pm

You would need to run your syntax on each imputed data set.
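
Once the fit index has been collected from each of those runs, the empirical distribution can be summarized with a few lines of code. This is only a sketch: the function name is hypothetical, and the values have to be copied from the individual outputs by whatever means is convenient.

```python
import statistics


def summarize_fit_index(values):
    """Describe the empirical distribution of one fit index across imputed data sets."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),  # requires at least two values
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }
```

Calling summarize_fit_index on the list of 20 CFI values, for example, gives the center and spread of the index across imputations without implying any formal pooling rule.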

Chris Blanchard posted on Friday, June 22, 2012 - 10:12 am

Hi,
I'm running a latent class growth analysis on 5 time points with missing data. I've been able to use the multiple imputation procedure nicely to establish that there are 2 classes; however, I want to save class membership (i.e., cprobabilities) from this analysis and Mplus is telling me I can't. Is there a way I can get the class membership for each patient exported from Mplus when using the multiple imputation method? (Please note there are 5 data sets.)

chris

Bengt O. Muthen posted on Friday, June 22, 2012 - 8:38 pm

For cprobs you need an estimated model and data for the subject in question. You have the estimated model (the average over the imps), but which of the 5 data sets should you use? That's the problem. You can create cprobs for each data set using the average estimates (fixed parameter values when running with each data set), and then average the cprobs. It is not clear, however, that this is the best approach - we are at the research frontier here.
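
The averaging step of that suggestion could look like the sketch below. It assumes the class probabilities have already been saved from each run (with parameters fixed at the averaged estimates) and read into nested lists. The function name and data layout are hypothetical, and, as noted above, it is not established that this is the best approach.

```python
def average_class_probabilities(per_dataset_probs):
    """Average posterior class probabilities over imputed data sets.

    per_dataset_probs[d][i][c] is the probability that subject i belongs
    to class c according to imputed data set d.
    Returns (averaged_probabilities, most_likely_class), with classes numbered from 1.
    """
    m = len(per_dataset_probs)
    n_subjects = len(per_dataset_probs[0])
    n_classes = len(per_dataset_probs[0][0])
    averaged = [[sum(per_dataset_probs[d][i][c] for d in range(m)) / m
                 for c in range(n_classes)]
                for i in range(n_subjects)]
    # Assign each subject to the class with the highest averaged probability.
    most_likely = [row.index(max(row)) + 1 for row in averaged]
    return averaged, most_likely
```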

bkeller posted on Friday, July 27, 2012 - 1:25 pm

I'd like to determine the fraction of missing information (see, e.g., Snijders & Bosker, 2nd ed, p. 141) for the parameters in a TWOLEVEL RANDOM type analysis for a TYPE = IMPUTATION run. Is there a way to output the average within-data-set variance W.bar = (1/M)*Sum(SE_m^2) and the between-imputation variance B = (1/(M-1))*Sum((theta.hat_m - theta.bar)^2) for each parameter so that I can calculate this?

Thank you!

Tihomir Asparouhov posted on Friday, July 27, 2012 - 3:28 pm

This will be included in the output in the next Mplus version, but you can still compute it with the current version. If you have just a small number of imputed data sets, run all of them one at a time, collect the parameter estimates, and compute this by hand. If you have many imputed data sets, use the external Monte Carlo technique described in the User's Guide, EXAMPLE 12.6 STEP 2. This way you can get the between-imputation variance (column 3) and the average SE (column 4).
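
For the by-hand route, a minimal sketch of the computation for a single parameter is given below. It uses W.bar and B as defined in the question and the simple large-sample ratio (1 + 1/M)*B / T for the fraction of missing information, without any degrees-of-freedom adjustment, which may differ slightly from the definition in a given textbook. The function name is hypothetical.

```python
def missing_information(estimates, standard_errors):
    """Within/between variance and fraction of missing information for one parameter.

    estimates:       the M per-imputation estimates of the parameter
    standard_errors: the M per-imputation standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two imputations with matching inputs")
    theta_bar = sum(estimates) / m
    # Average within-imputation variance.
    w_bar = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance of the estimates.
    b = sum((est - theta_bar) ** 2 for est in estimates) / (m - 1)
    # Total variance and the share of it due to missing data.
    total = w_bar + (1 + 1 / m) * b
    fmi = (1 + 1 / m) * b / total
    return {"theta_bar": theta_bar, "w_bar": w_bar, "b": b,
            "total": total, "fmi": fmi}
```

As a check, estimates of 1, 2 and 3 with standard errors of 1 give W.bar = 1, B = 1, T = 7/3 and a fraction of 4/7.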

bkeller posted on Monday, August 06, 2012 - 1:39 pm

I used the MONTECARLO technique you described to get B and W.bar, thank you. I am further interested in saving as output an array which contains the actual estimated parameter values across imputations. I am using 25 imputed datasets so I would rather not run each one and compile by hand. Is there a way to ask Mplus to save the results (something similar to SAVEDATA: ESTIMATES = file.dat;) but for all 25?

Tihomir Asparouhov posted on Tuesday, August 07, 2012 - 12:34 pm

Use
savedata: results=file.dat;

EFried posted on Thursday, August 09, 2012 - 12:45 pm

I'm using multiple imputation and am running a multilevel growth model.

In my model results, baseline covariates are shown to be correlated by 0.000 (which is not actually the case in the dataset). The correlations between time-varying covariates are, in contrast, shown as expected.

All other model results look fine as well.

My question is whether the zero correlations have something to do with the way Mplus deals with multiple imputation files, or whether something is wrong with the analysis?

Thank you

Linda K. Muthen posted on Thursday, August 09, 2012 - 2:04 pm

Please send the output and your license number to support@statmodel.com so I can see what you are doing.

Maren Formazin posted on Thursday, September 06, 2012 - 4:13 am

Hi,
I have ten multiple imputation data sets which is my data for CFA. Since my data is categorical, I have used WLSMV and specified the variables as categorical.

For Chi-Square, there is the following information:

Chi-Square Test of Model Fit

    Number of successful computations    10

         Proportions             Percentiles
     Expected   Observed     Expected   Observed
        0.990      1.000        0.020     39.184
        0.980      1.000        0.040     39.184
        0.950      1.000        0.103     39.184
        0.900      1.000        0.211     39.184
        0.800      1.000        0.446     39.184
        0.700      1.000        0.713     40.065
        0.500      1.000        1.386     53.839
        0.300      1.000        2.408     62.245
        0.200      1.000        3.219     68.881
        0.100      1.000        4.605    105.435
        0.050      1.000        5.991    105.435
        0.020      1.000        7.824    105.435
        0.010      1.000        9.210    105.435

What does this output mean? With ML, there is just one Chi-square value - here, there is none.

Thank you very much for your help!

Maren Formazin posted on Thursday, September 06, 2012 - 4:14 am

Hi,

and an additional question:

Are the mean of CFI / TLI / RMSEA / WRMR the pooled results over all 10 datasets? What do "proportions" and "percentiles" of these indices mean?

Thank you!

Linda K. Muthen posted on Thursday, September 06, 2012 - 5:58 am

The only fit statistic that has been developed for multiple imputation is chi-square. For the others we give the average over the imputed data sets. The output is described on page 362 of the user's guide.

Maren Formazin posted on Friday, September 07, 2012 - 12:44 am

Thanks for your reply, Linda.
I have checked my output again - there is NO mean or standard deviation for chi-square, just the information I've posted above ("10 successful computations" and the table with expected and observed percentiles and proportions). Do I need to specify something more?
I do get mean and SD for CFI, RMSEA, TLI and WRMR though.

On the other hand, when using ML with the same imputed data, there is just one chi-square-value (and CFI, RMSEA etc.), but no table with expected and observed proportions and percentiles. Why does this happen?

Thanks!

Linda K. Muthen posted on Friday, September 07, 2012 - 5:48 am

Please send the two outputs and your license number to support@statmodel.com so I can see what you are looking at.

Caroline Vancraeyveldt posted on Friday, November 09, 2012 - 6:47 am

Dear Dr. Muthén,

I am doing multiple imputation for four variables in a latent growth model. When I request 20 replications, only 19 are completed. I requested TECH9, but apparently only the first replication did not converge, and the number of iterations was exceeded (TECH9 gave no reason why). Could it be that one of the variables has too many missing values for imputation? (Data for 83 out of 175 cases are missing for this variable.) If I do not include this variable in the missing data analysis, then all the iterations are completed correctly.

Thank you for your response!

Linda K. Muthen posted on Friday, November 09, 2012 - 11:26 am

Please send the output, data, and your license number to support@statmodel.com.

Sofie Henschel posted on Tuesday, November 13, 2012 - 10:11 am

Hi,
I am running a multiple imputation with continuous variables. I used the rounding option and requested 5 decimals for the imputed variables because my original values have 5 decimals too.

ROUNDING = sach lit (5);

Unfortunately, Mplus gives only the default of 3 decimals for the imputed data and reduces the non-imputed values to 3 decimals. Do you have any idea what's wrong here?
thanks in advance
Sofie

Tihomir Asparouhov posted on Tuesday, November 13, 2012 - 2:01 pm

Add

savedata: Format=F10.5;

Sofie Henschel posted on Wednesday, November 14, 2012 - 1:20 am

Thanks for your help, but mplus says that doesn't work for multiple imputation.

*** WARNING in SAVEDATA command
The FORMAT option for saving data is not available for TYPE=MONTECARLO or multiple imputation. The FORMAT option will be ignored.

Have you any other suggestion? Thanks
Sofie

Kofan Lee posted on Wednesday, November 14, 2012 - 7:23 am

I am using multiple imputation in CFAs. I use CFA to check theoretical constructs and to prepare for item parceling. I have run into some questions, and I wonder if something is wrong with my syntax.
It seems that this combination (CFA with imputed data) brings some limitations. For instance, the survey data I have are non-normally distributed. However, I cannot use SAVE=MAHA and TECH13 to detect multivariate outliers and assess multivariate normality. Further, modification indices cannot be requested either. Also, the chi-square test under MLR estimation does not produce a significance test. Would you mind taking a look at my input, as follows:

Title: CFA motivation using imputated data;
Data: File = C:\Users\koflee\Desktop\111312\serious.imputelist.dat;
Type = imputation;
VARIABLE:
NAMES ARE m1-m19 s20-s37;
USEVARIABLES ARE m1 m2 m4 m6-m19;
ANALYSIS: ESTIMATOR = MLR;
MODEL: im by m4 m10 m15 m18;
id by m8 m14 m17;
intro by m2 m7 m13;
em by m1 m6 m11 m16;
am by m9 m12 m19;
SAVEDATA: save=maha;
OUTPUT: TECH4 tech13 STANDARDIZED RESIDUAL MODINDICES (0);

Thank you so much
Kofan

Tihomir Asparouhov posted on Wednesday, November 14, 2012 - 9:04 am

Sofie

It should have worked. Are you using type=basic? Send your example to support@statmodel.com

Tihomir

Linda K. Muthen posted on Wednesday, November 14, 2012 - 12:56 pm

Kofan:

Some things have not yet been developed for multiple imputation, for example, only the ML estimator has been developed. For other estimators we give the mean and other information.

Kofan Lee posted on Thursday, November 15, 2012 - 6:11 am

Linda,

Thanks for the response. I actually have very few missing responses and decided to delete those cases. I have a question about Mahalanobis D. I tried to run a CFA (ML estimation) with the following code:
SAVEDATA: save=maha;
However, this command is ignored by MPlus. Should I add something in this syntax?

Thank you and have a wonderful day

Kofan

Linda K. Muthen posted on Thursday, November 15, 2012 - 9:05 am

You also need to name a file using the FILE option.

Kofan Lee posted on Saturday, November 17, 2012 - 6:02 am

Linda,

That works. Thank you

Jan Zirk posted on Thursday, November 29, 2012 - 5:13 pm

Dear Bengt or Linda,
I would like to ask you about imputation in context of Bayesian plausible values. In a dataset with a big sample size (n>10000) there are many ordered-categorical variables from 5 instruments + demographic measures. To decrease computational demand I would like to transform the ordered-categoricals to continuous plausible value measures. Is it better to do this via one big "H1 model" (ie, "** with **" where ** means all variables in the dataset) or would it better to run 5 separate H1 models (separate for each instrument's categoricals)?

Best wishes,
Jan

Jan Zirk posted on Thursday, November 29, 2012 - 5:15 pm

P.S. I would like to next run SEMs with all measures entered.

Bengt O. Muthen posted on Friday, November 30, 2012 - 7:55 am

It sounds like you are putting a factor behind each ordered-categorical variable. If you can do it with all the variables from all 5 instruments that would be best assuming they are at least moderately correlated. But if that gives you too many variables, then assuming you have enough variables within instrument doing it instrument-wise would seem ok too. I guess any combination of the different sets of plausible values for the 5 instruments is equally valid.

Jan Zirk posted on Friday, November 30, 2012 - 8:08 am

Thank you Bengt! It seems then like a topic worth an article/short note. "It sounds like you are putting a factor behind each ordered-categorical variable" - exactly, I wanted to extract them with LRESPONSES.
One more question: to best reflect the original underlying data structure, if I run such an H1 model on all the available variables and see in its output that, e.g., a few links are ns, do you think it would be worth the effort to trim such links and in the next step extract plausible values from the backwards-deleted/trimmed version of the H1 model? Or rather extract them regardless of the ns connections?

Bengt O. Muthen posted on Friday, November 30, 2012 - 8:31 am

My guess is that it wouldn't be worth the effort.

Jan Zirk posted on Friday, November 30, 2012 - 9:09 am

Yes, thank you; this is what I thought.
Best wishes,

Kätlin Peets posted on Friday, January 04, 2013 - 3:07 am

I use multiple imputation to handle missing data. Can I interpret model fit indices as usual? (in the analysis phase)
Thank you!

Linda K. Muthen posted on Friday, January 04, 2013 - 6:25 am

No, except for ML. We report the average over the imputations for the other fit statistics, which have not yet been developed for multiple imputation.

Kätlin Peets posted on Friday, January 04, 2013 - 8:54 am

Thank you. Could you please clarify? Do you mean that when I use the ML estimator, the fit indices are interpretable, or are they still averages?

Linda K. Muthen posted on Friday, January 04, 2013 - 9:51 am

With ML, the chi-square value is interpretable, but not the other fit statistics. The other fit statistics are averages.

Lies Missotten posted on Thursday, January 24, 2013 - 3:51 am

Dear Linda/Bengt,
I conducted multiple imputation analyses in a (relatively small) sample of 175 persons.
Next, I estimated a longitudinal path model without problems. However, when I explicitly modeled a correlation between the predictors, I received the following error message: "THE BASELINE CHI-SQUARE COULD NOT BE COMPUTED. THIS MAY BE DUE TO AN
INSUFFICIENT NUMBER OF IMPUTATIONS OR A LARGE AMOUNT OF MISSING DATA."
When I increased the number of imputations, I still received the same error message. I do not receive this error message if I do not explicitly model the correlation between the predictors. What could be the reason for that, please?
Thank you in advance!

Linda K. Muthen posted on Thursday, January 24, 2013 - 11:58 am

Please send the two outputs (with and without the correlation) and your license number to support@statmodel.com.

Shin, Tacksoo posted on Sunday, March 03, 2013 - 5:16 pm

Dear Linda,

I have questions related to MI with Bayesian method.

What combining rules does Mplus use, especially with a random effects model (latent curve model)? For fixed effects, Rubin (1987) presented the method for combining results from a data analysis performed m times. Or are alternatives used (e.g., a jackknife variance estimator or fractionally weighted imputation)?

Can these combination formulas be used with nonlinear models?

How about the posterior predictive p-value? Is it combined in the same way as the "combining rules of Likelihood Ratio Test" or "Wald test", which Asparouhov and Muthen (July 27, 2010) explained in "Chi-Square Statistics with Multiple Imputation: Version 2"?

Linda K. Muthen posted on Sunday, March 03, 2013 - 5:55 pm

Parameter estimates are averaged over the set of analyses. Standard errors are computed using the average of the squared standard errors over the set of analyses and the between analysis parameter estimate variation (Rubin, 1987; Schafer, 1997). A chi-square test of overall model fit is provided (Asparouhov & Muthén, 2008c; Enders, 2010).

All other values are averaged over the set of analyses.
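
The combining rules described above (Rubin, 1987) can be sketched for a single parameter as follows. This is a minimal Python illustration, not Mplus code; the function name pool_rubin and the five estimates and standard errors are hypothetical values chosen only for the example.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed-data analyses (Rubin, 1987)."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching lists with at least two imputations")
    # Pooled point estimate: the average over the set of analyses.
    q_bar = sum(estimates) / m
    # Within-imputation variance: the average of the squared standard errors.
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance: variation of the estimates across analyses.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance combines both sources, with a finite-m correction.
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical estimates and standard errors from five imputed data sets.
pooled_est, pooled_se = pool_rubin(
    [0.50, 0.54, 0.47, 0.52, 0.49],
    [0.10, 0.11, 0.10, 0.12, 0.10],
)
print(f"pooled estimate = {pooled_est:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is always at least as large as the root of the average squared standard error, because the between-imputation term adds the uncertainty due to the missing data.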

Shin, Tacksoo posted on Sunday, March 03, 2013 - 6:14 pm

Dear Linda,

Thank you for your quick reply.

You mean PPP is also averaged over the set of analyses? If so, is there any possibility of bias?

Linda K. Muthen posted on Sunday, March 03, 2013 - 8:13 pm

Yes, that is also an average. I don't believe there is any theory as to how this should be combined.

Shin, Tacksoo posted on Sunday, March 03, 2013 - 9:16 pm

Dear Linda,

Deeply appreciate your help.

Eric Deemer posted on Wednesday, March 27, 2013 - 1:25 pm

Hello,
I ran an analysis on 5 imputed data sets but get no output. I used "type = imputation" in the DATA command also. The computation window flashes on the screen and that is all. Is there something I am doing wrong?

Thanks,
Eric

Linda K. Muthen posted on Wednesday, March 27, 2013 - 2:32 pm

Please send the files and your license number to support@statmodel.com.

Maren Formazin posted on Thursday, March 28, 2013 - 8:54 am

Dear Linda & Bengt,

when using TYPE = IMPUTATION and 10 imputed datasets, there are only residual covariances. Is there an option to get residual correlations as well? If so - which command would I have to use?

Thanks for your help!

Bengt O. Muthen posted on Thursday, March 28, 2013 - 4:14 pm

There is not an option for also getting the residual correlations.

Ping Li posted on Thursday, April 04, 2013 - 1:39 pm

Hi Linda,

I use imputation data sets that I have imputed to do analysis. When I run the syntax, just as Eric Deemer above mentioned, the computation window flashes on the screen but no output file.
Could you help to see what is wrong with the syntax:
TITLE:
Public Administration;

DATA:
FILE=imputelist.dat;
TYPE=imputation;

VARIABLE:
NAMES ARE
ciserq1-ciserq8 ceserq1-ceserq5
agency gen age gender cltype clgend clpart clint
educ ethnic tenure jobpos toint car1-car4 hr1-hr11 lmx1-lmx7
tiserq1-tiserq8 teserq1-teserq5;

USEVARIABLE
ciserq1-ciserq8 ceserq1-ceserq5
gen
hr1-hr11;

ANALYSIS:
ESTIMATOR=ML;

MODEL:

ciserq BY ciserq1* ciserq2-ciserq8;
ciserq@1;

ceserq BY ceserq1* ciserq2-ceserq5;
ceserq@1;

hr BY hr1* hr2-hr11;
hr@1;

ciserq ceserq on hr gen;

OUTPUT: STANDARDIZED(stdyx);

Thanks very much!

Linda K. Muthen posted on Thursday, April 04, 2013 - 1:59 pm

Please send the files and your license number to support@statmodel.com.

Maren Formazin posted on Friday, April 12, 2013 - 4:41 am

Hi,

my dataset contains missing values. I have completed multiple imputation with m = 10 and use Mplus to estimate structural models over the 10 imputed datasets.

Additionally, I have used the original data with MISSING = BLANK to estimate the same models.

All model fit indices based on analyses with the dataset that still contains missing data are substantially better than those based on the 10 imputed datasets. I've been wondering which algorithm Mplus uses when analyzing data with missing values that explains these differences.

Thank you.

Linda K. Muthen posted on Friday, April 12, 2013 - 9:07 am

The only fit statistic that has been developed for multiple imputation is chi-square for maximum likelihood. The means are reported for all other fit statistics.

Yalcin Acikgoz posted on Sunday, April 14, 2013 - 1:25 pm

Dr. Muthen,

I am working on a multiple imputation procedure and in accordance with your Asparouhov & Muthen (2010) paper which states

"The missing data is imputed after the MCMC sequence has converged",

I am trying to run an MCMC sequence.

Even though I am using Mplus v7 (which should be able to run Bayesian statistics), I am getting

" Unrecognized setting for ESTIMATOR option:
BAYES".

Why do you think this is happening?

Linda K. Muthen posted on Sunday, April 14, 2013 - 4:15 pm

I think you are not using Version 7. Check at the top of the output where it shows which version you are using.

Yalcin Acikgoz posted on Monday, April 15, 2013 - 6:51 am

Dr. Muthen,

Thank you very much for your prompt response. I checked and you are right; it shows version 5.1. But this is weird because I can see the little Mplus7 sign on the top left corner of my window when I am using the program. And when I check the About Mplus section it says Mplus version 7. Do you think this has something to do with settings?

Linda K. Muthen posted on Monday, April 15, 2013 - 8:38 am

You must have more than one Mplus.exe on your hard drive. Do a search and delete all but the most recent.

Yalcin Acikgoz posted on Monday, April 15, 2013 - 9:37 am

Thank you Dr. Muthen, that worked.

Yalcin

Yalcin Acikgoz posted on Sunday, April 21, 2013 - 1:50 pm

Dr. Muthen,

I have one other issue that I have been coming across. Please see the line of syntax below:

missing are Q51A-Q51F (-100) Q52A-Q52E (-99 98) Q53A-Q53D (98 -99);

I am doing multiple imputation and above syntax is where I define which values need to be imputed. When I use the syntax above, for Q52 variables it does not recognize -99 as a missing value flag. However, if I change the order that I type in such that -99 comes first and 98 comes after, it does what it is supposed to be doing. Same thing happens for other variable series as well. Somehow it does not function properly when I write 98 before -99.

Do you have any idea why this might be happening?

Thanks in advance!

Linda K. Muthen posted on Sunday, April 21, 2013 - 3:09 pm

Yes, you need to put the negative number first. If it is second, the minus sign is read as the dash of a list (98 -99 is read like 98-99).

Maren Formazin posted on Friday, April 26, 2013 - 5:13 am

Dear Linda,

getting back to my post from April 12th - is it possible to get separate results for CFI, RMSEA etc. for the different imputations?
Why would the mean for CFI, RMSEA indicate better fit in models with FIML than in models with imputed data? How does Mplus estimate parameters when there is missing data?

Thank you very much.

Bengt O. Muthen posted on Friday, April 26, 2013 - 2:19 pm

Q1. You would have to run each imputation as a separate data set.

Q2. Don't know off-hand. It's a research question; a dissertation topic someone? I would not trust average CFI or RMSEA from imputations unless it's been researched because those measures "don't know" that the analysis of each imputed data set is based on imputed data. When you do CFI/RMSEA by FIML, they "know" that data are missing because the chi-square on which they are based knows.

Q3. FIML is done the regular way of ML assuming MAR. Bayesian imputation assumes the same thing.

Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 12:09 pm

Dr. Muthen,

I am imputing the data with MPlus. Examining the imputed data, I found that even though the original data is on a 1-5 scale, imputed data included values smaller than 0 and bigger than 6. Should I be concerned or can I simply use the highest (or lowest) possible value for those out-of-range cells?

Thanks!

Linda K. Muthen posted on Tuesday, April 30, 2013 - 1:10 pm

See the VALUES option of DATA IMPUTATION. I think this may be what you want.

Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 1:52 pm

This was helpful, thanks!

Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 9:28 pm

Dr. Muthen,

I am new to both Mplus and MI, which is why I have so many questions. There is another issue that I am struggling with. I just discovered that even though my original sample size is 3000+, the imputed data come with a sample size of 1882. I reviewed the help contents, this forum, and the user guide but I couldn't find any explanation for why this is happening. My hunch is that this might be a memory issue, but I don't know. Can you tell why this might be happening?

Thanks!

Linda K. Muthen posted on Wednesday, May 01, 2013 - 7:45 am

You are probably reading your original data incorrectly by having blanks in the data set and using free format or by having too many variable names in the NAMES list. If this does not help, send the data, output, and your license number to support@statmodel.com.
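
One way to look for the kind of reading problem described above is to count the fields on each line of the raw data file and compare the count with the length of the NAMES list. The following is a minimal Python sketch under stated assumptions: the data are whitespace- or comma-delimited, one observation per line, and the function name field_count_report and the sample rows are hypothetical.

```python
from collections import Counter


def field_count_report(lines, n_names):
    """Count fields per line and list the lines that do not match n_names."""
    counts = Counter()
    mismatched = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip completely empty lines
        # Treat commas like blanks; a blank left for a missing value
        # therefore shows up as a line with too few fields.
        n_fields = len(line.replace(",", " ").split())
        counts[n_fields] += 1
        if n_fields != n_names:
            mismatched.append(number)
    return counts, mismatched


# Hypothetical three-variable file in which line 2 has a blank for a missing value.
sample_rows = ["1 2 3", "4   6", "7 8 9"]
counts, mismatched = field_count_report(sample_rows, n_names=3)
print("fields per line:", dict(counts))
print("lines to inspect:", mismatched)
```

For a real file, the lines could be obtained with open(path).read().splitlines(). Any line reported here is worth inspecting before imputation, since blanks in a free-format file shift the remaining values.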

Maren Formazin posted on Tuesday, June 11, 2013 - 7:19 am

Dear Linda,

for my model, I use the following command:

TYPE = IMPUTATION;

I have 10 imputed datasets (with no missing values).
When estimating a model with four latent factors, model estimation proceeds normally. However, when trying to establish a 2nd order factor (F2 BY F11* F12 F13 F14; F2@1), I get the following message:

"The chi-square could not be computed. This may be due to an insufficient number of imputations or a large amount of missing data."

There definitely is no missing data. Why would 10 imputations not suffice? All other models worked well - as did the same model with a different dataset.

Thanks for your help!

Linda K. Muthen posted on Tuesday, June 11, 2013 - 11:42 am

You can try increasing the number of imputations. If that does not help, send the output and your license number to support@statmodel.com.

Mijke Rhemtulla posted on Wednesday, June 12, 2013 - 8:40 am

Multiple Imputation in 7.1 produces a new column of results called "rate of missing". Can you tell me what this refers to and how it's computed? I was hoping it was fraction of missing information, but the values don't match my hand calculations and I can't find it in the Guide. Thanks much!

Bengt O. Muthen posted on Wednesday, June 12, 2013 - 9:28 am

It is the same as fraction of missing information. How do you calculate it?

Mijke Rhemtulla posted on Friday, June 14, 2013 - 8:39 am

I use FMI = Vb/Vt, where Vb is between-imputation parameter variance, and Vt is total parameter variance (this definition is in Bodner 2008, Schafer 1997, Enders 2010). Does Mplus use Schafer's rate of missing information (defined here: http://sites.stat.psu.edu/~jls/mifaq.html#minf)?

Tihomir Asparouhov posted on Friday, June 14, 2013 - 9:00 am

Yes
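
The two quantities being compared in this exchange can be sketched as follows. This is a minimal Python illustration, assuming the simple ratio is computed as (1 + 1/m)B / T and the adjusted rate follows the formula in Schafer (1997) and Rubin (1987), (r + 2/(df + 3)) / (r + 1). The function name and the input values are hypothetical; the thread only confirms that Mplus reports the latter kind of quantity.

```python
def missing_information(estimates, standard_errors):
    """Return (simple ratio, adjusted rate of missing information)."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(se ** 2 for se in standard_errors) / m          # W
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # B
    if between == 0:
        return 0.0, 0.0  # no variation across imputations
    total = within + (1 + 1 / m) * between                        # T
    r = (1 + 1 / m) * between / within    # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2       # Rubin's degrees of freedom
    simple = (1 + 1 / m) * between / total
    adjusted = (r + 2 / (df + 3)) / (r + 1)
    return simple, adjusted


# Hypothetical estimates and standard errors from five imputed data sets.
simple_fmi, adjusted_fmi = missing_information(
    [0.50, 0.54, 0.47, 0.52, 0.49],
    [0.10, 0.11, 0.10, 0.12, 0.10],
)
print(f"simple ratio = {simple_fmi:.4f}, adjusted rate = {adjusted_fmi:.4f}")
```

The adjusted rate is never smaller than the simple ratio, and the gap is largest when the number of imputations is small, which may explain a mismatch with hand calculations.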

Alvin Wee posted on Thursday, June 27, 2013 - 6:02 am

Hi, the new column "rate of missing" in version 7 is FMI, right? How do we make use of it (i.e., how do we use it to indicate the quality of the imputations)? Is there a range or cut-off?

Alvin Wee posted on Thursday, June 27, 2013 - 8:32 am

Also in the output, the unstandardised estimates = the values in "rate of missing" column. What does this mean?

Linda K. Muthen posted on Thursday, June 27, 2013 - 11:19 am

Please send the output and your license number to support@statmodel.com.

Alysia Blandon posted on Friday, July 05, 2013 - 10:50 am

I am planning on running an H1 imputation model to save the imputed data sets. I have data clustered within families and was going to use TYPE=BASIC TWOLEVEL. What I am having trouble figuring out is whether the data need to be in wide or long format (it is currently one row per family).

Linda K. Muthen posted on Saturday, July 06, 2013 - 6:39 am

You would not need TWOLEVEL if your data are in wide format and the cluster variable is family. You would just need TYPE=BASIC. Multivariate analysis takes care of any nonindependence of observations.

Yan Liu posted on Thursday, July 11, 2013 - 6:11 pm

Dear Dr. Muthen,

I did multiple imputation and conducted a mediation analysis with 10 imputed data sets. However, I can only see 9 data sets were analyzed and the model fit indices were also not given in the output. I wonder if there is anything wrong with my Mplus code.

Here is my imputation code:
USEVARIABLES ARE Y1t1 - M1t2 ;
AUXILIARY = schid tchid stdid x1 x2 sex age;
MISSING = blank;
ANALYSIS:
type = basic;
DATA IMPUTATION:
impute = Y1t1 - M1t2 ;
ndatasets = 10;
save = imput*.dat;
OUTPUT: TECH8;

This is my mediation analysis code

USEVARIABLES ARE x1 x2 Y1t1 - M1t2 M_d Y1_d Y2_d ;

DEFINE: M_d=M1t2 - M1t2; ! Difference score of mediator between time 1 and time 2
Y1_d=Y1t2 - Y1t1; ! Difference score of Y1
Y2_d=Y2t2 - Y2t1; ! Difference score of Y2

MODEL:
M_d ON x1 (a1);
M_d ON x2 (a2);
Y1_d ON M_d (b1);
Y2_d ON M_d (b2);
Y1_d ON x1 x2;
Y2_d ON x1 x2;

MODEL CONSTRAINT:
NEW(indb1 indb2 indb3 indb4);
indb1=a1*b1;
indb2=a1*b2;
indb3=a2*b1;
indb4=a2*b2;

Here is what I saw from the output of my analysis:

SUMMARY OF ANALYSIS

Average number of Observations 228
Number of replications Requested 10
Completed 9

Bengt O. Muthen posted on Friday, July 12, 2013 - 2:00 pm

Run each separately and see if one has a problem.
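
Running each imputed data set separately means one input file per data set, with the DATA command pointing at a single imputed file instead of the list file. The following is a minimal Python sketch of generating those inputs from an existing TYPE=IMPUTATION input. It assumes the first FILE statement in the input belongs to the DATA command; the function name per_dataset_inputs and the file names are hypothetical.

```python
import re

# Matches a whole "TYPE = IMPUTATION;" line (case-insensitive).
TYPE_IMPUTATION = re.compile(
    r"^[ \t]*TYPE[ \t]*(?:=|IS)[ \t]*IMPUTATION[ \t]*;[ \t]*\n?",
    re.IGNORECASE | re.MULTILINE,
)
# Matches the start of a "FILE = ...;" statement and captures the prefix.
DATA_FILE = re.compile(
    r"^([ \t]*FILE[ \t]*(?:=|IS)[ \t]*)[^;]+;",
    re.IGNORECASE | re.MULTILINE,
)


def per_dataset_inputs(base_input, data_files):
    """Return {data file name: input text that analyzes only that file}."""
    inputs = {}
    for name in data_files:
        # Drop the imputation flag so the file is read as an ordinary data set.
        text = TYPE_IMPUTATION.sub("", base_input)
        # Point the first FILE statement at this imputed data set.
        text = DATA_FILE.sub(lambda mo: mo.group(1) + name + ";", text, count=1)
        inputs[name] = text
    return inputs


# Hypothetical input and imputed data file names.
base_input = (
    "DATA:\n"
    "  FILE = imputelist.dat;\n"
    "  TYPE = IMPUTATION;\n"
    "VARIABLE:\n"
    "  NAMES ARE y x;\n"
    "MODEL:\n"
    "  y ON x;\n"
)
inputs = per_dataset_inputs(base_input, ["imp1.dat", "imp2.dat"])
print(inputs["imp1.dat"])
```

Each generated text can be saved as its own input file and run on its own, which makes it easier to see which data set fails to converge.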

Yan Liu posted on Monday, July 29, 2013 - 2:25 pm

Dear Dr. Muthen,

Thanks for your suggestion! I ran the analysis on each imputed data set separately and found that the model fit was really poor for all of them. Is this the reason why I cannot finish running all the imputed data sets?

The data were collected at 2 time points. The total sample size is 228. No missing at time-1, but about 16% missing at time-2. The outcome variables are the percent of time, which were derived from counts. The mediator is continuous. The independent variables are two dummy variables (2 interventions vs. control). The distributions of 2 outcome variable are bi-modal for two groups at both time points. The third group is not that obvious.

I tried two ways to model my data: (1) Using difference scores (time2-time1), the distributions of difference scores are not bi-modal though a little bit skewed. The model fit was found very bad for each imputed data! Then I took a further look at the distributions for each group. Two of them are still bi-modal.

My question is: Should I avoid using the difference scores, or avoid the ML estimator? The outcome variables are overdispersed, and bi-modality is still a problem for two of the groups.

(2) Model the time-2 variables with the time-1 variables included as covariates. Given the bi-modality and the missing data, what models would you suggest I use?

Thanks!
Yan

Bengt O. Muthen posted on Tuesday, July 30, 2013 - 8:22 am

So are you saying that the multiple imputation run had only 9 of the 10 converging, but when you ran them separately all 10 converged? Which version of Mplus are you using?

Turning to your key question, poor fit can be due to using ML instead of MLR, but the bi-modality is likely to not be resolved by MLR. I would suggest investigating the cause of the bi-modality. Perhaps you want to simply use a binary variable instead of the bi-modal one?

Yan Liu posted on Wednesday, July 31, 2013 - 3:45 pm

Dear Dr. Muthen,

Thanks for your suggestion! I think using a binary variable will be easier to solve the problem.

To dichotomize the continuous outcomes, I am thinking to do it in three ways: (1) make a cut-off that separates the two modes (one small and one big distributions), (2) ask for experts' opinion, and (3) run a regression analysis with outcome, predictors, and mediator using latent class analysis (constrain to be 2 classes) and then save the membership.

Will the third option work and be better?
Oh, I used Mplus 6.11. Is there any difference between the versions?
Best regards,
Yan

Bengt O. Muthen posted on Thursday, August 01, 2013 - 8:30 am

The choice between 1-2-3 has to be made by the researcher.

I would recommend always using the latest version of Mplus, which currently is 7.11.

Yan Liu posted on Friday, August 02, 2013 - 7:51 am

Dear Dr. Muthen,

Thanks a lot! Should I dichotomize outcome variables first and then impute missing data or the other way around?

One more question. When imputing continuous outcome variables (should be zero or positive), I found that some imputed values are negative. Is there a way to constrain the imputed value not to be negative?

Best regards,
Yan

Bengt O. Muthen posted on Friday, August 02, 2013 - 10:40 am

Personally, I would use max info for imputation so not dichotomize first, but this is your choice.

See the VALUE option on page 518 in the Version 7 UG.

Lauren Mitchell posted on Tuesday, August 20, 2013 - 1:14 pm

Hello! I am trying to run latent growth curve models with longitudinal data. I have 10-15 waves of data, but unfortunately roughly 30% of participants are missing on my predictor. From my understanding, MI is the best method for handling the missing data on these x-values - does that sound right? I was able to create the imputed data sets, but was not able to take the next step and run the model using the imputed data. The MS-DOS window appeared briefly and disappeared, and no output was produced. If you have any advice, I'd really appreciate it!

Thanks,
Lauren

Linda K. Muthen posted on Tuesday, August 20, 2013 - 1:25 pm

Please send the input and data sets to support@statmodel.com. If you are not using Version 7.11, try that first.

Ragnhild Sørensen Høifødt posted on Thursday, September 26, 2013 - 4:53 am

Hello,
I'm new to Mplus, and I'm doing a growth mixture model. As I have some missing data in the covariates I wanted to do multiple imputation (Type=Imputation). I see from older posts that I should use starting values to avoid label switching. Does this still apply, or does the program automatically use the estimates from the first data set as starting values for the subsequent data sets? I'm using version 7.11.

Thanks,

Ragnhild

Linda K. Muthen posted on Thursday, September 26, 2013 - 12:14 pm

Instead of multiple imputation I would include the covariates in the model by mentioning their variances in the overall MODEL command and use FIML. In this way distributional assumptions are made about them but cases with missing on one or more covariates are not excluded from the analysis.

Ragnhild Sørensen Høifødt posted on Friday, September 27, 2013 - 3:38 am

Thanks for the advice! I will try that.

Michael T Weaver posted on Thursday, October 03, 2013 - 5:59 am

Linda & Bengt:

I appreciate the guidance you've provided for my journey into imputation.

In trying to understand what is going on, and the tests involved, I analyzed a data set of about 300 observations (all continuous scales) in a couple of ways:
(1) FIML using ML, MLR, and Bayes estimators.
(2) Bayes estimation of factor scores, creating 50 imputed data sets, then ML estimation of the latent model.

The FIML estimates all showed poor model fit (no surprise). The ML chi squared results from the imputed data, however, showed good fit (using the chi squared test in the output). That surprised me, given the poor fit using FIML - I was expecting consistent (though not exactly the same) results.

Is the imputed-produced ML test testing something different, or am I missing an additional step? (I read the Multiple Imputation technical paper, version 2, 07/27/2010 - I assume that the output chi squared is the appropriate test of fit.)

I don't want to publish an incorrect interpretation - appreciate guidance to help me avoid that!

Thanks!

Michael

Linda K. Muthen posted on Thursday, October 03, 2013 - 11:53 am

I believe you are looking at an average chi-square value and a standard error in the multiple imputation output. Is this the case? If so, this is not the true imputation chi-square.

Michael T Weaver posted on Monday, October 07, 2013 - 2:23 pm

Linda:

If that is what is provided in the output, then yes.

There is a note about average over 50 data sets, but that appears after SAMPLE STATISTICS heading, so I thought it referred only to those.

Here is excerpt from my MPLUS Imputation output:

MODEL FIT INFORMATION

Number of Free Parameters 37

Loglikelihood

H0 Value -619.847
H1 Value -391.050

Chi-Square Test of Model Fit
Value 14.280
Degrees of Freedom 23
P-Value 0.9186

Chi-Square Test of Model Fit for the Baseline Model
Value 168.670
Degrees of Freedom 44
P-Value 0.0000

SRMR (Standardized Root Mean Square Residual)
Value 0.063

I had assumed the ML estimator Chi-squared results produced would reflect the information in the Technical Appendix "Chi-Square Statistics With Multiple Imputation" Version 2.

Do I need to "hand calculate" the appropriate chi squared statistic using these averages?

Thanks.

Michael

Linda K. Muthen posted on Tuesday, October 08, 2013 - 9:50 am

The values above are not averages. They are the values described in the Technical Appendix "Chi-Square Statistics With Multiple Imputation". They are available only for ML not, for example, MLR. Please send the two outputs, imputation and FIML, along with your license number to support@statmodel.com.

Deborah Bandalos posted on Wednesday, October 09, 2013 - 8:03 am

My question is about the autocorrelation plots obtained using multiple imputation. I'm not sure what is being plotted on the horizontal axis. No matter how many iterations I specify, the axis runs from 1-30. Are iterations "binned" somehow to create the horizontal axis? If so, is the binning achieved by just dividing the total number of iterations by 30? I tried to change the axis range, but Mplus shut down, so I'm guessing that's not an option.

Thanks,

Debbi

Linda K. Muthen posted on Wednesday, October 09, 2013 - 10:20 am

Please send the output, graph file, and your license number to support@statmodel.com.

Maren Schulze posted on Monday, November 18, 2013 - 2:35 am

With my data, I have computed the same SEM twice:

- once with some missing data, using ML
- once with 10 imputed datasets

The chi-square value for the MI-datasets is smaller than the value for the dataset with some missing information (with the same df and N); however, CFI for the MI-dataset is lower than the one for the dataset with some missing information.

Why would this happen?

Thanks for your help!

Maren Schulze posted on Monday, November 18, 2013 - 4:01 am

In addition to my previous post:

For simulation purposes, I have also used mean imputation and single stochastic regression imputation. Both methods have led to very similar results as ML with missing data.

So it is only the analyses with 10 imputed datasets where chi-square is much lower (around 1275 compared to around 1450 for the three other options - with df= 98 and N = 2326) and CFI is lower, too (.910 compared to .916).

Thanks.

Bengt O. Muthen posted on Monday, November 18, 2013 - 8:37 am

For your MI approach, are you referring to the chi-squares for each imputed data set, or the one chi-square that summarizes all the imputed data sets? I am referring to techniques discussed on slides 212- of the 6/1/11 Topic 9 handout.

Maren Schulze posted on Tuesday, November 19, 2013 - 4:39 am

In my output file, there is only one chi-square value which I presume is the one that summarizes all imputed data sets.

(I'm using Mplus 7 with "TYPE = IMPUTATION;" and ten imputed datasets; according to the output, all ten requested replications are completed.)

Bengt O. Muthen posted on Tuesday, November 19, 2013 - 8:41 am

Please send the output for the 2 runs that you compare to Support.

Maren Schulze posted on Wednesday, November 20, 2013 - 8:13 am

Thanks for your suggestion, I've done so this morning.

An additional question: I have compared the chi square values of the baseline model between multiple imputation and ML (with missing data) - they differ quite a lot; whereas the differences between chi square baseline model ML (with missing data), mean imputation and single stochastic regression imputation are comparably small. Why would that happen?

Bengt O. Muthen posted on Wednesday, November 20, 2013 - 6:33 pm

Please send those 2 baseline outputs and data to Support.
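
A smaller chi-square together with a lower CFI is arithmetically possible when the baseline chi-square is also smaller, because CFI compares the model's excess over its degrees of freedom with the baseline model's excess. The following is a minimal Python sketch, assuming the standard CFI definition and using the approximate values quoted above (chi-square around 1275 vs. 1450, df = 98, CFI .910 vs. .916). The baseline degrees of freedom of 120 are a hypothetical placeholder, and the sketch does not explain why the baseline values differ.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square values."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model, 0.0)
    return 1.0 if excess_base == 0 else 1.0 - excess_model / excess_base


def implied_baseline_excess(chi2_model, df_model, cfi_value):
    """Baseline chi-square minus baseline df implied by a reported CFI."""
    return (chi2_model - df_model) / (1.0 - cfi_value)


# Approximate values quoted in the posts above.
excess_mi = implied_baseline_excess(1275, 98, 0.910)
excess_ml = implied_baseline_excess(1450, 98, 0.916)
print(f"implied baseline excess, multiple imputation: {excess_mi:.0f}")
print(f"implied baseline excess, ML with missing data: {excess_ml:.0f}")

# Round trip with a hypothetical baseline df of 120.
print(f"CFI check: {cfi(1275, 98, excess_mi + 120, 120):.3f}")
```

Under these assumptions the implied baseline excess is clearly smaller for the imputed data sets, which is in line with the observation that the baseline chi-square values differ quite a lot between the two approaches.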

Anonymous posted on Monday, January 06, 2014 - 2:24 am

Hello!
I have calculated CFA with categorical variables with 5 MI datasets (TYPE = IMPUTATION). I ran two CFAs with different models. In order to compare them, can I simply calculate the Chi-square difference test by subtracting the Chi-square values provided in the MODEL FIT part of the outputs?

Thanks in advance!

Linda K. Muthen posted on Monday, January 06, 2014 - 7:12 am

The chi-square with multiple imputation cannot be used for difference testing. You should use FIML if this is important to your study.

Lucy Morgan posted on Thursday, January 23, 2014 - 1:23 am

Hi

I am trying to run a fully latent path model (N = 199) with a dataset that is complex (data collected from care assistants, clustered by nursing homes), has a non-normal distribution, and has missing data (< 5%). Data are missing on both exogenous and endogenous variables. I understand that FIML can account for missing data on endogenous variables only, thus the number of observations is reduced to 167 when I run the model. I have a couple of questions I would be very grateful if you could answer:

1) Should I use multiple imputation to compute missing data for ALL variables, and then run the model based on the imputed datasets? Or should I only impute data for the exogenous variables and then run the model with imputed datasets AND FIML? (I did try to impute only exogenous variables, but missing data on the endogenous variables was replaced with * and the model would not run....)

2) When running the multiple imputation, would it be ok to run a straightforward imputation (TYPE = BASIC) as per ex 11.5 in the Version 6.0 handbook or should I be running the multiple imputation with the model that reflects my dataset (TYPE = COMPLEX) similar to ex 11.7?

3) When I run a TYPE = COMPLEX model, I cannot also use MLM (which I need to use to account for non-normal data). Can I simply substitute MLR?

Many (many!) thanks
Lucy

Linda K. Muthen posted on Thursday, January 23, 2014 - 10:28 am

In your case, I would not use multiple imputation. I would use COMPLEX and MLR and include the variances of all observed exogenous covariates in the MODEL command. They will be treated as dependent variables and distributional assumptions will be made about them. Missing data theory will then be used for them. This is asymptotically the same as doing multiple imputation.

Jason Edgerton posted on Monday, February 24, 2014 - 11:16 am

Hello,
I have missing data for a 4-wave LGC model I am running. I can't use MLR as an estimator because 2 of my variables are non-normal, and I can't use MLM because I have missing values. So I chose to impute 10 data sets (in Mplus 7) and then use MLM estimation. I understand the chi-square and fit statistics are just averages and not accurate evaluations of fit (unless you're using ML), but does MLM still produce robust SEs and parameter estimates with multiply imputed data sets? That is, is it appropriate to use MLM estimation with multiply imputed data if I'm not planning on comparing nested models?

Linda K. Muthen posted on Tuesday, February 25, 2014 - 6:33 am

Why can't you use MLR? MLR is robust to non-normality of continuous variables. What do you mean by non-normal?

Jason Edgerton posted on Tuesday, February 25, 2014 - 1:45 pm

Sorry my mistake re: MLR (by non-normal I mean one continuous predictor and the continuous outcome variable are both highly leptokurtic). When I previously estimated the unconditional LGC model with MLM on imputed data, the fit indices RMSEA, CFI and SRMR all indicated adequate fit, but with MLR on the original data (with missing) these indices all indicate inadequate fit (the parameter estimates are quite similar) -- I assume I have to put more stock in the MLR estimated fit stats and conclude that my model has poor fit --correct?

Thanks

Linda K. Muthen posted on Tuesday, February 25, 2014 - 2:53 pm

You can't assess fit in multiple imputation using the means of the fit statistics. These are means. How well or poorly they represent fit has not been studied. So yes, it seems your model does not fit.

Patricia Schultz posted on Monday, March 03, 2014 - 6:09 am

Hi Dr. Muthen,

I'm using multiple imputation (Amelia) and running 10 computations. How do I pool the model fit indices (CMIN/df, CFI, RMSEA, SRMR)? In an earlier post (from 2006) it says to just calculate a simple average of the values (as there is no specific theory on this). I was wondering if this has changed or if that is still the practice.

Thank you.

Linda K. Muthen posted on Monday, March 03, 2014 - 10:17 am

This is still a research question.

Shin, Tacksoo posted on Thursday, March 06, 2014 - 12:18 am

Dear Linda,

I have a question about "the concept of multiple imputation using Bayesian estimation".

When imputations are created under Bayesian arguments, MI has a natural interpretation as an approximate Bayesian inference. In addition, I thought that this missing data technique uses the Bayesian estimation method when obtaining parameter estimates. So, I wrote the syntax below:
--------------------------------------
ANALYSIS:
ESTIMATOR = BAYES;
MODEL:
i s | Y1@0 Y2@1 Y3@2 Y4@3;
[i](a); [s](b); i(c); s(d);i with s(e);
MODEL PRIORS:
a ~ N(190, 20);b ~ N(7, 3);
c ~ IW(616, 5);d ~ IW(8, 5);
e ~ IW(-28, 5);
DATA IMPUTATION:
IMPUTE = Y1-Y4;NDATASETS = 15;
SAVE = C:\*.DAT;
---------------------------------------

By the way, some are confused whether "MI using Bayesian" indicates Bayesian estimation or simply means MI proposed by Rubin. I thought it is the former. Am I correct?

Thank you

Bengt O. Muthen posted on Friday, March 07, 2014 - 1:57 pm

See page 516 of our UG to see an overview of how imputation works in Mplus. See also the imputation examples in Chapter 11. You can do "H1 imputation" or "H0 imputation". Your setup is an H0 imputation example in line with UG ex 11.7.

Rubin proposed MI using Bayes, so these are one and the same.

I recommend the Craig Enders missing data book.

Shin, Tacksoo posted on Friday, March 07, 2014 - 4:48 pm

Dear Bengt,

Thank you for your information.

You explained "the data can be imputed from any other model that can be estimated in Mplus with the Bayesian estimator (H0)". Then the imputed data sets are used in the estimation using Bayes (Bayesian estimation is used to estimate the model). Does this mean that parameter estimates from the Bayesian posterior distributions (one for each imputed data set) were obtained and then combined? Am I correct?

If so, does setting informative priors (MODEL PRIORS) affect the imputation step, the estimation step, or both?

Bengt O. Muthen posted on Friday, March 07, 2014 - 4:54 pm

Q1. Multiple draws were generated from the Bayesian posterior distribution of H0 model estimated parameters and for each draw data were generated.

Q2. The priors only influence the first H0 model estimation step.

Shin, Tacksoo posted on Friday, March 07, 2014 - 5:50 pm

Dear Bengt,

Deeply appreciate your quick reply.

Here is one last question.

When estimating the parameters (in the estimation step), does Mplus simply use noninformative priors?

Thank you.

Linda K. Muthen posted on Monday, March 10, 2014 - 8:33 am

Yes.

Lindsay Bell posted on Tuesday, March 18, 2014 - 10:34 am

Hello -

I am doing multiple imputation with a Bayesian estimator. I have a few parameters that appear to have autocorrelations still present at 30 lags. How can I see the autocorrelations for lags greater than 30? Both the output and the plot only go through 30.

Also, how can I get the fraction of missing information for each parameter?

Thank you,
Lindsay

Tihomir Asparouhov posted on Tuesday, March 18, 2014 - 2:39 pm

Using the THIN option of the ANALYSIS command can give you bigger lags. If you use thin=10, the autocorrelations that you will see are essentially at lags 10, 20, 30, ..., 300, so the lag is multiplied by 10. Alternatively, use the BPARAMETERS option of the SAVEDATA command to get all parameter values and compute the desired autocorrelation in Excel.

The fraction of missing information for each parameter is obtained after the imputations are done as in example 13.13 where the desired model is specified.
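For readers who would rather script the autocorrelation step than do it in Excel, here is a minimal Python sketch. It assumes the draws for one parameter have already been read into a plain list, for example from the file saved with BPARAMETERS; the layout of that file is not described in this thread, so the loading step is left out. The function name is made up for illustration.

```python
def autocorrelation(draws, lag):
    """Sample autocorrelation of a sequence of MCMC draws at the given lag."""
    n = len(draws)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(draws) - 1")
    mean = sum(draws) / n
    # Denominator: sum of squared deviations over the whole chain.
    denom = sum((x - mean) ** 2 for x in draws)
    if denom == 0:
        raise ValueError("draws are constant; autocorrelation is undefined")
    # Numerator: cross-products of deviations that are `lag` steps apart.
    num = sum((draws[t] - mean) * (draws[t + lag] - mean) for t in range(n - lag))
    return num / denom
```

Calling `autocorrelation(draws, 50)` or `autocorrelation(draws, 100)` gives the values beyond the 30 lags shown in the output and plot.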

Suzan Doornwaard posted on Tuesday, April 08, 2014 - 4:15 am

Hello,

I am working with a multiple imputed dataset (5 imputed sets) because we employed a 3-form planned missingness design in a large questionnaire. My most important variables are non-normally distributed, so I usually use the MLR estimator for my models.

As I conclude from reading all the information here, the fit statistics (chi-square, RMSEA, CFI) for MLR are
1) all averages over the computed sets
2) these averages are not reliable because Mplus does not "realize" they are from imputed data
3) for the same reason the fit statistics of the separate imputed sets are not reliable either

Now some questions that arise are:
-How do I assess the fit of my model? Should I run it with ML and see if the fit statistics there are similar to the averages I get with MLR? That is, as an indication only - I don't think actually reporting results of ML models is a good idea given the non-normal distribution of my variables.
-What do I report in a manuscript when I want to refer to model fit (reviewers ask for it)?
-Is there ANY way to test nested models using MLR (constrained vs. unconstrained to test for moderation by gender)? Or any other way to test moderation in this case?

Thank you,
Suzan

Linda K. Muthen posted on Tuesday, April 08, 2014 - 9:57 am

If you have planned missingness, use FIML not multiple imputation. Then you have fit statistics and can test nested models.

Suzan Doornwaard posted on Tuesday, April 08, 2014 - 10:21 am

Dear Linda,

Thank you for your reply. Unfortunately we do have to work with the imputed sets.

Could you tell me how I should report on the model fit using TYPE=IMPUTATION in a manuscript? Are there other ways to assess the fit?

And can I use the Wald-test instead for moderation purposes?

Thank you,
Suzan

Linda K. Muthen posted on Tuesday, April 08, 2014 - 11:22 am

You can use MODEL TEST with multiple imputation. No difference testing can be done. The only absolute fit statistic is chi-square for maximum likelihood for continuous outcomes. See the following paper on the website under Bayesian Analysis:

Asparouhov, T. & Muthén, B. (2010). Bayesian analysis of latent variable models using Mplus. Technical Report. Version 4.

As far as what others report, I don't know. You might want to ask that on general discussion forum like SEMNET.

Linda K. Muthen posted on Tuesday, April 08, 2014 - 11:39 am

I'm sorry. This is the paper I meant:

Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report. Version 2.

Masha Pavlovic posted on Wednesday, April 09, 2014 - 5:56 am

Hello,

I am trying to do imputation of missing data before running ULSMV analysis.

I'm getting an error message
"PROBLEM INVOLVING VARIABLES AND xC� .
REMOVING ONE OF THESE VARIABLES FROM THE IMPUTATION RUN CAN RESOLVE THE PROBLEM."

The problem is that strange characters are appearing instead of the names of the variables, so I cannot figure out which variables I should remove.

Tnx in advance for the help!

Linda K. Muthen posted on Wednesday, April 09, 2014 - 10:58 am

Please send the input, data, output, and your license number to support@statmodel.com.

Joop Hox posted on Thursday, May 15, 2014 - 2:12 am

Hi all, I have a practical question: is there a maximum limit to the number of imputed datasets that Mplus can handle?

-Joop

Linda K. Muthen posted on Thursday, May 15, 2014 - 7:06 am

Not that I know of. If you have had a problem, please send it to support.

Shiny7 posted on Tuesday, September 30, 2014 - 11:07 am

Dear Mrs. Muthen,

may I ask another question, please?

Is the 'Analyze multiple imputation datasets' compatible with Multilevel Modeling?

I tried it. The analysis runs well, but the regression coefficients and SEs are not plausible. Furthermore, Mplus registered 22 clusters, although in fact there are only 21.

I hope you can give me little support.

Thanks a lot!

Shiny

Linda K. Muthen posted on Tuesday, September 30, 2014 - 11:39 am

Yes, multiple imputation can be done with multilevel modeling. It sounds like you are not reading your data correctly. If you can't see the problem, send the data and output to support@statmodel.com.

Shiny7 posted on Tuesday, September 30, 2014 - 11:47 am

okay, thank u so much, I am going to check my model again...

Have a nice day...

Linda K. Muthen posted on Tuesday, September 30, 2014 - 11:54 am

It's the data you should check. You may have blanks in it that cause it to be misread in free format. Blanks are not allowed in free format data.
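A quick way to look for the kind of problem described here is to count the fields on every line of each data file before handing it to Mplus. The Python sketch below is only an illustration (the function name is invented): it reports the most common number of whitespace-delimited fields and lists the lines that deviate from it, which is what a blank in a free-format file would produce.

```python
from collections import Counter

def field_count_report(lines):
    """Count whitespace-delimited fields per non-empty line.

    Returns (expected, bad): `expected` is the most common field count and
    `bad` is a list of (line_number, count) for lines that differ from it.
    """
    counts = [(i, len(line.split()))
              for i, line in enumerate(lines, start=1)
              if line.strip()]
    if not counts:
        return 0, []
    expected = Counter(c for _, c in counts).most_common(1)[0][0]
    bad = [(i, c) for i, c in counts if c != expected]
    return expected, bad
```

It can be used on an open file, e.g. `with open("imp1.dat") as f: print(field_count_report(f))`, where `imp1.dat` is a placeholder file name. If `expected` does not equal the number of names in the NAMES list, the variable list itself is the first thing to check.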

Shiny7 posted on Wednesday, October 01, 2014 - 12:31 am

Dear Mrs. Muthen,

thanks a lot, it was only the term
'Imputation_' that had been missing at the beginning of the 'names command'....

Now it works fine...

Wen-Hsu Lin posted on Sunday, October 19, 2014 - 12:26 am

hello Mrs. Muthen,

I am trying to use MI in Mplus; however, I do not know how to use the VALUES syntax.
My syntax is as follows:
variable: names are
income sex
w3dep;
usevariables are
income sex
w3dep;

missing is blank;
data imputation:
impute = number
income
sex(c)
w3dep;
So, how do I tell mplus my w3dep ranged from 1 to 16? Thank you.

Linda K. Muthen posted on Sunday, October 19, 2014 - 9:12 am

VALUES = w3dep (1-16);

Lois Downey posted on Tuesday, October 21, 2014 - 10:57 am

I used DATA IMPUTATION in Mplus to generate 5 datasets with values imputed for all missing data in my dataset. I am now using TYPE = IMPUTATION to analyze the data.

Some of the outcome variables in the dataset are censored from below. When I define one of these outcomes as censored, Mplus does not provide any warning that this statement will be ignored. However, the results of the analysis match the results of an analysis in which I omit the CENSORED command. Does this mean that Mplus ignores the statement and performs the analysis as if the outcome is uncensored continuous?

Bengt O. Muthen posted on Tuesday, October 21, 2014 - 12:20 pm

Please send the output and license number to Support so we can diagnose this.

Wen-Hsu Lin posted on Tuesday, October 21, 2014 - 5:42 pm

Dear Mrs. Muthen

I have drop outs(missing) in my longitudinal data. I ran a LGM(wave 1 to wave 3) and used a wave 4 latent variable as a distal outcome (deviance).

I ran a multiple imputation and the results were different from those of the Mplus default for handling missing data. Specifically, in the multiple imputation model, deviance on i s was not significant. On the other hand, when I do not use multiple imputation and just specify that missing is blank, the same path is significant.

Which one of these is more trustworthy?
Thank you so much.

Matthew Diemer posted on Wednesday, November 26, 2014 - 1:21 pm

I'm considering multiple imputation for a dataset I'm working with (all categorical variables & some dichotomous & so am using WLSMV estimator). The dataset has a weight variable but no clustering or stratification.

My hesitation with using multiple imputation comes from this thread, which suggests that the resulting fit indices (e.g., RMSEA, TLI, etc.) are averages & are therefore not interpretable per Linda Muthen. My understanding is that only chi-square is interpretable yet I am concerned about using that with a fairly large sample size.

Is it still the case that interpreting the other fit indices is an issue being researched (no clear answer yet)?

I suppose the alternative would be EM imputation in one dataset, yet I know of some concerns in the literature with doing so with dichotomous variables.

Thank you.

Linda K. Muthen posted on Wednesday, November 26, 2014 - 2:43 pm

This is still the case except for maximum likelihood with continuous outcomes.

Ashley posted on Tuesday, December 02, 2014 - 4:34 pm

Is it possible to compute basic descriptives across imputed datasets (means, SD, correlations, etc)?

Also, I've conducted a CFA and I would like to compute reliabilities (alpha) of the factors identified. Is this possible to compute across all imputed datasets?

Thank you in advance.

Bengt O. Muthen posted on Tuesday, December 02, 2014 - 6:04 pm

Q1. Try Type=Basic.

Q2. You can do this using SEM in line with articles/books by Raykov and Marcoulides.

Ashley posted on Tuesday, December 16, 2014 - 12:07 pm

Hello,

I attempted to obtain the descriptives of the imputed datasets using type = basic. While I can get the means and correlations, I have been unable to get the standard deviations. Is there another code that I could use?

Also, I've been unable to figure out how to flag significant correlations. Is this possible for MI data?

Thank you in advance.

Linda K. Muthen posted on Tuesday, December 16, 2014 - 6:34 pm

We give sample statistics for the first imputed data set. For continuous variables you should get means, variances, and covariances. For categorical variables, you should get thresholds and correlations.

Aurelie Lange posted on Monday, January 05, 2015 - 4:24 am

Dear Dr Muthen and Muthen,

As discussed above, when using multiple imputation the model output includes a column Rate of Missing. I understand this represents the (un)certainty of the model results due to the missing data. Yet, I have been unable to find any information regarding what is an acceptable rate of missing information. Could you provide me with a reference on this topic?

Thank you in advance and have a lovely new year!

Aurelie

Linda K. Muthen posted on Monday, January 05, 2015 - 8:36 am

See the FAQ Missingness Fraction on the website.
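As background for what this column measures, here is a small Python sketch of the standard Rubin combining rules for one parameter. It is a textbook illustration, not a statement of how Mplus computes its Rate of Missing column: the formula used below is the large-sample share of the total variance that comes from between-imputation variability, and any finite-sample adjustment Mplus may apply is not reproduced. The function name and the dictionary keys are invented.

```python
def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two estimates and matching standard errors")
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(se ** 2 for se in standard_errors) / m      # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between                   # total variance
    # Share of the total variance attributable to the missing data.
    rate = (1 + 1 / m) * between / total if total > 0 else 0.0
    return {"estimate": q_bar, "se": total ** 0.5, "rate_of_missing": rate}
```

When the estimates barely change from one imputed data set to the next, `between` is small and the rate is close to 0; when they differ a lot relative to their standard errors, the rate approaches 1.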

Michelle Colder Carras posted on Wednesday, February 04, 2015 - 4:21 pm

Hello,

I'm trying to analyze some datasets (impmissl) created through DATA IMPUTATION on the original dataset (Aim2). When I try to use these datasets to run my model using

DATA: FILE IS impmissllist.dat;
TYPE = IMPUTATION;),

I find that there are illegal characters - asterisks. I figured out that these are probably missing values in the original dataset, Aim2.dta, that are represented as -9999 in that Aim2.dat. For some reason, though, for the variables that I did not impute (to save time/power), it seems the missing values may have been converted back to their original asterisks (the file, Aim2.dat, was the result of the stata2mplus utility where asterisks representing missing data were replaced by -9999).

I've tried listing the * as a missing value in the MISSING statement, e.g.
Missing are all (*) or Missing are vat1 (*) dx1 (*), but that doesn't seem to work either as I get errors, e.g.:
"ERROR in VARIABLE command
Period (.) or asterisk (*) used as the missing symbol must apply to the whole
dataset. No variables (or ALL) should be mentioned in the MISSING option..."

How can I either replace the * again or keep the values from being turned into * in the first place? Or maybe something else entirely is happening?
Thanks,

Michelle

Linda K. Muthen posted on Thursday, February 05, 2015 - 5:08 am

Mplus uses asterisks as the missing data flag in data sets it saves. Say MISSING = *; Don't use variables names or ALL in the MISSING statement. You should look at the output where you created the imputed data sets. All of the information about the data sets including the order of the variables is shown at the end of the output.
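If a numeric missing code is still needed in the saved files, for example because another program will read them, the asterisks can be recoded outside Mplus. The Python sketch below is one hypothetical way to do that for a single free-format record; the function name is invented and the -9999 default is simply the code mentioned in the question above.

```python
def recode_missing(line, flag="*", code="-9999"):
    """Replace stand-alone missing-data flags in one whitespace-delimited record."""
    # Only fields that consist of the flag alone are replaced, so a value such
    # as "2*3" (not expected in these files) would be left untouched.
    return " ".join(code if field == flag else field for field in line.split())
```

Applied to every line of a saved data set, this gives a file that can be read with the numeric code as the missing value flag. Fields are re-joined with single spaces, which is fine for free format but would not preserve a fixed-column layout.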

Fabio Giudici posted on Monday, February 09, 2015 - 9:21 am

Dear all,

I have run a multiple imputation with Stata but I would like to run a latent class analysis with Mplus. For this reason, I would like to know if there is an option/command in Mplus for handling my new dataset that include 20 datasets created by the multiple imputation.
I don't want to do all the analysis (MI+LCA) with Stata because for my main analysis, with another dataset, I found the classes with Mplus. So I'd like to use the same program also for this dataset that I'm using to confirm my results.

Thank you!
Fabio

Linda K. Muthen posted on Monday, February 09, 2015 - 9:29 am

See Example 13.13

Fabio Giudici posted on Tuesday, February 24, 2015 - 5:52 am

Dear Linda,
thank you for your quick answer. I guess you meant Example 12.13, since 13.13 does not exist in the book. Anyway, I ran the analysis on a small dataset (with 5 imputed datasets instead of 20) to check if the code was correct, but something is wrong. I have 2 errors: "Errors for replication with data file" and "ERROR in DATA command. An error occurred while opening the file specified for the FILE option."

The following is the code I used.

TITLE: LCA with multiple imputation
DATA: file is O:\LCA\Mplus\prova.txt;
type = imputation;
VARIABLE: names = nat sex drug age id data;
usevariables = nat sex drug age;
categorical = age;
nominal = nat sex drug;
idvariable = id;
classes = c(2);
ANALYSIS: Type Mixture;
starts = 1000 100;
OUTPUT: standardized;
tech1 tech7;
SAVEDATA: file = indout_LCA_mi.txt;
results = res_LCA_mi.txt;
save = cprobabilities;

The variable data is numeric and 1 represent the first imputed dataset, 2 the second and so on.

Thank you very much for your help!
Fabio

Linda K. Muthen posted on Tuesday, February 24, 2015 - 7:35 am

In the current user's guide, the example is 13.13.

Please send the files and your license number to support@statmodel.com.

Fabio Giudici posted on Wednesday, February 25, 2015 - 4:38 am

Yes, I found the current user's guide and the example I have looked at is the same.
I can't send the data set but I can give you an example:

nat sex drug age id data
1 1 1 1 1 1
1 1 0 1 2 1
0 1 0 3 3 1

and so on until the end of the first imputed data set. Then there is the second imputed data set:

0 0 0 2 1 2
1 1 0 3 2 2
1 0 1 1 3 2

I did that for all the imputed data sets, so my file "prova" has one data set under the others with the variable data indicating the number of data set. Is this the proper way to set the file? May I run a mixture analysis with MI data sets?

Thank you for your understanding.
Fabio

Linda K. Muthen posted on Wednesday, February 25, 2015 - 5:55 am

Each imputed data set must be in a separate file. The file named in the FILE option contains the names of the data sets. This is explained in the example.
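For a stacked file like the one described above, the conversion to separate files can be scripted. The following Python sketch is an illustration under stated assumptions: the imputation number is in one whitespace-delimited column (the last one by default), the output file names `imp1.dat`, `imp2.dat`, ... and `implist.dat` are arbitrary choices, and the function name is invented.

```python
from pathlib import Path

def split_stacked(stacked_path, out_dir, index_col=-1, stem="imp", has_header=True):
    """Split a stacked file into one file per imputation plus a list file.

    index_col is the position of the imputation-number column (default: last).
    Returns the path of the list file, which is the file to name in FILE.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = {}  # imputation number -> records, kept in order of appearance
    with open(stacked_path) as f:
        if has_header:
            next(f, None)  # skip the row of variable names
        for line in f:
            fields = line.split()
            if not fields:
                continue
            key = fields.pop(index_col)  # remove the imputation indicator
            groups.setdefault(key, []).append(" ".join(fields))
    names = []
    for key, rows in groups.items():
        name = f"{stem}{key}.dat"
        (out_dir / name).write_text("\n".join(rows) + "\n")
        names.append(name)
    list_path = out_dir / f"{stem}list.dat"
    list_path.write_text("\n".join(names) + "\n")
    return list_path
```

Because the indicator column is dropped from the individual files, it should also be dropped from the NAMES list (here, the variable called data). The list file holds bare file names, so this sketch assumes it is kept in the same folder as the data sets it names.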

Fabio Giudici posted on Wednesday, February 25, 2015 - 6:36 am

Ok, sorry! Now it is clear and everything worked.

Thank you again for your help.

Best
Fabio

Andrea Norcini Pala posted on Monday, March 02, 2015 - 10:27 am

Dear Professors Muthén,

I am performing multiple imputation on a large sample of data.
I have two types of missing data: by design (99) and not by design (999).
I want to impute only the 999 values (missing not by design), but I can't find the right syntax. Could you help me with this?
Thank you,
Andrea

Bengt O. Muthen posted on Monday, March 02, 2015 - 4:29 pm

Just use 999 as your missing data designation in the imputation run. The missing by design could later be handled by multiple-group analysis for instance.

Andrea Norcini Pala posted on Tuesday, March 03, 2015 - 11:54 am

Thank you!

Jen posted on Thursday, March 05, 2015 - 12:50 pm

Hello,

I had a question related to the above about missing data that is a combination of MCAR and MAR(ish).

I am constructing a relatively complex structural model with 8 multi-indicator latent constructs plus 3 manifest variables, one of which is categorical. The categorical manifest variable is one of five mediators and correlates with other mediators (so including covariances seems necessary). Additionally, many of the indicators for the latent variables are categorical.

Two latent variables (all indicators) and the categorical manifest variable are MCAR for half the sample (due to giving participants a random subset of measures). There is a small amount of data missing for other reasons.

Because of the categorical variables and need for covariances, I'd like to use WLSMV but am concerned about the missing data handling. I thought I might use MI, but wonder if I am okay to just impute the small amount of MAR(ish) data but not the MCAR data. Imputing the MCAR data seems to be causing issues (everyone in that half of the sample is being assigned identical values).

I am also open to other ideas for this situation.

Thank you!

Bengt O. Muthen posted on Thursday, March 05, 2015 - 1:39 pm

How about handling the MCAR by multiple-group analysis in WLSMV? I assume the MCAR patterns of non-missing variables can be used to form separate groups of subjects. And then use MI for within-group MAR missing.

Jen posted on Thursday, March 05, 2015 - 2:31 pm

Would the structural model for the group with missing variables just exclude those variables, then? And I would test whether various parameters differed across groups and hope not such that they could be constrained? The random groups are of course of no substantial interest. One very key variable theoretically is actually missing for 50% of the sample, but I am hoping to include that half of the sample to increase the precision of estimates of other paths in the model given our interest in indirect effects.

Thank you for the suggestion.

Bengt O. Muthen posted on Friday, March 06, 2015 - 7:59 am

Right, the MCAR missing variables in a group would be excluded.

Lisa M. Yarnell posted on Friday, March 20, 2015 - 1:00 pm

Hello,

What are my options for saving data from a two-level analysis constructed using 20 imputation values? I received the message: "The SAVE option is not available for TYPE=MONTECARLO or TYPE=IMPUTATION."

Is there another way to specifically save the level-2 score (cluster mean) I created from the 20 input data sets using the following select code? "readcomp" is the variable I am reading in that has 20 plausible values (1 in each of the data sets analyzed and combined here). So I already have the 20 plausible values/imputations. I just want to get the level-2 average of them, and save that calculated variable for further analysis. Thank you.

DATA: FILE IS "R:\proj\031815list.txt";
TYPE = imputation;
VARIABLE: NAMES ARE teachID std_blck readcomp;

USEVARIABLES ARE readcomp B_BYSCHL;
USEOBSERVATIONS ARE std_blck eq 1;
BETWEEN IS B_BYSCHL;
WITHIN IS readcomp;
CLUSTER IS teachID;
MISSING IS .;

DEFINE: B_BYSCHL = CLUSTER_MEAN (readcomp);

ANALYSIS:
TYPE = TWOLEVEL;

MODEL:
%WITHIN%
readcomp;

%BETWEEN%
[B_BYSCHL]; B_BYSCHL;

SAVEDATA: file is R:\proj\031915.dat;

OUTPUT: Sampstat STDYX TECH1 SAMP res;

Linda K. Muthen posted on Friday, March 20, 2015 - 3:40 pm

This would have to be done using a batch file on an Mplus input that has a single data set, one for each imputed data set (i.e., not using TYPE=IMPUTATION). You can modify the RUNALL utility available on our website at the page

http://statmodel.com/runutil.shtml

If you run into problems setting this up, please email your files to support@statmodel.com.
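Independently of the RUNALL utility, the per-data-set input files themselves can be generated with a few lines of Python. This is a hypothetical sketch, not part of Mplus or RUNALL: the `@DATA@` and `@SAVE@` markers, the `run1.inp` and `saved1.dat` naming, and the function name are all invented. The template would be the single-data-set version of the input (no TYPE = IMPUTATION), with the two markers in place of the file names.

```python
from pathlib import Path

def write_inputs(template, data_files, out_dir):
    """Write one input file per imputed data set from a text template.

    The template must contain the markers @DATA@ (data file to read) and
    @SAVE@ (file for saved results). Returns the paths of the files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, data_file in enumerate(data_files, start=1):
        text = (template
                .replace("@DATA@", str(data_file))
                .replace("@SAVE@", f"saved{i}.dat"))
        path = out_dir / f"run{i}.inp"
        path.write_text(text)
        written.append(path)
    return written
```

Each generated input can then be run in a batch, and the saved files combined afterwards.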

Tom Booth posted on Sunday, March 22, 2015 - 10:19 am

Dear Linda/Bengt,

I have a potentially very simple question I am just struggling to find an answer to.

I wish to use multiple imputation, and then fit a model on those data sets, ideally in a single script. However, I want to use more variables for the imputations than I use in the model.

When the extended list of variables (those I wish to use for imputation) is added, and the model syntax only uses a subset of the variables, fit is obviously poor as there is a large set of uncorrelated variables.

Is there a way round this, or is this a case of needing to do the analysis in two stages?

Best

Tom

Bengt O. Muthen posted on Monday, March 23, 2015 - 2:43 pm

You have to do it in two steps.

Mike Zyphur posted on Wednesday, April 15, 2015 - 4:25 am

Hi Linda and Bengt,
Some datasets have imputed values as separate variables rather than separate datasets (e.g., instead of 20 datasets, there is a single dataset wherein each variable with missing data is repeated 20 times). When this exists, is there any way to run Mplus so that instead of the "Data: Type = imputation" command, it is possible to indicate which range of variables have the imputed values for each variable?

This is a shot in the dark, I realize, but would be a big help for an ongoing project. Thanks for your time and help!

Mike Zyphur

Linda K. Muthen posted on Wednesday, April 15, 2015 - 6:23 am

No, Mplus requires separate data sets.
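A wide file of this kind can be converted into separate data sets before it goes to Mplus. The Python sketch below assumes a naming convention that is purely hypothetical: each imputed variable is stored as base_1, base_2, ..., base_m. The function name is invented, and the data are assumed to be already loaded as a header list plus a list of records.

```python
def wide_to_datasets(header, rows, imputed, m, sep="_"):
    """Build m data sets (lists of records) from one wide data set.

    header  : column names of the wide file
    rows    : records, each a list aligned with header
    imputed : base names of variables stored as base_1 ... base_m
    Output columns: all non-imputed columns in their original order,
    followed by the imputed variables in the order given.
    """
    position = {name: i for i, name in enumerate(header)}
    imputed_cols = {f"{base}{sep}{k}" for base in imputed for k in range(1, m + 1)}
    shared = [name for name in header if name not in imputed_cols]
    datasets = []
    for k in range(1, m + 1):
        cols = shared + [f"{base}{sep}{k}" for base in imputed]
        datasets.append([[row[position[c]] for c in cols] for row in rows])
    return datasets
```

Each element of the result can be written to its own file, and the file names collected in the list file used with TYPE = IMPUTATION.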

dennis posted on Friday, April 17, 2015 - 3:19 pm

To follow protocol for reporting results, I am putting together a table of correlations/covariances, means, and standard deviations. I used multiple imputation with ULSMV. I am having a difficult time finding the means and standard deviations in the output. To report my means, do I use the numbers under "Means/Intercepts/Thresholds"? How do I get the standard deviations to report?

Linda K. Muthen posted on Friday, April 17, 2015 - 4:52 pm

For continuous variables, these values would be means. Standard deviations are the square root of the variances that are found on the diagonal of the covariance matrix.
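The square-root step can be scripted once the covariance matrix has been copied out of the output. The Python sketch below is a small illustration with invented function names; it also converts the covariances to correlations, since both are usually needed for a descriptives table.

```python
import math

def standard_deviations(cov):
    """Standard deviations from the diagonal of a square covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

def correlations(cov):
    """Correlation matrix implied by a covariance matrix."""
    sd = standard_deviations(cov)
    n = len(cov)
    # Each covariance is divided by the product of the two standard deviations.
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
```

For example, a variance of 4.0 on the diagonal corresponds to a standard deviation of 2.0.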

dennis posted on Friday, April 17, 2015 - 9:34 pm

Ok, thanks! So, for ordinal variables, these values would be thresholds? Does it make sense to report thresholds for these variables as I would the means and sd for their continuous counterpart?

Linda K. Muthen posted on Saturday, April 18, 2015 - 11:08 am

You can do this.

Pamela Medina posted on Monday, April 20, 2015 - 10:51 am

Hello, I am trying to conduct a CFA model with categorical and dichotomous variables using WLSMV. My dataset had a large number of missing variables so I used multiple imputation. I have three questions:

1. Based on this thread it appears that fit indices (RMSEA, TLI, etc.) are not calculated for imputed datasets. Is this still the case? If so, how is model fit assessed for this type of data?

2. My original dataset had a sample size of 3,000 but after the imputation was run the analysis reads that there are only 1,500 observations. Is there something I missed about the imputation process?

3. Is there a way to get frequency tables for the imputed data, OR get a final dataset with the final imputed values so that they can be transferred into another program?

Linda K. Muthen posted on Monday, April 20, 2015 - 11:21 am

1. Yes. It cannot be assessed.

2. It sounds like you have more variable names and columns in the data set causing two records to be needed for each observation.

3. You can save the imputed data sets. See the SAVE option of DATA IMPUTATION.

Pamela Medina posted on Monday, April 20, 2015 - 2:03 pm

Thank you Dr. Muthen.

I re-checked my variable names and columns and adjusted them. However, now I am still only getting 2785 observations. I also entered the following command in order to only get integer values and am still getting 3 decimal values in my output files.

Rounding = cp2 cp4a cp4 np1 np2
cp5 cp6 cp7 cp8 cp9 cp13 cp21 it1
prot3 prot6 prot8 b10a b11 b13 b18 b21
b21a b31 b32 edcat q11 etid Vote inccat(0);

Lastly, it looks like the values associated with my categorical data points have been changed in the imputation process. Categories that were labeled 1 are now labeled 0, 2 is now 1, etc. Is there any way that I can keep my original values?

Thank you!

Linda K. Muthen posted on Monday, April 20, 2015 - 4:23 pm

Please send the relevant files and your license number to support@statmodel.com.

CB posted on Tuesday, April 28, 2015 - 8:29 am

Hello,

I'm running multiple imputation as part of LCA. I have specified the variables that I want included in the model in the USEVARIABLES command. However, I have variables that I don't want in the LCA but want to be used for the imputation; what code is needed to incorporate these variables? Thanks!

Bengt O. Muthen posted on Tuesday, April 28, 2015 - 11:09 am

Do the imputation analysis in a first, separate step before the LCA.

Lisa M. Yarnell posted on Wednesday, April 29, 2015 - 11:39 am

Hello, I have three questions about my current analyses using TYPE=IMPUTATION for combining over 20 data sets, each with 1 plausible value estimate of student performance. I am running a TYPE=TWOLEVEL COMPLEX RANDOM model.

1) The 20 read-in data sets have complete data (no missingness), and 165,411 cases. But when I run my analyses in Mplus, the number of cases is 145,678. Why is that happening?

2) What does it mean that only 18 of my 20 requested replications are being completed successfully? Why would not all replications complete?

3) Several of my dichotomous variables are x variables on Level 1 and y variables on Level 2; as such, Mplus is treating them as y variables on both levels (a common warning message that I have seen before). This is OK with me, but should I therefore specify an WLSMV estimator rather than MLR?

Thank you.

Lisa M. Yarnell posted on Wednesday, April 29, 2015 - 11:45 am

Hello, I have figured out the answer to #1. Some cases are missing the stratification variable.

However, can you please inform me on issues 2 and 3?

Thank you sincerely.
Lisa

Linda K. Muthen posted on Wednesday, April 29, 2015 - 1:18 pm

2. You would need to run those data sets separately to see what problem they have.
3. No.

CB posted on Thursday, April 30, 2015 - 8:03 am

Thanks for the quick response!

Is it possible to impute for a nominal variable in Mplus? I tried using TYPE=BASIC, which didn't work. Does TYPE need a different option to impute a nominal variable?

As I mentioned before, I'm running multiple imputation as part of LCA and I want to perform multiple imputation using variables that I don't want in the LCA but included in the imputation. I'm following the suggestion, which is definitely appreciated, to run the imputation separately and before the LCA. However, is there a way to aggregate the results from all of the imputed datasets into one dataset and use that summary dataset to run in a single LCA? Is this a valid approach for imputation?

Jon Heron posted on Friday, May 01, 2015 - 3:11 am

The question of how to combine LCA and MI is in my view unanswered. At least I hope this is still the case - I am writing a grant on this at the minute!

Lanza and Collins (LCA/LTA book) talk about issues surrounding a pre-LCA MI-step and indeed you do run into the problem of pooling the results across imputed datasets. Particularly if in some of your imputed datasets a different solution may be supported or perhaps even the theory-driven solution is empirically non-identified.

Another problem with a pre-LCA MI-step is that the imputation is likely to be mis-specified. For instance, if your goal is for latent C to moderate an X-Y relationship it is not possible to add this to your imputation. The first source I have managed to find for a warning about pre-LCA imputation is

Colder CR, Mehta P, et al. Health Psychol 2001; 20(2): 127-35

However I did still carry out pre-LCA imputation in one of my own papers in 2011.

My own feeling is that in some cases pre-LCA imputation will be fine, some cases post-LCA imputation will be fine (i.e. shoe-horning an imputation step between steps 2 and 3 of the bias-adjusted three-step method) and in other instances we will need to bite the bullet and do what I think of as concurrent LCA and imputation - i.e. MCMC.

That is the essence of my grant. If anyone reading this has some spare cash, please drop me an email :-)

Tihomir Asparouhov posted on Friday, May 01, 2015 - 10:23 am

Answer to CB: It's not clear to me what you want to do. Do you want to use the LCA model for imputing the additional variables? If so, you can save the LCA parameters using the BPARAMETERS option, use a fixed set of parameters 100 iterations apart, fix the LCA parameters to those, and impute one data set at a time. Nominal variables are not available in the imputations, but they are equivalent to categorical/ordinal if you are imputing only from the latent class variable.

Bengt O. Muthen posted on Friday, May 01, 2015 - 2:01 pm

Response to Jon:

I think I mainly agree, but I wonder if one should distinguish between different roles for variables that have missingness/need MI. For LCA indicators I would not necessarily use MI but simply ML under MAR. For covariates predicting latent classes, I would perhaps use MI without worrying about latent classes. For distal outcomes I would probably not use MI but use ML-MAR, although if you don't want the distals to influence class membership, then perhaps you want to do an LCA-based MI for the distals, where that LCA doesn't necessarily have the same number of classes as the central one.

Jon Heron posted on Saturday, May 02, 2015 - 12:04 am

Thanks Bengt

yes, my current draft describes appropriate options for each variable type in turn.

If C-on-X model is planned I had been mulling over the idea of imputing covariates only and letting ML worry about class indicators. That would certainly reduce the variation in LCA solution across datasets and make it easier to pool results with confidence.

best, Jon

CB posted on Monday, May 04, 2015 - 6:51 am

Thanks Jon and Dr. Muthen for your thoughts!

Tihomir, my intention was to run multiple imputation as part of LCA. I had specified the variables I wanted to use in the LCA. I have missing data on an LCA indicator, so I wanted to impute them by using the variables specified for LCA in addition to other variables not in the LCA. Dr. Muthen had suggested that I perform the imputation separately from the LCA. Thus, my questions dealt with how to operationalize this.

I did want to clarify your response though. If I'm imputing missing data from an LCA indicator, then the missing nominal variable cannot be imputed and is just left as missing then?

Tihomir Asparouhov posted on Monday, May 04, 2015 - 9:17 am

Nominal variables can't be imputed in Mplus yet.

HwaYoung Lee posted on Wednesday, June 03, 2015 - 2:03 pm

Hello, I would like to report chi-square statistics with fit indices, but I used 10 multiple imputation datasets. The MLM estimation method was used to handle non-normality of one factor (out of 5) with 6 indicators.
So, the Mplus output doesn't provide a scaling factor.
How do I report chi-square statistics?

Thank you for your support and help.

Tihomir Asparouhov posted on Thursday, June 04, 2015 - 9:51 am

The test of fit with imputation is available currently only for single level ML estimation with continuous variables.

HwaYoung Lee posted on Thursday, June 04, 2015 - 10:38 am

Thank you for your answer.
So, what is your suggestion?
I need to report chi-square statistics with BIC, AIC, aBIC, and so on.

Linda K. Muthen posted on Friday, June 05, 2015 - 8:05 am

These fit statistics have not yet been developed for the case of multiple imputation. If you need fit statistics, you need to use FIML for your missing data or listwise deletion.

HwaYoung Lee posted on Monday, June 08, 2015 - 8:39 am

Thank you for your reply, Dr. Muthen.
But I would like to explain my paper more clearly and get your advice.

One of the reviewers criticized our manuscript because we didn't use multiple imputation for missing cases.

So we did that: before running the SEM, we imputed missing data using multiple imputation in SPSS with the EM algorithm and made parcel scores for a couple of latent constructs.

Then we conducted SEM (5 latent constructs).

As you know, I got fit indices and chi-square statistic averaged across 10 datasets.

Because I used MLM (robust to non-normality, because one latent construct is not normal), I need to use a scaling factor to adjust the chi-square, but it is not provided in the output.

My questions are:

1) Can I report fit indices averaged across 10 datasets in the manuscript?

2) If it is not possible, what are other options for my case? How can I make parcel scores and run SEM without multiple imputation?

3) If fit indices are okay to report and reporting chi-square is the only problem, what is your suggestion for reporting the chi-square statistic?

Thank you for your answers in advance.

Bengt O. Muthen posted on Monday, June 08, 2015 - 11:24 am

1) That would not be a good idea, as explained in terms of chi-2 in our Topic 9 handout of 6/1/11, slides 210-216; see also the reference to the tech doc Asparouhov-Muthen (2010).

2) Don't use MI, use FIML.

3) No suggestion beyond the tech doc Asparouhov-Muthen (2010).

HwaYoung Lee posted on Monday, June 08, 2015 - 12:29 pm

Thank you so much for your answer.
So, let me be clear about this.

When do I need to create parcel scores? Before running the SEM or within the SEM syntax (e.g., define= )?
Can FIML handle missing cases when I create parcel scores?

Thanks,

Bengt O. Muthen posted on Monday, June 08, 2015 - 3:50 pm

I think you are asking how to handle parcels with missing data on items in the parcel. If so, this general analysis strategy question is best directed to SEMNET.

Varsha Gupta posted on Thursday, July 02, 2015 - 12:44 am

I am trying to use mixture modelling with multiple imputation files. I do not get any statistic or an output file with class information. If I use FIML, Entropy = 0.68; with the imputation command, Entropy = 0.78, with similar classes in both methods. When I am using TYPE = MIXTURE MISSING, is there a way to tell the program to use x5-x7 for imputations?

DATA: FILE IS "xyz.dat";
VARIABLE:
NAMES ARE x1 x2 x3 x4 x5 x6 x7 ;
USEVARIABLES ARE x1 x2 x3 ;
CLASSES = c(5);
MISSING ARE ALL (-999);
ANALYSIS: TYPE = mixture missing;
LRTSTARTS = 0 0 50 20;
Starts =500 200;
MODEL:
%OVERALL%
i s | x1@0 x2@0.52 x3@2.2;
i@0; s@0;

Varsha Gupta posted on Thursday, July 02, 2015 - 12:45 am

I use the following script in Mplus for multiple imputation:
DATA: FILE IS "EPDS_For_imputations.dat";

VARIABLE:
NAMES ARE x1 x2 x3 x4 x5 x6 x7;
CATEGORICAL ARE x5;
USEVARIABLES ARE x1 x2 x3 x4 x5 x6 x7;
MISSING ARE ALL (-999);

DATA IMPUTATION:
IMPUTE= x1 x2 x3;
NDATAsets=20;
SAVE=impute*.dat;
VALUES=x1-x3 (0-30);

ANALYSIS: TYPE =BASIC ;

Linda K. Muthen posted on Thursday, July 02, 2015 - 6:24 am

TYPE=BASIC is for descriptive statistics only. Remove this to estimate a latent class model.

Please limit posts to one window.

John Woo posted on Sunday, August 16, 2015 - 9:17 pm

Hi, when I run different models using Type=imputation, Mplus does not seem to produce the usual warning/error messages in the output (e.g., latent variable psi matrix not positive definite). Is this correct? Does this mean that I should manually check for any obvious estimation issues (e.g., latent var corr greater than one)?

Related, is it possible to have estimation issues (e.g., untrustworthy s.e., non positive definite psi, etc) with an individual imputed dataset and yet the final output across all imputed datasets appear to show no such issues?

Thank you in advance.

Linda K. Muthen posted on Monday, August 17, 2015 - 6:35 am

Add TECH9 to the OUTPUT command to get the messages. If there is a problem with an imputed data set, the analysis is not completed. You will see that in the output where it shows how many were completed.
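
As a minimal sketch of the suggestion above, an imputation analysis that requests TECH9 might look like the following. The file name implist.dat, the variable names, and the one-factor model are placeholders, not taken from John's setup.

```
! Sketch only: implist.dat is a hypothetical file that lists
! the imputed data sets, one file name per line.
DATA:     FILE = implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = y1-y6;
MODEL:    f BY y1-y6;
! TECH9 prints the warning/error messages for each imputed data set.
OUTPUT:   TECH9;
```

The summary at the top of the output then shows how many of the requested data sets were completed, which is where an incomplete replication would show up.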

John Woo posted on Monday, August 17, 2015 - 12:08 pm

Thank you. I have a quick follow-up question. I used Mplus to generate five imputed datasets. TECH9 shows a warning for the psi matrix in just one of the five imputed datasets. What would be the protocol for dealing with error/warning messages in just a few of the many imputed datasets? Should I try to modify the model specification until all imputed datasets are error free? Or can I safely ignore (or even drop) the few problematic datasets? Would it mar the 'integrity' of multiple imputation if I ignore the few? Thank you again in advance.

Linda K. Muthen posted on Monday, August 17, 2015 - 12:17 pm

I would run the analysis on the data set with the problem to get more information about the problem.

Jennie Jester posted on Friday, November 06, 2015 - 11:31 am

I am estimating a structural equation model where the indicators of my latent variable are count data. I have a lot of missing data and I believe that multiple imputation will give me better estimation than FIML because it takes into account non-normality better. I ran a multiple imputation model, where I declared some of the count variables as categorical. Some of the count variables have more than 20 values and so I could not declare them as categorical variables.
The imputation seemed to run okay; however, when I tried to use TYPE IS IMPUTATION to use the imputed data sets, I got this error "Invalid symbol in data file:
"*" at record #: 1, field #: 42"
Sure enough, some of the variables have "*" in the fields in the *.dat files.
Do you agree with the way I am modeling my missing data and can you help me figure out why I got * in my data files?

Thanks,

Jennie

Bengt O. Muthen posted on Friday, November 06, 2015 - 1:18 pm

If the missing is on the count variables, I would use FIML. I don't see why Mult Imp would take non-normality better into account.

As for the second part, we would need to see your MI output and the data files, so send to Support along with license number.

Jennie Jester posted on Monday, November 09, 2015 - 4:27 pm

So FIML is robust to non-normal data? I thought I'd just read that it was not. Now I can't find the reference.

I have another question. Because of the complexities of our assessment, it often seems as if we have more missing data than we actually do. We collect a lot of data every 3 years (wave data) and a subset of items are collected every year (annual). If the wave is done in that year, then the annual assessment is not done. The variables are created based on time of assessment. So a person will be missing on annual variables because the information is in the "wave" variable for that year. I could tediously recode the variables to avoid this, by deleting all the "wave" variables and recoding them into "annual" variables. What I am trying to figure out is, does it matter? Basically the same information is in the data set, but in the current coding system there is an extra variable with more missing data.

Bengt O. Muthen posted on Monday, November 09, 2015 - 5:23 pm

With missing data, ML (FIML) assumes normality, but so does multiple imputation. Neither is perfectly robust to non-normality, but if you don't have too low coverage you have a certain degree of robustness.

As for your other question, you may want to take that one to SEMNET since it is more general.

Jennie Jester posted on Tuesday, November 10, 2015 - 1:19 pm

If the indicators of my latent variable are counts with a lot of zeros, should I declare them as count variables or categorical?

Bengt O. Muthen posted on Tuesday, November 10, 2015 - 4:55 pm

Counts with a lot of zeros can be fit by ZIP or NEGBIN - possibly with inflation added. See our handout for the Topic 2 short course on our web site. And also mixtures discussed in Topic 5.
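
A hedged sketch of how such count indicators might be declared, assuming four hypothetical indicators u1-u4 in a file called counts.dat with -999 as the missing value code. The (i) suffix requests a zero-inflated Poisson model and (nb) a negative binomial model; which of the two fits better is an empirical question.

```
! Sketch only: file name, variable names, and missing code are assumptions.
DATA:     FILE = counts.dat;
VARIABLE: NAMES = u1-u4;
          MISSING = ALL (-999);
          COUNT = u1-u4 (i);       ! zero-inflated Poisson
          ! COUNT = u1-u4 (nb);    ! alternative: negative binomial
ANALYSIS: ESTIMATOR = MLR;
          ALGORITHM = INTEGRATION; ! factor with count indicators
MODEL:    f BY u1-u4;
```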

Kate Williams posted on Tuesday, December 08, 2015 - 12:26 pm

Hi all
I have used Version 7.3 to create 20 imputed datasets on which I am running further analysis (measurement and structural). My CFAs on these run fine until I add a grouping variable (multi group analysis). They then will not run and I get an input error message but no information in the output to help me identify the problem. Is this because type=imputation and 'grouping is xxx' cannot be used together?

Bengt O. Muthen posted on Tuesday, December 08, 2015 - 6:09 pm

Please send output to Support along with license number.

Kate Williams posted on Tuesday, December 08, 2015 - 6:24 pm

I post here the solution to my own query in case it helps others. See previous post of same date. I found in looking at the imputed datasets that the program had recoded my grouping variable. Originally this variable was coded 1 / 2 and in the imputed datasets it was recoded as 0 / 1. So changing my syntax under 'grouping' was the solution. Thanks for the prompt response Bengt.
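
For anyone hitting the same problem, a minimal sketch of the fix described above, assuming a hypothetical list file implist.dat, indicators y1-y6, and a grouping variable g that now takes the values 0 and 1 in the imputed files. The group labels are arbitrary.

```
! Sketch only: names are placeholders. The values in GROUPING must
! match the coding found in the imputed files (0/1), not the coding
! of the original data (1/2).
DATA:     FILE = implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = y1-y6 g;
          USEVARIABLES = y1-y6;
          GROUPING = g (0 = grp1 1 = grp2);
MODEL:    f BY y1-y6;
```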

Andreas Wahl posted on Thursday, January 28, 2016 - 5:51 am

Hi all,

I was wondering under which algorithm the H0 imputation operates? It is stated that it is a Bayes estimation - does that mean that it is a simple data augmentation algorithm or are the choices the same as for the H1 imputation?

I want to compare different approaches for missing data and I could not find anything that specifies exactly what H0 does...

Thanks for your help!

Bengt O. Muthen posted on Friday, January 29, 2016 - 6:17 pm

See the paper on our website:

Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report. Version 2.

Andreas Wahl posted on Saturday, January 30, 2016 - 12:18 am

Thank you for your reply.

I already read the paper, which is why I was wondering how H0 operates exactly and under which assumptions, as this is not stated. It says that H0 is a restricted imputation with a Bayesian estimator. Now I was wondering, as there are three options for H1 imputation (Regression etc.), what is meant by this Bayesian estimator?

Bengt O. Muthen posted on Monday, February 01, 2016 - 10:47 am

The 3 options are only for H1 imputation, specifying the model for the imputation. For H0 imputation you instead use the model in the MODEL command. That is, in the Bayesian MCMC iterations you not only view parameters as unknowns but also missing data. After convergence several more iterations are completed to give the missing data values. So the H0 imputations are based on the estimated H0 model specified in the MODEL command.

Andreas Wahl posted on Tuesday, February 02, 2016 - 6:05 am

Thank you very much for your response. This helps a lot and leads me to another question:
Is it possible to impute the missing values with H0-imputation, save the data sets with the imputed values and analyze the model afterwards with ML?

Bengt O. Muthen posted on Tuesday, February 02, 2016 - 5:51 pm

Yes - you would take an approach similar to the two steps of UG ex 11.8.
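
A rough sketch of what such a two-step setup could look like, assuming a hypothetical data file mydata.dat with four repeated measures y1-y4 and two covariates. The growth model, file names, and number of data sets are illustrative only, and the two steps are separate input files.

```
! ---- Step 1 (first input file): H0 imputation ----
! The model in the MODEL command is estimated with Bayes and the
! imputed data sets are drawn from that estimated model.
DATA:     FILE = mydata.dat;
VARIABLE: NAMES = y1-y4 x1 x2;
          MISSING = ALL (-999);
DATA IMPUTATION:
          IMPUTE = y1-y4;
          NDATASETS = 20;
          SAVE = h0imp*.dat;
ANALYSIS: ESTIMATOR = BAYES;
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;
          i s ON x1 x2;

! ---- Step 2 (second input file): analyze the saved data sets with ML ----
! h0implist.dat is the list file written in step 1. The NAMES order
! should follow the order reported at the end of the step 1 output.
DATA:     FILE = h0implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = y1-y4 x1 x2;
ANALYSIS: ESTIMATOR = ML;
MODEL:    i s | y1@0 y2@1 y3@2 y4@3;
          i s ON x1 x2;
```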

Andreas Wahl posted on Tuesday, February 02, 2016 - 11:01 pm

Thank you very much!
This helps a lot.

sun young posted on Friday, March 04, 2016 - 12:52 pm

Hi, I imputed each group separately and got con* and treat* imputed data sets. Now I am trying to generate an imputed_all data set by combining the two imputed group data sets. I've been searching on this website but couldn't find an answer. I would appreciate it if you could advise me. Thank you very much.

Bengt O. Muthen posted on Friday, March 04, 2016 - 6:01 pm

You can do multiple-group imputation in Mplus.

I am not sure if you want to combine your data vertically or horizontally. The former is multiple-group and the latter can be done by the MERGE option.

sun young posted on Monday, March 07, 2016 - 12:11 pm

Thank you. I did try the former suggestion. I have a follow-up question.

After creating 20 data sets, I used "imputed_list.txt" as the imputed data set for my analysis. When I opened this imputed_list.txt file, it contained a list of 20 txt files.

Can I actually obtain and see (or save) the completed data (either txt or csv file)? I am having a hard time defining variables from the imputed variables, so I thought I might do it in R or Stata and come back to Mplus for analysis.

Thank you very much again.

Linda K. Muthen posted on Monday, March 07, 2016 - 2:38 pm

The 20 data sets are also saved.

sun young posted on Monday, March 07, 2016 - 2:50 pm

Thank you very much for the prompt response, Linda. You're right about the saved 20 data sets--I have them. I might have a misunderstanding about this MI process but I am wondering about how I can create one completed imputation data set using these 20 data sets in a csv or txt form.

If I correctly understand it, a complete imputation data set using 20 data doesn't mean that I should merge those 20 data sets vertically so the number of observation will be n*20.

I appreciate your answer very much.

Linda K. Muthen posted on Monday, March 07, 2016 - 5:50 pm

In Mplus, the imputed data sets must be in different files. If you want to use them in a program where there is a requirement to put them in one file, you will need to see how that program needs them and put the files together that way.

Aurelie Lange posted on Sunday, April 03, 2016 - 12:20 pm

Dear Dr Muthén,

I am trying to impute some continuous and categorical variables, which are measured longitudinally.
1. I am not quite sure how to decide on the correct imputation model.
2. I tried 'sequential', just to get some experience using multiple imputation in Mplus, but I get the error message "Fatal error. Failure to generate trunckated normal deviate. The problem occurred in chain 2." I am not quite sure what this means.

Thank you so much for your advice!

Sincerely,
Aurelie

Linda K. Muthen posted on Sunday, April 03, 2016 - 4:51 pm

Use the default if you are not sure.

Aurelie Lange posted on Tuesday, April 05, 2016 - 7:10 am

Dear Dr Muthén,

I have a dataset of over 2000 clients with 4 measurement moments, and two levels (families within therapists).

1) When I use the default imputation method, multiple imputations run without problems. However, I don't get a column with 'rate of missing'. Is there anything I should specify in the input file to receive those values?

2) Also, based on the paper by Asparouhov & Muthén (2010) 'Multiple imputation in Mplus' I would think that sequential regression would be the preferred method, as I use a combination of continuous and categorical data. Moreover, the paper by Enders (2015) 'Multilevel multiple imputation' suggests that sequential regression (in his paper 'chained regression') performs best in models with random slopes. My model of interest will use random slopes:
ix sx| x@0 x@1 x@2 x@4;
iy sy| y@0 y@1 y@2;
iy on ix sx;
sy on ix sx;
Unfortunately, Mplus does not seem able to reach convergence when using sequential regression (after half a day, Mplus was still running iterations). Is this normal? Can I expect it to reach convergence at some point? What would be your advice?

Thank you for your advice!
Sincerely,
Aurelie

Tihomir Asparouhov posted on Tuesday, April 05, 2016 - 9:31 pm

1) You can analyze the imputed data as in user's guide example 13.13 with an unrestricted model like
model: y1-y10 with y1-y10;
Rate of missing refers to a model parameter, not a variable, and is specific to a model.
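
A minimal sketch of such an unrestricted run on the saved imputations, assuming a hypothetical list file implist.dat and ten continuous variables y1-y10. The rate of missing column then refers to the means, variances, and covariances of this particular model.

```
! Sketch only: file and variable names are placeholders.
DATA:     FILE = implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = y1-y10;
MODEL:    y1-y10 WITH y1-y10;   ! all covariances free (unrestricted)
```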

2) Use the default method.

Aurelie Lange posted on Tuesday, April 26, 2016 - 5:33 am

Dear Dr. Asparouhov,

Thank you for your reply. I managed to impute my data using the default method. However, the imputed data consists of 'impossible' values. For example, one of my variables has a range between 50 and 100, but the imputed datasets contain values out of this range.
- Is this a problem?

These variables are measured at multiple moments. I recode these variables in 1 new dichotomous variable consisting of decliners (if they fall below a certain cut-off during one of the measurement moments) and persisters (if they stay above the cut-off all of the time).
- Is it problematic that the imputed values are out of the 'possible' range in this situation? If so, is it possible to impute using a restriction on the range?

Thank you for your advice!

Sincerely,
Aurelie

Linda K. Muthen posted on Tuesday, April 26, 2016 - 6:44 am

Are you sure you are reading the original data set correctly? Do a TYPE=BASIC with no MODEL command on the original data to check the sample size and descriptive statistics.

Aurelie Lange posted on Tuesday, April 26, 2016 - 7:35 am

Dear Dr Muthen,

When I checked the data using type=basic, I didn't see any problems. The means look OK, although they are slightly different from the means I get in SPSS.
Does Mplus usually only impute observed values? Is it possible to specify within which range Mplus should impute?

Thank you for your time!

Aurelie

Bengt O. Muthen posted on Tuesday, April 26, 2016 - 9:24 am

Mplus can impute latent variable values too - that is, factor scores. You can specify that a variable is categorical, e.g. binary. Range restrictions are not available for continuous variables but it is rarely a problem.

Linda K. Muthen posted on Tuesday, April 26, 2016 - 9:27 am

See the VALUES option of the DATA IMPUTATION command. You can restrict the values with this option. If you continue to have problems, send the files and your license number to support@statmodel.com.

Aurelie Lange posted on Friday, April 29, 2016 - 12:35 am

Thank you for your advice. The VALUES option seems to work.

Joshua Wilson posted on Friday, April 29, 2016 - 6:38 pm

I used Mplus to conduct multiple imputation to estimate values of missing categorical Likert scale survey data, using the Type=BASIC command (according to the example in ch.11).

I was asked by a reviewer whether this method is "hot deck or cold deck" multiple imputation. My sense is that it is neither, since Mplus uses a Bayesian approach.

Any suggestions for how to respond to the reviewer's comment?

Thanks!

Bengt O. Muthen posted on Friday, April 29, 2016 - 7:06 pm

It is not hot or cold deck. The method is described in the paper on our website under Papers, Bayesian Analysis:

Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report. Version 2.

It is also described in a shortened form in the UG - see the index.

Joshua Wilson posted on Friday, April 29, 2016 - 7:16 pm

Thank you.

Chris Cambron posted on Tuesday, May 10, 2016 - 12:25 pm

Hi,
I've been through the examples and cannot find any examples of how to impute a multilevel dataset. I have individual level data clustered at the county level (with weights) and would like to impute the individual level missing values (there are no missing level two values). I am concerned that the clustering won't be accounted for in the MI. How could I set this up? Thanks!

Chris Cambron posted on Tuesday, May 10, 2016 - 12:46 pm

Sorry, I should have posted my code as well. This runs but I am unclear if I am appropriately accounting for clustering and weights.

VARIABLE:
NAMES = cntyid M80py Apr13pt pt10_13 zM80py zApr13pt zpt10_13 uid wtcnty08 sex age
rhisp rnativam rasian rblack rnhopi rwhite frpl mj301 mj3010 rkmjmg pwmjw fwmjw;

MISSING=.;

CLUSTER = cntyid;

WEIGHT = wtcnty08;

USEVARIABLES ARE
cntyid wtcnty08 sex age rhisp rnativam rasian rblack rnhopi rwhite frpl
mj301 mj3010 rkmjmg pwmjw fwmjw;

AUXILIARY= uid M80py Apr13pt pt10_13 zM80py zApr13pt zpt10_13;

DATA IMPUTATION:
IMPUTE = sex (c) age rhisp (c) rnativam (c) rasian (c) rblack (c) rnhopi (c)
rwhite (c) frpl (c) mj301 (c) mj3010 (c) rkmjmg (c) pwmjw (c) fwmjw (c);

NDATASETS = 40;
SAVE = ncast*.dat;

ANALYSIS:
TYPE=BASIC TWOLEVEL;

OUTPUT: TECH8;

Chris Cambron posted on Tuesday, May 10, 2016 - 5:21 pm

My apologies, I found it here - http://www.statmodel.com/discussion/messages/22/4640.html?1360259751

Thanks for providing such a useful resource!

Aurelie Lange posted on Tuesday, July 26, 2016 - 1:32 pm

Dear Dr Muthén,

I imputed 40 datasets. When I run my analyses, it often happens that only a subset of those 40 are replicated. If the replicated datasets provide warnings, I use those warnings to adapt my model. Often I do end up with a model which replicates for all 40 datasets. But sometimes, it does not.

1) How problematic is it, if a model does not replicate for 1 or 2 of the 40 imputed datasets?
2) Is it appropriate to use warnings which do not appear in all 40 datasets to adapt the model? So, for example, if a certain path seems problematic according to the warnings of 10 or 15 datasets, would it be appropriate to remove the path in the model, which would then be run on all 40 datasets? Or is there a better way to go about 'solving' such warnings?

Thank you!

Sincerely,
Aurelie

Bengt O. Muthen posted on Tuesday, July 26, 2016 - 1:50 pm

Tough question. It's true that the non-convergence of each of the 40 replicates is somewhat informative about the fragility of the model. But you may be deleting a theoretically and statistically important path - that perhaps would have no problem if other parts of the model were correctly specified.

Katharina Klug posted on Thursday, September 01, 2016 - 9:27 am

Hello,

I am trying to impute covariates for a LCGA using multiple imputation. I used the following input:

VARIABLE: NAMES ARE [all the variables in my dataset];
IDVARIABLE IS pid;
USEVARIABLES ARE [all variables in my dataset except pid and the auxiliary variables];
CATEGORICAL ARE [all categorical variables, both those to be imputed and those used for imputation];
AUXILIARY ARE [my outcome variables, which I want to keep in my dataset, but not use for imputation];
MISSING ALL (-9999);
DATA IMPUTATION: IMPUTE = (c) [the categorical variables which I want to impute];
NDATASETS = 10;
SAVE = lcga_pred_imp*.dat;
ANALYSIS: TYPE = BASIC;
OUTPUT: TECH8;

This works without error or warning messages, but when I subsequently want to run my LCGA with the imputed datasets, I get the following error:

"*** ERROR
Invalid symbol in data file:
"*" at record #: 2, field #: 30"

Does the imputation input look correct to you? Why does it produce these invalid records in the imputed datasets? Looking forward to your feedback.

Linda K. Muthen posted on Thursday, September 01, 2016 - 10:35 am

The imputed data sets have an asterisk as the missing data flag. You should read the imputed data sets according to the information about the saved data sets shown at the end of the output from the run in which they were saved.
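
A hedged sketch of what reading the saved files could look like in the second run. The list file name follows from the SAVE = lcga_pred_imp*.dat setting above (the asterisk is replaced by "list"); the variable names, their order, and the two-class growth model are placeholders. The actual order should be copied from the end of the imputation output.

```
! Sketch only: variable names, their order, and the model are assumptions.
DATA:     FILE = lcga_pred_implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = x1-x5 y1-y3 pid;   ! order as listed in the imputation output
          USEVARIABLES = x1-x5 y1-y3;
          MISSING = *;               ! asterisk is the missing data flag
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:    %OVERALL%
          i s | y1@0 y2@1 y3@2;
          i@0; s@0;
          c ON x1-x5;
```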

Katharina Klug posted on Friday, September 02, 2016 - 12:54 am

Yes, now it works, thank you very much!

Tom McDonald posted on Thursday, December 08, 2016 - 10:13 am

I am trying to run a CFA on 100 imputed files. The model runs on about half of the files, and for the other half I get the message that they "Did Not Result in a Completed Replication".

I'm not sure of the next step in this process.

Bengt O. Muthen posted on Thursday, December 08, 2016 - 5:19 pm

Please send input, output and original data to Support along with your license number.

Tibor Zin posted on Friday, December 16, 2016 - 2:43 am

Dear Dr Muthén,

I would like to ask a question about fit indices with imputed data. I have a two-wave data set with 2400 observations, but 1400 people did not participate in the second wave (missing at random). I have imputed 25 datasets and conducted a multiple group analysis.

My problem is that using maximum likelihood estimator, the following error appeared:
THE CHI-SQUARE COULD NOT BE COMPUTED. THIS MAY BE DUE TO AN INSUFFICIENT NUMBER OF IMPUTATIONS OR A LARGE AMOUNT OF MISSING DATA.
Increasing the number of imputations does not help. However, using MLR does. Could you please tell me where the problem is?

Many thanks!

Bengt O. Muthen posted on Friday, December 16, 2016 - 5:36 pm

It sounds like you first do imputation and then do ML on those data. I'd suggest simply using ML on the original data. If this doesn't help, please send your outputs to Support along with your license number.

Tibor Zin posted on Sunday, December 18, 2016 - 8:11 am

Thank you for the fast reply!

I am sorry but I did not provide all information. I am not using latent variables but only observed variables. I believe that in this case all missing data would be excluded if I use ML.

Bengt O. Muthen posted on Monday, December 19, 2016 - 6:02 pm

Just use ML. I don't see why imputation is needed.

Tibor Zin posted on Tuesday, December 20, 2016 - 5:10 am

I think that I should apply MI because I use 6 variables from the first wave (no missing) and 2 variables from the second wave (60% attrition), but 4 variables were created by subtracting a first-wave variable from the corresponding second-wave variable. Thus, when the second wave data are missing, this variable also has a missing value. But if I use MI instead of FIML, I can include in the imputation the original variables from which these difference variables were created.
In other words, I lose more information using FIML.

Bengt O. Muthen posted on Tuesday, December 20, 2016 - 2:07 pm

Ok. But you can also use Auxiliary(M) with ML.
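
A minimal sketch of the Auxiliary (M) approach, assuming a hypothetical file twowave.dat in which a1 and a2 are variables that carry information about the missing values but are not part of the analysis model. All names and the regression itself are placeholders.

```
! Sketch only: a1 and a2 act as missing data correlates under ML;
! they do not appear in the MODEL command.
DATA:     FILE = twowave.dat;
VARIABLE: NAMES = y1 y2 x1-x3 a1 a2;
          USEVARIABLES = y1 y2 x1-x3;
          MISSING = ALL (-999);
          AUXILIARY = (M) a1 a2;
ANALYSIS: ESTIMATOR = ML;
MODEL:    y1 y2 ON x1-x3;
```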

Tibor Zin posted on Tuesday, December 20, 2016 - 3:05 pm

Thanks for the advice! But should I specify that these auxiliary variables would predict their counterparts?

Is it possible to specify the influence of auxiliary variables in Mplus? Let's say that one auxiliary variable would predict missing values of one observed variable and the second auxiliary variable would predict missing values of the second observed variable?

Bengt O. Muthen posted on Tuesday, December 20, 2016 - 6:14 pm

I am referring to the automatic approach of Auxiliary (M), but you can also do it "by hand" in your own setup. We discuss this in one of our topics related to missing data in the set of short courses on the web.

Grant Jackson posted on Thursday, December 22, 2016 - 7:08 am

I am just wondering if there is any way to do multiple imputation such that the result is one dataset (consisting of the averages for each value of each variable across my many imputed datasets).

I ask because I would like to create pre-post change variables (Wave 2 response - Wave 1 response), and it seems like I would need to do so after imputation . . . and based on particular values found in one dataset.

Note: I already have pre-post change values in cases where I had Wave 1 and Wave 2 responses. Those were created in SPSS prior to the thought of doing multiple imputation, and prior to me bringing the data into Mplus. I ask the question above because I assume that I should NOT impute values for missing data for this variable (given that those missing values are missing because Wave 2 data are missing, which would now be in the process of imputation themselves). If imputing pre-post change values is somehow appropriate, please let me know because that solves my problem.

Thoughts? Thank you in advance for your help.

Grant Jackson posted on Thursday, December 22, 2016 - 2:12 pm

Second question of the day (my first question is above at 7:08 on December 22):

When doing multiple imputation, how do I constrain the possible valid range of what imputed values can be?

For example, I have many 1-7 Likert scales in my dataset, but I noticed that values of 8 and above were being imputed when 7 is the max. I'd like to set the range from 1-7.

Thanks!

Tihomir Asparouhov posted on Friday, December 23, 2016 - 10:12 am

I would recommend going a step back before any data processing is done and perform the imputation on the raw/original data. You can then use the Mplus define command to form the differences. You can average the imputed data sets, but I have not seen that done before.
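
As a sketch of the DEFINE step suggested above, assuming the imputation was run on the raw wave variables and saved with a hypothetical list file implist.dat. The names w1, w2, and change, and the simple regression, are illustrative only.

```
! Sketch only: the difference score is formed within each imputed
! data set at analysis time, not before imputation.
DATA:     FILE = implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = id w1 w2;
          USEVARIABLES = w1 change;   ! variables created in DEFINE go last
DEFINE:   change = w2 - w1;
MODEL:    change ON w1;
```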

On your second question you have two options

a. Impute the variable from a model treating the variable as categorical
"data imputation: impute=Y(c);"
means that Y will be imputed as categorical variable. See user's guide example 11.5.

b. Round off to particular values - see user's guide page 521
"data imputation: values=Y(1-7);"
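
The two options above could be set up roughly as follows, assuming a hypothetical file survey.dat with one 1-7 Likert item y and two continuous covariates. Only one of the two options would be active in a given run.

```
! Sketch only: file and variable names are placeholders.
DATA:     FILE = survey.dat;
VARIABLE: NAMES = y x1 x2;
          MISSING = ALL (-999);
DATA IMPUTATION:
          ! Option a: impute y as a categorical variable.
          IMPUTE = y (c) x1 x2;
          ! Option b (instead of option a): impute y as continuous
          ! and restrict the imputed values to the range 1-7.
          ! IMPUTE = y x1 x2;
          ! VALUES = y (1-7);
          NDATASETS = 20;
          SAVE = likertimp*.dat;
ANALYSIS: TYPE = BASIC;
```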

Grant Jackson posted on Friday, December 23, 2016 - 10:37 am

Thank you very much, Tihomir. Very helpful. Quick update and some new questions:

1. Regarding the pre-post change variables, I ended up using the DEFINE command after imputation to create them (Wave 2 - Wave 1) and it seemed to average from the multiple imputed datasets just fine. Very exciting.

2. NEW QUESTION: I found the VALUES option and used it as you described. For some reason, it keeps telling me I have an "Unknown variable in VALUE option," despite the fact that the variable is also where it needs to be above in the input file (NAME, USEVARIABLE, IMPUTE). I also found and used the ROUNDING option (for other reasons) and I got the same outcome for the same variable: "Unknown variable in ROUNDING option." What am I not understanding?

3. NEW QUESTION: When I open up one of my newly imputed datasets in Notepad++, I am able to scroll right far enough so that each variable (222 total) can have its own column. However, in Notepad and in the Mplus data viewer, both programs run out of room to the right, so the values continue on the next line. This messes up the column alignment, places values where they should not be, etc. I was actually able to "successfully" run my model for the imputed datasets, but there were various WARNINGS that were clearly related to the columns being misaligned, and the results were clearly off. Thoughts on this?

Thank you very much in advance!

Bengt O. Muthen posted on Friday, December 23, 2016 - 2:23 pm

Please send your output and one imputed data set to Support along with your license number.

Po-Yi Chen posted on Sunday, February 19, 2017 - 11:05 pm

Dear Dr. Muthen,

I have a question about the confidence intervals obtained from the "MODEL CONSTRAINT" + "DATA IMPUTATION" commands. I got CIs for the newly defined parameters in my model after imputation by using the code below. However, I wonder how I should report these CIs. Is it correct to say these CIs are obtained by Mplus after pooling the standard errors from the delta method, or are they the average of the CIs across imputed data sets calculated by Mplus?

Code:
data imputation:
IMPUTE = v4 (c) v5 (c);

MODEL: f1 by v1 (L11)
v2 (L21)
v3 (L31)
v4 (L41)
v5 (L51);
model b:
f1 by v1 (L12)
v2 (L22)
v3 (L32)
v4 (L42)
v5 (L52);
model constraint:
new( IL2 IL3 IL4 IL5);
IL2 = L21 - L22;
IL3 = L31 - L32;
IL4 = L41 - L42;
IL5 = L51 - L52;

analysis:
OUTPUT: Cinterval;

Results
New/Additional Parameters
IL2 -0.118 -0.069 -0.044 0.086 0.217 0.242 0.291

Julia Sheffler posted on Monday, February 20, 2017 - 7:25 pm

Dr. Muthen,

I am attempting to run a mediation model with covariates using imputed data. I used the MODEL constraint to examine the indirect effect. The model terminates normally, and the means of the variables and sample size are correct, but the S.E.'s are all identical and every variable in the model is significant. Am I missing a step? Below is my input:

Variable:
Names are Health3w ELA age sex race educ inc10 health2w fsumal;

Missing are all (999);
Usevariables are Health3w ELA age sex race educ inc10 health2w fsumal;

ANALYSIS:
Bootstrap is 1000;

Model:
Fsumal on ELA age sex race educ inc10 health2w (a);
health3w on Fsumal ELA age sex race educ inc10 health2w (b);

Model CONSTRAINT:
New(ab);
ab=a*b;

output: standardized;

Tihomir Asparouhov posted on Tuesday, February 21, 2017 - 11:22 am

Here is the answer to Po-Yi Chen:

They are NOT the average of CI across imputed data sets calculated by Mplus.

It is correct to say that these CIs are obtained by Mplus after pooling the standard errors from the delta method. The pooling is the standard imputation pooling; see the bottom of page 3 of
https://www.statmodel.com/download/MI7.pdf
The delta method is used for each imputation before pooling.
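
To illustrate the per-imputation step, here is a minimal Python sketch of a first-order delta-method standard error for a difference of two loadings. It is not Mplus code or Mplus output: the function name and the covariance values are made up, and the resulting per-imputation standard errors would still have to be pooled across imputations afterwards.

```python
import math

def delta_method_se(gradient, cov):
    """First-order delta-method standard error: sqrt(g' * Sigma * g),
    where g is the gradient of the new parameter with respect to the
    original parameters and Sigma is their sampling covariance matrix."""
    n = len(gradient)
    variance = sum(gradient[i] * cov[i][j] * gradient[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)

# Hypothetical sampling covariance matrix of two loadings from ONE imputed
# data set (made-up numbers).
cov = [[0.010, 0.002],
       [0.002, 0.008]]

# New parameter IL = L_a - L_b, so the gradient is (1, -1) and the variance
# is var(L_a) + var(L_b) - 2 * cov(L_a, L_b) = 0.014.
se_diff = delta_method_se([1.0, -1.0], cov)
print(round(se_diff, 4))  # 0.1183
```

For a product such as ab = a*b, the gradient would instead be (b, a) evaluated at the estimates.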

Bengt O. Muthen posted on Wednesday, February 22, 2017 - 12:11 pm

Julia Sheffler:

Please send your output to Support along with your license number.

Po-Yi Chen posted on Saturday, February 25, 2017 - 10:54 pm

Dear Dr. Muthen and Dr. Asparouhov,

Thanks for your quick reply! I have another quick question about how to appropriately report the Mplus warning message "MODINDICES option is not available for multiple imputation".

I use the code below to test my MG-CFA model but get this warning message. I wonder how I should report this warning from Mplus. Is it correct for me to say this is a software-specific limitation, or is it OK to say this is a limitation in SEM, in that how to pool modification indices after imputation is still unclear?

usevariables are
group v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
categorical are v1-v10;
auxiliary = aux1;
GROUPING = group (0 = a 1 = b);
missing are all (999);

data imputation:
IMPUTE = v8 (c) v9 (c);
MODEL: f1 by v1-v10;
analysis:
OUTPUT: MODINDICES (0.001);

Bengt O. Muthen posted on Sunday, February 26, 2017 - 9:28 am

Researchers have not yet developed modification indices for analyzing multiple imputation data. It is not software specific.

Anders Albrechtsen posted on Tuesday, February 28, 2017 - 1:38 am

Dear Dr. Muthen,

When using Multiple Imputation, is it possible to constrain the range of possible outcomes for the imputed values? SPSS and SAS have this option.

The reason why I'm asking is that I have a set of ordinal covariates scaled 0-10 (11 points) with missing values, and I don't believe it makes any sense if Mplus imputes a value of 10.46, which is outside the range of the scale.

I realize that multiple imputations will iron out such discrepancies, but wouldn't it still inflate variances unless possible values are constrained to match the actual range of the scale?

Best regards,
Anders

Joshua Wilson posted on Tuesday, February 28, 2017 - 12:11 pm

Hello,

I used SPSS to create 5 multiply imputed data sets (which, in SPSS, means they are all in one file with a categorical "imputation number" variable added). Is there a way I can use that file in Mplus to run path analysis models? Do I use the categorical "imputation number" variable as a grouping variable and run the path analysis in the five "groups" (i.e., across the five data sets)?

Thanks!

Bengt O. Muthen posted on Tuesday, February 28, 2017 - 6:22 pm

See the UG ex11.8 part 2 on page 403 where the Data command uses Type=Imputation. On our website for the UG examples you can see what the input files look like (the list and the individual imputation files).
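
If the stacked SPSS file is first exported to CSV, splitting it into the individual files plus a list file can be scripted. The following is a rough Python sketch under several assumptions that are not from this thread: the imputation-number column is named "Imputation_", a value of 0 marks the original unimputed data, and the file-name prefix "imp" is arbitrary. Check the result against the example input files on the website before using it.

```python
import csv
import io

def split_stacked(csv_text, imputation_col="Imputation_", prefix="imp"):
    """Split a stacked CSV export into one data set per imputation number.

    Returns a dict mapping file name -> file content. The list file
    (prefix + "list.dat") names the individual data files, one per line.
    Data files are written without a header row, values separated by spaces.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    data_cols = [c for c in reader.fieldnames if c != imputation_col]
    groups = {}
    for row in reader:
        m = int(row[imputation_col])
        if m == 0:  # assumed to be the original, unimputed data: skip it
            continue
        groups.setdefault(m, []).append(" ".join(row[c] for c in data_cols))
    files = {}
    for m in sorted(groups):
        files[f"{prefix}{m}.dat"] = "\n".join(groups[m]) + "\n"
    files[f"{prefix}list.dat"] = "".join(f"{prefix}{m}.dat\n" for m in sorted(groups))
    return files

# Hypothetical stacked export: two variables, two imputations.
stacked = "Imputation_,x,y\n0,1,999\n1,1,4\n1,2,5\n2,1,6\n2,2,7\n"
files = split_stacked(stacked)
print(files["implist.dat"], end="")
```

Each entry of the returned dict can then be written to disk under its file name, and the list file is the one named in the Data command together with Type=Imputation.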

Tihomir Asparouhov posted on Tuesday, February 28, 2017 - 7:10 pm

Here is the answer to Anders:

Use the command

DATA IMPUTATION: VALUES = y(0-10);

Anders Albrechtsen posted on Tuesday, February 28, 2017 - 11:53 pm

Thank you, Tihomir!

Virginia Rangel posted on Monday, March 06, 2017 - 10:07 am

Hello,

I was able to run the multiple imputation command for my data, and the appropriate datasets were created. However, when I tried to request descriptive statistics using the new datasets, I got the following error message and I am not sure what I did wrong. I opened the .dat files and there are data in it, but there also is a column with all "*"s.

Error message:
The number of observations is 0. Check your data and format statement.

Invalid symbol in data file. "*" at record #: 1, field #21

Syntax for descriptive statistics:
Data: file=psf1milist.dat;
type=imputation;

variable: names=ID X1-X20 W1school w1s001-w1s200;
weight=w1school;
repweights=w1s001-w1s200;

analysis: type=complex;
repse=brr;

output: sampstat;

Unfortunately, I cannot share my dataset.

Thank you!
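
Since the error message only names the first offending record and field, it can help to list every non-numeric token in an imputed file and see which position in the NAMES list it corresponds to. A minimal Python sketch follows, assuming whitespace-delimited data; the function name and sample lines are hypothetical, and this only locates the symbols, it does not fix them.

```python
def find_non_numeric(lines):
    """Return (record, field, token) for every whitespace-separated token
    that cannot be read as a number. Record and field numbers start at 1."""
    hits = []
    for record, line in enumerate(lines, start=1):
        for field, token in enumerate(line.split(), start=1):
            try:
                float(token)
            except ValueError:
                hits.append((record, field, token))
    return hits

# Hypothetical records in which the third field holds "*".
sample = ["1 2.5 * 4", "2 3.1 * 5"]
print(find_non_numeric(sample))  # [(1, 3, '*'), (2, 3, '*')]
```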

Bengt O. Muthen posted on Monday, March 06, 2017 - 6:07 pm

We need to see the output files. Send to Support along with license number.

Lois Downey posted on Wednesday, June 28, 2017 - 5:39 pm

I am using multiple imputation to handle missing data in my data set. Most of my variables conform to the restriction on categorical data that there be ten or fewer categories. However, I have a few variables that -- although they can take on only positive integer values -- can range from 0 through 11.

When I treated these variables as continuous, the multiple imputation routine worked, although the imputed values included decimal digits, and some were negative or greater than 11. To see whether it was possible to restrict the imputed values to the valid range by declaring these 0-11 variables as categorical, I did a test run. That run aborted with a fatal error, "Problems occurred during the data imputation. The psi matrix is not positive definite. The problem occurred in chain 1." Is this because the non-missing values on these variables include more than 10 categories, or is the problem likely caused by some other issue?

Thanks!

Bengt O. Muthen posted on Wednesday, June 28, 2017 - 6:33 pm

Try treating the variable as continuous and use the Values option described in the V8 UG (on our website) on page 579.

Lois Downey posted on Thursday, June 29, 2017 - 8:21 am

Oh, great! Thanks so much!

Nassim Tabri posted on Friday, July 21, 2017 - 11:07 am

Hi Mplus team,

I have a question about obtaining Bayesian plots (Type=Plot2) with multiple imputation when the imputation procedure is similar to example 11.5 in the user guide.

The plots are not available when Type=Basic in the Analysis command. However, when I use Type = general basic and indicate Estimator = Ml, I obtain the plots.

Is this the correct way to obtain the Bayesian plots for example 11.5 in the user guide?

Thank you for your time!
Nassim

Bengt O. Muthen posted on Friday, July 21, 2017 - 4:58 pm

What is it that you want to plot?

Nassim Tabri posted on Sunday, July 23, 2017 - 7:31 am

Hi Bengt,

I would like to obtain the following Bayesian plots for a set of variables I am imputing (very similar to example 11.5 in the user guide, but this example does not request plots):

Posterior parameter distributions
Posterior parameter trace plots
Autocorrelation plots
Prior parameter distributions
Posterior predictive checking scatterplots
Posterior predictive checking distribution plots

Below is the syntax I used (note that the variable "sex" is a missing data correlate):

Usevariables are CA prob severe effic sex;
missing = CA prob severe effic(-999);
DATA IMPUTATION:
IMPUTE = CA prob severe effic;
NDATASETS = 10;
SAVE =LEBregimp*.dat;
ANALYSIS:
Type = Basic;
PLOT:
TYPE=PLOT2;
OUTPUT:
cinterval Tech8;

If I use Type = Basic as in example 11.5, then no plots are provided. However, if I use Type = General Basic, then the ML estimator is used and I obtain the plots. Not sure what is going on and very much appreciate your help!

Thanks,
Nassim

Bengt O. Muthen posted on Sunday, July 23, 2017 - 5:35 pm

The different approaches are explained in the flow chart on page 576 of our User's Guide on the web. When you say General and get ML but don't have a Model command, you get a default model that is probably not what you want (check with Tech1).

The 11.5 imputation analysis does indeed not come with Bayes plots.

Nassim Tabri posted on Sunday, July 23, 2017 - 6:35 pm

Thanks Bengt!

Oakley Wall posted on Monday, September 25, 2017 - 7:24 am

Hi Drs. Muthen,

Can you do multiple imputations with the IMPUTE command and also specify TYPE=COMPLEX?

I am getting an error message when I try to run this, saying it is not possible. I would like to impute several predictor variables, but also have complex, two level data and also want to account for that.

Thank you!

Bengt O. Muthen posted on Monday, September 25, 2017 - 5:42 pm

You can use Type=Twolevel but not Complex.

Ilana Raskind posted on Monday, November 20, 2017 - 5:02 pm

Dear Drs. Muthen,

I used multiple imputation (H1 models to generate 20 datasets) to address missing data in a two-level linear model. I am trying to respond to a peer reviewer request for more detail regarding the estimation procedures used. The output reads 'Estimator = MLR', but there is also a portion that reads 'Specifications for Bayesian Estimation'. I did not specify a particular type of estimation in my input instructions.

Are you able to clarify for me the estimation used for the imputation?

Thank you so much for your time,
Ilana

Tihomir Asparouhov posted on Tuesday, November 21, 2017 - 3:30 pm

The imputation is based on Bayesian MCMC estimation of an unrestricted mean and variance-covariance model; see equation (3) in
http://statmodel.com/download/Imputations7.pdf

The Bayesian estimation is used for the imputation of the missing data, while the MLR estimator is used to estimate your two-level model for each imputed data set. The results of the MLR estimator for each imputed data set are subsequently combined into the final result.

Ilana Raskind posted on Tuesday, November 21, 2017 - 4:20 pm

Thank you so much, Dr. Asparouhov. This is exactly the information I needed.

All the best,
Ilana

aprile benner posted on Thursday, February 22, 2018 - 10:10 am

Hi -

In a recent article by Enders and Mansolf (2016) in Psych Methods, they are proposing a way to calculate meaningful CFI, RMSEA, and TLI values when using multiply imputed data. I am curious if Mplus has integrated this into the estimation of the pooled fit indices for these, or if the calculations remain averages across the imputed datasets.

thanks,

aprile

Tihomir Asparouhov posted on Thursday, February 22, 2018 - 3:28 pm

We have been computing CFI, RMSEA and TLI since Mplus version 5.20 which dates back to 2008. See also
https://www.statmodel.com/download/MI7.pdf

The method that we have used is identical to the method described in Enders and Mansolf (2016). All these apply exclusively to the ML estimator (both the article as well as the method implemented in Mplus). All other estimators report the average.

Seamus Harvey posted on Wednesday, April 18, 2018 - 2:47 am

Hello. I recently completed BSEM with 20 imputed datasets. Gelman (2004) recommends that one should mix together simulations from the separate inferences to calculate the final results. Would you have any advice on how to do this / what this actually means? For example, is it as simple as reporting the widest range of the 95% confidence intervals, or as I imagine, is it more complicated than this?

Tihomir Asparouhov posted on Wednesday, April 18, 2018 - 4:58 pm

We currently don't support the combination of Bayes estimation and multiple imputation unfortunately so you have to do this manually for each model parameter.

The combined point estimates are obtained as the average across the 20 runs. The standard errors have to be computed using the formula at the bottom of page 3
https://www.statmodel.com/download/MI7.pdf
For a particular parameter, Vm is the square of the standard error from the Mplus output, while Qm is the point estimate. It is a pretty easy computation that can be done by hand or in Excel. At the end, take the square root of V to obtain the combined standard error for that parameter.
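
The hand computation can also be scripted. Below is a minimal Python sketch, assuming the formula referred to is the standard Rubin combining rule, V = mean(Vm) + (1 + 1/M) * B, where B is the sample variance of the Qm across the M runs. The function name and the numbers are made up for illustration.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one parameter across M imputed-data runs.

    estimates  : point estimates Qm, one per run
    std_errors : standard errors (or posterior SDs), one per run
    Returns (combined point estimate, combined standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                          # average point estimate
    within = mean([se ** 2 for se in std_errors])    # average of Vm
    between = variance(estimates)                    # sample variance of Qm
    total = within + (1 + 1 / m) * between           # combined variance V
    return q_bar, math.sqrt(total)

# Hypothetical results for one parameter from 5 runs.
q, se = pool_rubin([0.50, 0.54, 0.46, 0.52, 0.48], [0.10] * 5)
print(round(q, 4), round(se, 4))  # 0.5 0.1058
```

The function would be called once per model parameter.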

Seamus Harvey posted on Wednesday, April 18, 2018 - 11:59 pm

Thank you very much Dr. Asparouhov. In my Bayes output, there is no standard error reported. Can the posterior standard deviation be used instead of this, or is there any command I can use to attain the standard error?

Tihomir Asparouhov posted on Thursday, April 19, 2018 - 9:06 am

Yes - that is what I meant.

Seamus Harvey posted on Thursday, April 19, 2018 - 11:07 am

Thank you.

Pia Kreijkes posted on Monday, June 25, 2018 - 1:31 am

Hi,

1) Is there any way in Mplus to explore the number of factors on the between-level if TYPE=IMPUTATION? It does not seem to work with EFA TWOLEVEL but maybe there is a way using ESEM? I already know the factors on the within-level.

2) All my variables are measured on a within level (students report frequency of teaching practices). For the between level, do I need to define new variables with the aggregated student ratings or does Mplus do this automatically when I specify the cluster variable?

I'm sorry if these questions were answered already but I couldn't find it.

Many thanks

Tihomir Asparouhov posted on Monday, June 25, 2018 - 9:50 am

1) I would recommend running the imputed data sets one at a time in determining the number of factors and then use the number of factors that was picked most often (if the amount of imputed data is small the number will be the same across imputed data sets).

2) Mplus will do this automatically and you will be able to find these in the output, however, do not specify the variables on the within= option (that means within only).
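
For point 1), once the separate runs are done, tallying the choices is straightforward. A small Python sketch with made-up numbers; the function name is hypothetical and ties are returned together rather than broken arbitrarily.

```python
from collections import Counter

def most_frequent_choice(choices):
    """Return the value(s) chosen most often, sorted; more than one value
    is returned only when there is a tie."""
    counts = Counter(choices)
    top = max(counts.values())
    return sorted(k for k, c in counts.items() if c == top)

# Hypothetical number of between-level factors picked in each of 5 runs.
print(most_frequent_choice([2, 3, 2, 2, 3]))  # [2]
```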

Pia Kreijkes posted on Monday, June 25, 2018 - 10:51 am

Thanks for your quick help. Regarding 1), I would then of course not have the factor loadings to go with it or could I average the results from the separate analyses?

Tihomir Asparouhov posted on Monday, June 25, 2018 - 2:52 pm

You can average the results from the separate analysis to get point estimates. For the standard errors you will have to use the formulas on page 3
https://www.statmodel.com/download/MI7.pdf

Alternatively, once you have decided on the number of factors, you can switch to CFA and Mplus will do these formulas for you.

Virginia Rangel posted on Thursday, November 01, 2018 - 9:25 am

I have a follow up question regarding using the VALUES command when trying to limit the values for continuous variables during multiple imputation. I am following what is in the UG 7 & 8, which says to limit them as such: VALUES=y1-y4 (1-5) (p. UG7_521/UG8_579). I would like to constrain four continuous variables to take on values between 0 and 100, which I have written into my syntax as: VALUES=FRL Lat White Black (0-100). However, when I look at one of the five imputed files, there still are negative values and values greater than 100. What am I doing wrong during the imputation?

Tihomir Asparouhov posted on Saturday, November 03, 2018 - 9:59 am

Send your example to support@statmodel.com.

Daniel Lee posted on Wednesday, August 14, 2019 - 11:00 am

Hi Dr. Muthen,

I used multiple imputation to treat missing data prior to estimating a regression model due to 30-50% missing values on some of the covariates.

Everything ran smoothly. However, I was wondering if there were some things I can do in Mplus to check that the multiple imputations are valid (e.g. posterior predictive checking? graphical displays of imputed vs. observed data set?, etc). If there are resources (papers, videos, etc), I would be grateful if you can point me to them.

Thank you!

Bengt O. Muthen posted on Wednesday, August 14, 2019 - 5:16 pm

Mplus has nothing automatic for this and I don't think we have a paper like what you are asking for. I assume you have read our paper on Bayes mult imps.

Also, you can compare your results with those of FIML where you bring the covariates into the model.

Christoph Weber posted on Friday, September 06, 2019 - 1:39 pm

Dear Mplus-Team,
is there a way to save estimates and their SE from each single imputation, when using mi-data for analyses?

In detail, I use nested multiple imputations (10 Imputations per 10 plausible values in PISA-data). So I have to adapt the combining rules for the SE.

Tihomir Asparouhov posted on Friday, September 06, 2019 - 3:12 pm

I think you can use the sample mean and standard deviation (for each imputed value) as an approximation.

Christoph Weber posted on Thursday, September 12, 2019 - 2:23 am

thanks, i will try this.

A follow-up question.
I run a three-level model with MI data (PISA data, 37 countries, 9000 schools, 250,000 students).
Mplus stops running after the first data file and opens no output. If I open the output manually, the output just shows the input syntax. If I run each data file separately it works fine. And also "smaller models" run with type = imputation.

Do you have any suggestions how to fix this Problem?
Christoph

Bengt O. Muthen posted on Thursday, September 12, 2019 - 2:08 pm

Please send the relevant files to Support along with your license number so we can see what's going on.

Aurelie Lange posted on Tuesday, May 05, 2020 - 2:30 pm

Dear Prof Muthen,

I am running analyses on imputed datasets using TYPE = IMPUTATION. For some of the analyses, a note under the sample statistics is printed, stating that these are average results over 0 datasets. Indeed, I do not get values for the sample statistics. I do, however, get model results.
1. Is this note an indication that something is going wrong in using the multiple imputed datasets? Or can I ignore this note?

For some other models, I do get sample statistics, but only for some of the datasets (e.g. 29 of the 40). In one instance, this seems to lead to an error in the model as I do not get complete model results (only estimates). Mplus does not seem to give any warning or error message.
2. Would you have any suggestions how to solve this?

Thank you!

Kind regards,
Aurelie

Tihomir Asparouhov posted on Wednesday, May 06, 2020 - 2:47 pm

Imputation error messages are found at the end of the file; use the option
output:tech9;

The main result regarding how many of the imputed data sets worked can be found at the top of the output file. That would be the most important. Look for this section

SUMMARY OF ANALYSIS
Number of groups
Average number of observations
Number of replications
Requested
Completed

I think you have to update your Mplus version to 8.4. If you experience further problems send your example to support@statmodel.com

Katherine Paschall posted on Monday, October 05, 2020 - 6:01 am

We are running into issues with identification when using multiple imputation (n=5) with a multi-group CFA. We are using a series of multi-group CFAs to test measurement invariance of a measure of school climate by gender (with either 2 or 3 groups). When we start to free parameters between the groups (e.g., for metric invariance), the model is no longer identified. However, when we run the model on any of the 5 imputed datasets on their own, the model is identified. Is there a reason that MI would require more degrees of freedom?

Bengt O. Muthen posted on Monday, October 05, 2020 - 5:25 pm

No, there is no such reason so there must be something else going on. To diagnose it, we need to see the full output - send to Support along with your license number.