Multiple imputation
Mplus Discussion > Missing Data Modeling
 Shige posted on Saturday, April 24, 2004 - 3:23 pm
Dear All,

I am trying to fit a SEM survival model where some of the covariates in the measurement model have a substantial amount of missing data (20%). Multiple imputation seems to be the best choice in this case.

Based on my reading of the Mplus 3 User's Guide, Mplus does not have a facility to carry out multiple imputation, but it can analyze imputed data (Example 12.13). In that case, can anybody share their experience about which multiple imputation software works well with Mplus? I know there is a large body of literature on multiple imputation; I am a little lost...

Thanks!
 Linda K. Muthen posted on Saturday, April 24, 2004 - 3:47 pm
Joe Schafer's NORM program is probably one way to get imputed data. I believe that it is freeware. Schafer is at Penn State in the Statistics Department.
 Shige posted on Sunday, April 25, 2004 - 2:12 am
Thanks Linda, that seems to be a good place to start. I also found that Gary King has a program called "Amelia" that does similar things, and it seems to be able to handle non-normal data pretty well (which NORM is not designed to handle).
 Anonymous posted on Sunday, April 25, 2004 - 6:36 pm
btw, consider also modeling your missing-data variable directly. If your variable is categorical you can model it with a mixture model; you can also do this with ordered/unordered categorical variables without a mixture.
 Shige posted on Monday, April 26, 2004 - 10:26 am
Dear Anonymous,

Can you point me to an example? Thanks!

Shige
 Anonymous posted on Monday, April 26, 2004 - 9:54 pm
the mixture approach is described in the CACE papers
http://statmodel.com/mplus/examples/penn.html#jo
http://statmodel.com/references.html#catlatent
and ex7.24

the approach without mixture is basically a regular path analysis: add TYPE=MISSING and INTEGRATION=MONTECARLO to example ex3.12

btw, both of these methods should produce the same results as multiple imputations
 Shige posted on Monday, May 03, 2004 - 11:18 pm
Thanks, it's very helpful.
 Anonymous posted on Monday, May 24, 2004 - 6:07 pm
Hello,

In my data, there is 7% missing data on one variable, 1%-3% on 4 variables, and about 9% on 5 covariates. All of these variables would be included in my MIMIC model. Based on 3 of those 5 covariates with the 9% missing rate, I would extract 3 subpopulations (the 3 subpopulations may have lower missing rates on the other variables). All analyses would be carried out on these 3 subpopulations.

Then, can I ignore the missing data, or do I need to run imputation to replace it? If I need to impute, which is better: imputing before the subpopulation extraction or after?

Thank you.
 Linda K. Muthen posted on Tuesday, May 25, 2004 - 8:19 am
Are your dependent variables continuous or categorical?
 Anonymous posted on Tuesday, May 25, 2004 - 9:25 am
My dependent variables are 3 levels of ordered responses (1-3).
Thank you.
 Anonymous posted on Tuesday, May 25, 2004 - 2:07 pm
Linda,
My dependent variables are 3 levels of ordered responses (1-3). In that case, do I need to impute missing data?
Thank you.
 bmuthen posted on Tuesday, May 25, 2004 - 8:05 pm
You might want to do multiple imputations to handle the missing data on the covariates and then do modeling of the categorical outcomes taking missing data on the outcomes into account.
 Alan C. Acock posted on Sunday, August 29, 2004 - 3:19 pm
On page 308 of the User's Guide to Version 3.0 it says "Multiple data sets generated using multiple imputation (Schafer, 1997) can be analyzed using a special feature of Mplus."

What is this special feature?

Alan Acock
 Linda K. Muthen posted on Sunday, August 29, 2004 - 4:03 pm
See Example 12.13 and the IMPUTATION option of the DATA command.
 Chris Richardson posted on Friday, November 05, 2004 - 11:37 pm
Hi Linda/Bengt,
I'm using Mplus V3 to conduct several EFAs and CFAs on 2 datasets. The variables are all ordinal (4-point Likert) and contain from 3% to 10% missing data. When running the EFAs while treating the data as categorical, I included the line TYPE = MISSING. Question 1) I was wondering what method Mplus employs to deal with the missing data in this situation?

As for the CFAs, I have been using NORM to create multiply imputed data sets, which I then use in Mplus via the IMPUTATION option - this works fine. Question 2) Does this approach seem reasonable, or is there an easier way to deal with the missing data without using NORM?

Thanks for your time - cheers
chris
 bmuthen posted on Saturday, November 06, 2004 - 5:33 am
With EFA and categorical variables, least-squares estimation is used and missing data is simply handled by what amounts to pairwise present data.

With CFA and categorical you may also use maximum-likelihood - at least if you don't have too many factors so the numerical integration is feasible - and then the usual approach of ML under MAR assumptions is used. But using the Imputation approach you mention should be fine.
 Chris Richardson posted on Sunday, November 07, 2004 - 5:23 pm
Thanks Bengt!

cheers
chris
 Anonymous posted on Friday, December 31, 2004 - 7:23 am
Hi. Is there a conflict between multiple imputation analysis and categorical variables declared in the VARIABLES section of the code?

A model with categorical items runs without error in each of the individual imputed datasets, but none converge when using the TYPE=IMPUTATION command to run them all at once.

Thanks!
 Shige Song posted on Friday, December 31, 2004 - 8:24 am
Also, Stata has a set of user contributed routines to generate multiply imputed data set. Try "findit imputation" in the command prompt.
 Linda K. Muthen posted on Friday, December 31, 2004 - 10:32 am
There should be no conflict between multiple imputation and categorical variables. To look into this I would need two imputed data sets, the two outputs that show that these data sets worked individually, and the output that shows the problem you encountered with multiple imputation.
 Anonymous posted on Tuesday, March 15, 2005 - 7:20 am
Hello,

I am trying to do a multilevel CFA with imputed data. I imputed the data by NORM and created an ASCII file containing the names of the 5 data sets as you described in the Mplus User’s guide. I specified a model with 3 factors both on the within and on the between level.

Now, I've got 2 questions:
1. The output mentions that the number of replications "requested" is 5, whereas the number of replications "completed" is 1 or 3 (depending on the specific model). What does this mean? And why is the number of replications completed not also 5?
2. The program tells me that the output option "standardized" is not available for MONTECARLO. Is that right? How can I get standardized parameter estimates (factor loadings) with imputed data?

I am looking forward to your reply. Thank you very much in advance!
 Linda K. Muthen posted on Tuesday, March 15, 2005 - 8:49 am
For some reason, the model did not converge for all five of your data sets. You could run each data set separately to see if that gives you more information about why there were convergence problems. Standardized estimates are not available with Monte Carlo which is what our imputation uses. You would need to compute the standardized estimates by hand.
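For a continuous outcome, the hand calculation Linda mentions is just a rescaling of the unstandardized slope by the sample standard deviations; a minimal sketch with made-up numbers:

```python
def standardize(b, sd_x, sd_y):
    """Hand-standardize an unstandardized slope: beta_std = b * SD(x) / SD(y)."""
    return b * sd_x / sd_y

# Hypothetical values: slope 0.50, SD(x) = 2.0, SD(y) = 4.0
print(standardize(0.50, 2.0, 4.0))  # -> 0.25
```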
 Patrick Malone posted on Thursday, April 14, 2005 - 1:22 pm
Good afternoon.

I'm working with imputed data, on a project that I started in the v2 days, when I combined estimates in an external program. Rubin's rules, which I assume Mplus is using to get the SEs of parameter estimates, also usually give a degrees of freedom by which to evaluate the Est/SE on the t distribution. Is there a way to get that information from Mplus?

Thanks,
Pat
 BMuthen posted on Friday, April 15, 2005 - 1:53 am
We don't provide that currently, but I would think that the t distribution is well enough approximated by a normal distribution in most cases.
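For readers who want the degrees of freedom anyway, Rubin's rules can be applied by hand to the per-imputation estimates and squared standard errors; a minimal sketch (the estimates below are invented for illustration):

```python
import statistics

def rubin_pool(estimates, variances):
    """Pool one parameter across m imputations using Rubin's rules.
    Returns (pooled estimate, pooled SE, Rubin's degrees of freedom)."""
    m = len(estimates)
    qbar = statistics.mean(estimates)       # pooled point estimate
    wbar = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    total = wbar + (1 + 1 / m) * b          # total variance
    df = (m - 1) * (1 + wbar / ((1 + 1 / m) * b)) ** 2
    return qbar, total ** 0.5, df

# Five hypothetical imputations of one slope and its SE:
est = [0.48, 0.52, 0.50, 0.55, 0.45]
se = [0.10, 0.11, 0.10, 0.12, 0.10]
q, se_pooled, df = rubin_pool(est, [s * s for s in se])
```

When between-imputation variability is small relative to the within-imputation variance, the df comes out large and the t reference distribution is effectively normal, which is Bengt's point.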
 Patrick Malone posted on Friday, April 15, 2005 - 4:42 am
Thanks, Bengt. Just as an addendum, I've also heard secondhand that Paul Allison recommends using the df as an index of the adequacy of the number of imputations, so this would be quite useful information in a future release.

Thanks,
Pat
 Linda K. Muthen posted on Saturday, April 16, 2005 - 4:32 am
If you send me an email suggesting this to support@statmodel.com, I will add it to our list of future additions when I return.
 Andrew Percy posted on Tuesday, April 26, 2005 - 3:30 pm
Hi

I am trying to run a basic 5-class LCA model (four nominal indicators, each with 3 categories) with imputed data sets. When the model is run on a single data set, the output is fine. But when the model is run on the imputed data I get an output warning (copied below) and no model results. Do I need to use different output commands for imputed data?

Thanks for your help.


OUTPUT:

TECH1 TECH8;

*** WARNING in Output command
TECH1 option is the default for MONTECARLO.
*** WARNING in Output command
SAMPSTAT option is not available when outcomes are censored, unordered
categorical (nominal), or count variables. Request for SAMPSTAT is
ignored.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
 Linda K. Muthen posted on Tuesday, April 26, 2005 - 3:47 pm
Analyzing imputed data sets uses external Monte Carlo so that is why you get the warning about TECH1. You should get the other warning even if you are running only one data set. It sounds like you need to send input/output, data, and your license number to support@statmodel.com.
 Peter Martin posted on Friday, October 14, 2005 - 1:57 am
Hello there,

Is it possible to set up a multi-group path analysis with imputed data in Mplus?

I have two groups. The data are contained in 5 imputed data sets for each group - so there are 10 datasets altogether.

I have tried the "individual data, different data sets" method of specifying the groups (as described in Ch. 13 of the User Guide), listing two files that each contain the names of five imputed data sets. That didn't work, though - Mplus returned the error message: "There are fewer NOBSERVATION entries than groups in the analysis." (My sample sizes are above 5000 in each of the imputed data sets.)

Should I combine the two groups so that there are only 5 datasets in total (each containing data from both groups - this would be the "individual data, one data set" method) - or is there another way?

As always, I'm grateful for this brilliant discussion site,

Peter
 Peter Martin posted on Friday, October 14, 2005 - 2:37 am
... actually, I've another question about multiple imputation and path analysis (not necessarily multigroup this time).

Can you get R-squares for the dependent variables, like you would for a path analysis without imputation?
 Linda K. Muthen posted on Friday, October 14, 2005 - 9:49 am
For multiple group analysis with imputation, the data for both groups needs to be in one data set with a grouping variable included.

We don't give r-square with multiple imputation because the output is based on our Monte Carlo output. You would need to compute this by hand.
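For an observed dependent variable, the hand calculation of R-square is explained variance over total variance; a minimal sketch with hypothetical numbers:

```python
def r_square(b, var_x, resid_var):
    """R-square for y regressed on x: explained variance / total variance."""
    explained = b * b * var_x
    return explained / (explained + resid_var)

# Hypothetical values: slope 0.5, Var(x) = 4.0, residual variance = 3.0
print(r_square(0.5, 4.0, 3.0))  # -> 0.25
```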
 fati posted on Friday, October 14, 2005 - 11:47 am
I am doing an LCA with missing data, but I am not sure I understand this well.

1. If I have a missing data pattern, can I create a file with TYPE=MISSING BASIC, define a pattern variable, and use the resulting file in the second step of the analysis? And for other missing values, can I define them in the second analysis with TYPE=MIXTURE MISSING and MISSING = ALL (999)?

2. What is the MCAR test, and how can I obtain it with mixture modeling?

3. I have a question about the earlier message posted by Anonymous on Monday, April 26, 2004 - 9:54 pm. He suggests that I can do the path analysis without mixture, with TYPE=MISSING and INTEGRATION=MONTECARLO, and also fit a model with TYPE=MIXTURE MISSING, and that I should get the same results. Which results should be the same, and what is the reason for doing this?

4. Another question: what is the maximum percentage of missing data that is acceptable for doing multiple imputation?

Thank you very much for your response,
 bmuthen posted on Saturday, October 15, 2005 - 6:03 am
1. If you have missing data you should do two things in one and the same run. First, you should define what your missing data symbol is by using MISSING = all (999), say, in the VARIABLE command. Second, you should use TYPE = MISSING in the ANALYSIS command which gives you the so called MAR approach of ML estimation.

2. MCAR testing is testing that the data are missing completely at random. MCAR is not a necessary condition given that you can use the less restrictive MAR assumption, so you should have very specific reasons for wanting to know about MCAR. I would want a strong majority of the information to come from the data, not the model.

3. I don't know what your model is (the old post doesn't tell me that). Perhaps you refer to the fact that TYPE = MIXTURE MISSING with a single class gives the same results as TYPE = MISSING.

4. That's difficult to say and is also related to how non-random the missingness is. You should read Joe Schafer's book (see the Mplus web site) where he describes ways to quantify effects of degrees of missingness in terms of the uncertainty it brings to the estimation. The more missingness you have, the more your results rely on your model instead of your data, which is not good.
 fati posted on Thursday, October 20, 2005 - 6:59 am
1. I know that I must do this for missing data, but my question is about the case where I have a missing data pattern (see my last message).
2. OK.
3. My model is an LCA with 25 categorical variables. The message I was referring to is:

Anonymous posted on Sunday, April 25, 2004 - 6:36 pm

but, consider also modeling your missing data variable, if your variable is categorical you can model it with a mixture model, you can also do this with ordered/unordered categorical without mixture

the message that contains the question is :

Shige posted on Saturday, April 24, 2004 - 3:23 pm

Dear All,

I am trying to do a SEM survival model where some of the covariates in the measurement model have some heavily missing data (20%). Multiple imputation seems to be the best choice in this case.

Based on my reading of the Mplus 3 user guide, Mplus does not have the facility to carry out multiple imputation, but it can process imputed data (example 12.13). In that case, can anybody share their experience about which multiple imputation software to use to work with Mplus? I know there is large body of literature of multipe imputation, I am a little lost...

thank you
 bmuthen posted on Thursday, October 20, 2005 - 6:56 pm
That earlier post was referring to missing data in covariates, not in the outcomes. You don't need multiple imputations for outcomes, but missingness on outcomes is taken care of by ML under MAR, that is Type = Missing.
 Anonymous posted on Wednesday, November 02, 2005 - 5:59 pm
Hi.

I would like to know how the summary of the chi-square statistics is calculated with "TYPE IS IMPUTATION". Is the "mean" simply the unweighted mean of the chi-square statistics?

Thanks!
 Linda K. Muthen posted on Wednesday, November 02, 2005 - 6:35 pm
We don't give a chi-square for multiple imputation because we are not clear on the theory for this.
 Anonymous posted on Wednesday, November 02, 2005 - 7:39 pm
So, I shouldn't interpret the "mean" of the chi-square, right? Below is the summary from running the multiple imputation with 10 imputations.

TESTS OF MODEL FIT

Number of Free Parameters 9

Chi-Square Test of Model Fit

Degrees of freedom 18

Mean 52.313
Std Dev 5.759
Number of successful computations 10

Proportions Percentiles
Expected Observed Expected Observed
0.990 1.000 7.015 42.415
0.980 1.000 7.906 42.415
0.950 1.000 9.390 42.415
0.900 1.000 10.865 42.415
0.800 1.000 12.857 42.415
0.700 1.000 14.440 49.892
0.500 1.000 17.338 51.864
0.300 1.000 20.601 54.372
0.200 1.000 22.760 54.558
0.100 1.000 25.989 56.637
0.050 1.000 28.869 56.637
0.020 1.000 32.346 56.637
0.010 1.000 34.805 56.637
 Linda K. Muthen posted on Thursday, November 03, 2005 - 6:01 am
Are you saying TYPE=IMPUTATION; or TYPE=MONTECARLO; in your DATA command?
 Linda K. Muthen posted on Thursday, November 03, 2005 - 4:01 pm
I finally found a multiple imputation example and you are correct that we do print the mean of chi-square. It is simply an unweighted mean. It has not been adjusted in any way because we are not clear on the theory for this.
 S. Oesterle posted on Friday, January 20, 2006 - 3:20 pm
I am using TYPE=IMPUTATION to analyze 20 data sets that I have created via multiple imputation in NORM. I am estimating a path model with observed variables only. The output in Mplus does not give me standardized estimates. I know that you said in an earlier response that "Standardized estimates are not available with Monte Carlo which is what our imputation uses. You would need to compute the standardized estimates by hand."

How do I calculate the standardized estimates, particularly when my dependent variables are binary? I was going to use the following formula: take the ratio of the standard deviation of x to that of y and multiply it by the unstandardized estimate. However, the sample statistics printed in the output do not include variances for categorical variables and, besides, are only printed for the first data set. Is there any way to get the sample statistics averaged across all data sets, just like the fit statistics and the regression estimates? Or is there any other way to calculate the standardized estimate?
 bmuthen posted on Friday, January 20, 2006 - 5:53 pm
For a binary dependent variable u, Mplus uses a standardization of the slope that divides by the standard deviation of u*, a latent response variable underlying u (drawing on the u* variance is also used in the R-square for binary outcomes of McKelvey & Zavoina, 1975, which is referred to in some textbooks). In a regular probit regression of u on x, the variance of u* given x, i.e. the residual variance, is fixed at 1. With logit regression, the residual variance is fixed at pi**2/3. So the standard deviation is the square root of the sum of the variance in u* explained by x plus this residual variance.
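The formula above can be sketched numerically: the latent response variable u* has variance b^2 * Var(x) plus the fixed residual variance (1 for probit, pi^2/3 for logit), and the standardized slope divides by its standard deviation. The input values below are hypothetical:

```python
import math

PROBIT_RESID = 1.0               # fixed residual variance of u* in probit
LOGIT_RESID = math.pi ** 2 / 3   # fixed residual variance of u* in logit

def std_slope_binary(b, var_x, resid_var):
    """Standardized slope for binary u on x via the latent response u*:
    SD(u*) = sqrt(b^2 * Var(x) + residual variance)."""
    sd_ustar = math.sqrt(b * b * var_x + resid_var)
    return b * math.sqrt(var_x) / sd_ustar

# Hypothetical probit slope of 1.0 with Var(x) = 1.0:
print(std_slope_binary(1.0, 1.0, PROBIT_RESID))  # about 0.707
```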
 S. Oesterle posted on Monday, January 23, 2006 - 4:04 pm
I am estimating a multiple group analysis (2 groups) for a path model using TYPE=IMPUTATION with 20 imputed datasets. In the model where all parameters are estimated freely across the 2 groups, I do not get any error messages and the estimation terminated normally. However, when I look at the coefficients for the second group, most (but not all) estimates and standard errors are zero. The estimates for the first group look ok. When I estimate the model separately for the 2 groups, I get correct results. What could be going on here?
 Linda K. Muthen posted on Monday, January 23, 2006 - 4:39 pm
If you are using a version earlier than Version 3.12, you should download the most recent version of Mplus. If you still have the problem, send the input, two data sets, output, and your license number to support@statmodel.com.
 S. Oesterle posted on Monday, January 23, 2006 - 5:06 pm
Installing version 3.13 did not solve the problem. I will send you my files. Thanks!
 Scott Grey posted on Wednesday, February 01, 2006 - 3:11 pm
Hello!
I have been attempting to conduct a multilevel growth curve analysis ("TYPE IS TWOLEVEL") with missing data using the multiple imputation feature, as there are a number of covariates with missing data in our dataset. Mplus appears to replicate the analysis in the DOS window, but when the DOS window closes there is no output in the GUI window. An output file is generated, but it always ends at "Input data file(s)". The program has no problem with imputation for other analyses like "TYPE IS COMPLEX." Here is the code:

DATA:
FILE IS "C:\Documents and Settings\insthealthsa4\
My Documents\DARE\Imputation\AUGUSTINE_m\AUG_imp.txt";
TYPE IS IMP;

VARIABLE:
NAMES ARE crsswlk hs9dist region1 region2 region3
region4 Urban stressms rhsfree MSfrelun MSredlun
MSwhite MSBlack MSLatino MSAsian MSother treatms
att7rev att9rev utq41a utq41b utq41c utq41d n2q51a
n2q51b n2q51c n2q51d upq40c n1q44c_7 n1q44c_8 AGE
sex family catuse1 catuse2 catuse3 catuse4 catuse5
catuse6 upq37a upq37b upq37c upq37d upq37e upq37f
upq37g upq37h n2q46a n2q46b n2q46c n2q46d n2q46e
n1q7g n1q7j n2q7g n2q7j eq10g eq10j tq10c tq10f
q10c q10f smoket1 drinkt1 pot1 smoket2 drinkt2
pot2 smoket3 drinkt3 pot3 smoket4 drinkt4 pot4
smoket5 drinkt5 pot5 latino black asian am_ind
oth_race clu1 clu2 clu3 clu4 clu5 clu6 clu7 clu8
clu9 clu10 clu11 clu12 clu13 clu14 clu15 clu16
clu17 clu18 clu19;

USEVARIABLES ARE urban rhsfree n2q51a n2q51b n2q51c
n2q51d upq40c n1q44c_7 n1q44c_8 age sex
latino black oth_race
anysmoke anydrink anypot lowhi hilow hihi
mattpol7 mattpol8 mattpol9;

WITHIN IS upq40c n1q44c_7 n1q44c_8 age sex
anysmoke anydrink anypot lowhi hilow hihi
latino black oth_race mattpol7 mattpol8 mattpol9;

BETWEEN IS rhsfree urban;

CLUSTER IS hs9dist;

DEFINE:
mattpol7 = (tq10c+tq10f)/2;
mattpol8 = (eq10g+eq10j)/2;
mattpol9 = (n2q7g+n2q7j)/2;
devian7 = (upq37a+upq37b+upq37d+upq37g)/4;
devian9 = (n2q46e+n2q46a+n2q46b+n2q46c)/4;

lowhi = 0;
hilow = 0;
hihi = 0;
anysmoke = 0;
anydrink = 0;
anypot = 0;
IF (devian7 LE 1 AND devian9 GT 1) THEN lowhi = 1;
IF (devian7 GT 1 AND devian9 LE 1) THEN hilow = 1;
IF (devian7 GT 1 AND devian9 GT 1) THEN hihi = 1;
IF (smoket1 GT 0 OR smoket2 GT 0 OR smoket3 GT 0 OR
smoket4 GT 0 OR smoket5 GT 0) THEN anysmoke = 1;
IF (drinkt1 GT 0 OR drinkt2 GT 0 OR drinkt3 GT 0 OR
drinkt4 GT 0 OR drinkt5 GT 0) THEN anydrink = 1;
IF (pot1 GT 0 OR pot2 GT 0 OR pot3 GT 0 OR
pot4 GT 0 OR pot5 GT 0) THEN anypot = 1;

ANALYSIS:
TYPE IS TWOLEVEL;
ESTIMATOR = ML;
ITERATIONS = 1000;
CONVERGENCE = 0.00005;

MODEL:
%WITHIN%
attinstb BY n2q51a n2q51b n2q51c n2q51d;
i s | mattpol7@0 mattpol8@1 mattpol9@2;
mattpol7 ON upq40c;
mattpol8 ON n1q44c_7;
mattpol9 ON n1q44c_8;
attinstb ON i s age sex lowhi hilow hihi
anysmoke anydrink anypot latino
black oth_race;

%BETWEEN%
attinstw BY n2q51a n2q51b n2q51c n2q51d;
n2q51a-n2q51d@0;
attinstw ON rhsfree urban;

OUTPUT: TECH1 TECH8;


THANKS FOR YOUR HELP!!
 Linda K. Muthen posted on Wednesday, February 01, 2006 - 3:41 pm
Please send your input, data, and license number to support@statmodel.com. Looking at the input alone cannot tell me what happened.
 Peter Martin posted on Thursday, May 18, 2006 - 8:25 am
Hello,

I am estimating a latent class model with type=imputation. There are 5 latent class indicators (Y), 4 of which are categorical, while 1 is nominal. I have also got covariates (X) that relate to the latent variable (C), and some direct effects from covariates to some of the Ys.

The estimation runs fine; but the output reports values of .000 for all estimates associated with the nominal Y - that is, both for the means of the nominal Y associated with each latent class, and for the direct effect of one of the X-variables on the nominal Y. In contrast, estimates associated with categorical Y are given and make sense.

The same problem does not occur when I run the model on just one data set (that is, without "type=imputation"). Neither does the problem occur when I specify my nominal Y as categorical, and use "type=imputation".

What could it be that goes wrong when estimating with imputed datasets and a nominal latent class indicator?
 Linda K. Muthen posted on Thursday, May 18, 2006 - 9:47 am
This sounds like a problem that has been fixed in Version 4.1. If you download Version 4.1, you should be fine. If not, send your input, data, output, and license number to support@statmodel.com.
 Peter Martin posted on Tuesday, May 23, 2006 - 1:54 am
Yes, using version 4.1 resolved the problem. Thank you, Linda!
 Thomas Rodebaugh posted on Thursday, June 15, 2006 - 9:34 am
when using multiple imputation and regressing latent variables on other latent variables, is it sufficient to set all the latent variables' variances to 1 to get standardized values for these regression coefficients? or is it necessary to hand calculate these, too?

i feel like this should be a simple question, but i have a very hazy grasp of how standardization works, exactly.

thanks in advance for any help.
 Linda K. Muthen posted on Thursday, June 15, 2006 - 11:02 am
If you fix the metric of the factors by fixing the factor variances to one instead of a factor loading, you would receive estimates equivalent to the Std standardization of Mplus. The two standardizations used in Mplus are described in Chapter 11 of the Mplus User's Guide where the general output is described.
 Thomas Rodebaugh posted on Thursday, June 15, 2006 - 3:03 pm
thanks for that reply. now i'm running into another issue.

i'm using multiple imputation and specifying the MLM estimator because of some evidence that the multivariate distribution is not normal. now i would like to make some model constraints and test these via chi-square difference tests.

of course, when using MLM one cannot simply subtract the chi-squares--one needs the scaling correction factors.

however. . . the output when using MI does not provide scaling correction factors (that i can find). i tried using the difftest option that works for WLSMV, and it informed me that this only works for WLSMV. so. . . is there something i am missing that would allow me to test model constraints using MI and the MLM estimator?

thanks,

tom
 Thomas Rodebaugh posted on Friday, June 16, 2006 - 6:34 am
. . . i realized after i wrote this that i could, of course, calculate the difference tests for each of the 5 MI data sets separately. is that the only way to go about this?
 Linda K. Muthen posted on Friday, June 16, 2006 - 9:45 am
With multiple imputation, we give the average of the fit statistics like chi-square. I don't think there is any theory on how to actually calculate chi-square for multiple imputation. Because of this, I don't know how you would do difference testing in this situation.
 chennel huang posted on Monday, June 19, 2006 - 1:00 am
hello, i'm a student in TW.

When I try to run MI in NORM, 11 of my 18 variables are ordinal-scale variables. How do I observe and decide on a method to transform these variables to fit the assumption of normality? Sorry for asking my question in the wrong place.
 chennel huang posted on Monday, June 19, 2006 - 2:00 am
Following up on the previous question: for the ordinal variables, should I choose the "logit transformation" with a limited range to make the transformed values reasonable, and choose "to the nearest observed value"?
 Linda K. Muthen posted on Monday, June 19, 2006 - 8:06 am
I'm not familiar with NORM. I would not transform ordinal variables because the numbers assigned to the categories do not represent numerical values.
 chennel huang posted on Wednesday, June 21, 2006 - 5:19 am
After using NORM to make 5 datasets, I use Mplus version 3.0 to read these datasets. However, the output is "***ERROR (Err#: 29) Error opening file: mi.dat".
My syntax is in the following.

data:
file is mi.dat;
type is imputation;
variable:
names are cl edu inc gen bmi year at1-at7 ba1-ba4;
usevariable are at1-at7;
analysis:
type is general basic;

These datasets are named mi1, mi2, mi3, mi4, and mi5. They are saved in the same folder as the syntax.

Thanks for your help.
 Linda K. Muthen posted on Wednesday, June 21, 2006 - 6:28 am
The error means the file cannot be found. Check if the extension dat was added twice to the data set. Otherwise, send the input, data sets, and your license number to support@statmodel.com.
 Yung Chung posted on Thursday, June 29, 2006 - 2:30 am
Hi, this is with specific reference to Thomas Rodebaugh's post on Thursday, June 15, 2006 - 3:03 pm:

There is a SAS macro that calculates a combined test from the chi-squares obtained over the imputed datasets (each individual statistic overstates the strength of the evidence against the null hypothesis because it ignores missing-data uncertainty). The syntax can be found on Paul Allison's website.

Hope this helps
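The combining rule such macros implement is, as far as I know, the D2 procedure of Li, Meng, Raghunathan & Rubin (1991), which pools the m chi-square statistics while penalizing between-imputation variability. A rough sketch, with the chi-square values invented for illustration:

```python
import statistics

def pool_chisq(chisqs, df):
    """D2 pooling of chi-square statistics from m imputations
    (Li, Meng, Raghunathan & Rubin, 1991). The returned D2 statistic is
    referred to an F distribution with df numerator degrees of freedom."""
    m = len(chisqs)
    dbar = statistics.mean(chisqs)
    # relative increase in variance, based on the sqrt of the statistics
    r = (1 + 1 / m) * statistics.variance([c ** 0.5 for c in chisqs])
    d2 = (dbar / df - (m + 1) / (m - 1) * r) / (1 + r)
    return d2

# Hypothetical chi-squares from 5 imputations of an 18-df model:
print(pool_chisq([52.0, 48.0, 55.0, 50.0, 53.0], 18))
```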
 Susan Scott posted on Friday, October 13, 2006 - 11:41 am
Hi,

I would like to know where I can find information on how TYPE=IMPUTATION analyses are run.

When I do analyses on multiply-imputed datasets in SAS, the model is run on each of the datasets and then the estimates are combined (using PROC MIANALYZE). However, I am finding that when I have an SEM model that has converged with TYPE=IMPUTATION and I try to run the same model on the individual datasets, I often get the message that the model does not converge. I would like to understand why this is happening.

Thank you,
Susan Scott
 Linda K. Muthen posted on Friday, October 13, 2006 - 2:42 pm
I cannot answer this question without seeing the input, data sets, output, and your license number at support@statmodel.com. Please send the output with TYPE=IMPUTATION; where all data sets converged and also an output where using an individual data set did not converge.
 Susan Scott posted on Friday, October 20, 2006 - 2:08 pm
I have e-mailed everything. I'm not sure whether it hasn't registered in my e-mail program because I sent it from here, or whether it was not sent for some reason. If you have not received anything, please let me know.

Thank you,
Susan Scott
 Linda K. Muthen posted on Friday, October 20, 2006 - 4:21 pm
I have not yet received anything.
 Rick Sawatzky posted on Wednesday, October 25, 2006 - 2:30 pm
Hi Linda and Bengt,

I created five imputation datasets to be used for a CFA based on MI. The model converges fine for each dataset individually, but when I combine the datasets in the analysis using the TYPE = IMPUTATION command and a separate input I get the following message:
Number of replications
Requested 5
Completed 1
Then when I change the order of the datasets in the input file I obtain 4 successful replications. I am unclear about what the cause of this discrepancy might be. Do you have any suggestions? I pasted the syntax for a very much simplified model in which I find the same problem.

Thank you,

Rick Sawatzky.

SYNTAX
DATA: file is mi.txt;
TYPE = IMPUTATION;
ANALYSIS: ESTIMATOR = WLSMV;
VARIABLE: NAMES ARE y1-y7;
USEVARIABLES ARE y1-y7;
CATEGORICAL ARE y1-y7;
MODEL: f BY y1-y7;
 Rick Sawatzky posted on Wednesday, October 25, 2006 - 2:38 pm
Just to clarify my previous posting, the CFA of the imputed data files runs fine when I do not specify categorical data (i.e., the problem only occurs when the variables are specified as categorical).
 Linda K. Muthen posted on Wednesday, October 25, 2006 - 3:16 pm
Please send the input, data sets, output, and your license number to support@statmodel.com. We need this information to understand what is happening.
 Rick Sawatzky posted on Thursday, October 26, 2006 - 4:22 pm
Linda,

Thanks so much. The above problem is now solved. However, when I use the WLSMV estimator I find that the output file does not provide a mean chi-square statistic. I assume that this might be because the estimated degrees of freedom for WLSMV might not be identical for the different MI datasets. Is this explanation correct, or is there another reason why the mean chi-square is not provided (it is provided when I use WLSM estimation for the same model)?
Thanks again,

Rick.
 Linda K. Muthen posted on Thursday, October 26, 2006 - 4:32 pm
The mean chi-square would not be meaningful for WLSMV for the same reason the chi-square values cannot be used for difference testing -- only the p-value is meaningful.
 Bruce A. Cooper posted on Tuesday, March 13, 2007 - 12:50 pm
Hi -
I want to use TYPE=IMPUTATION to do (say) a two-step hierarchical regression analysis and test the difference between the two models.

(1) Can I use the difference between the mean -2Loglikelihood statistics for the two models to test for the improvement in fit from the second set of variables added in the larger model?

(2) There is no test for model fit (all values are 0). I assume I can use the mean -2LL and df to test each model?

(3) How can I get the corrected df and p-values for the t-statistics reported by the program?

(4) How can I get the relative efficiency and fraction of missing information for the intercept and predictors?

Thanks,
bac
 Linda K. Muthen posted on Saturday, March 17, 2007 - 9:21 am
1 and 2. No. The difference between two average loglikelihoods is not distributed chi-square.

3. The standard errors are correct so the ratio gives a correct z-score.

4. This is not provided. You would have to see the Schafer text to see how to do this.
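For reference, the standard Rubin (1987) pooling rules behind point 3 can be sketched as follows (m = number of imputed data sets; Q_j and U_j are the estimate and squared standard error from data set j):

```
Qbar = (1/m) * sum_j Q_j                   pooled point estimate
Ubar = (1/m) * sum_j U_j                   average within-imputation variance
B    = (1/(m-1)) * sum_j (Q_j - Qbar)^2    between-imputation variance
T    = Ubar + (1 + 1/m) * B                total variance
z    = Qbar / sqrt(T)                      the ratio referred to in point 3
```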
 Bruce A. Cooper posted on Sunday, March 18, 2007 - 9:26 am
Thanks, Linda -

I must have a basic misunderstanding about the deviance chi-square, or else I wonder why it would not be relevant to 1 and 2 above. In other maximum likelihood-based models, -2LL is distributed as chi-square with df = the number of parameters in the model. So it seems the "mean -2LL" would also be distributed as chi-square. Also, the difference between the -2LL for one model and the -2LL for a nested model is called the deviance, and is distributed as chi-square with df = the difference in the number of parameters estimated by the two models. This allows the test of the difference between hierarchical models with, for example, logistic regression and multilevel regression. I don't understand why the same would not be true for hierarchical linear regression, estimated here with maximum likelihood. Could you help me with a reference so I could learn why the "mean deviance" would not also be distributed as chi-square?
Thanks,
bac
 Linda K. Muthen posted on Monday, March 19, 2007 - 10:21 am
We have never seen an article discussing whether the loglikelihood averaged over imputed data sets is distributed as chi-square. It may be, and whether it is may also be a function of the number of imputed data sets. If you know of a reference that supports this, please let us know.
 Bruce A. Cooper posted on Monday, March 19, 2007 - 5:59 pm
Thanks, Linda -

Here are some references that you may find useful.

I haven't gotten the Statistics in Medicine reference yet, but Don Rubin referred to it and two others in the IMPUTE thread "IMPUTE: Re: "Averaging" chi-square values (fwd)" as providing information about averaging chi-square values from SEM models on imputed data sets. There are some other notes in the IMPUTE threads re averaging R-squared values, but you already report R-squared for the imputation analysis. It would be nice to have the df for the t-tests and an option for testing the deviance, too!

Thread: http://www.mail-archive.com/impute@utdallas.edu/msg00158.html

References:
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F-reference distribution. Journal of the American Statistical Association, 86(416), 1065-1073.

Rubin, D. B., & Meng, X. L. (1992). Performing likelihood ratio tests with multiply-imputed data sets. Biometrika, 79(1), 103-111.

Rubin, D. B., & Schenker, N. (1991). Multiple imputation in health-care databases: an overview and some applications. Stat Med, 10(4), 585-598. (PMID: 2057657)
 Linda K. Muthen posted on Monday, March 19, 2007 - 6:47 pm
Thanks for the references. We'll take a look at them.
 Bengt O. Muthen posted on Friday, March 23, 2007 - 2:07 pm
Looks like some of these references are useful in terms of model testing in future versions of Mplus.
 Anonymous posted on Friday, May 04, 2007 - 12:16 pm
Good afternoon. I have a quick question: how does Mplus identify the datasets used in the imputation process? Is there a way to specify the names of the data sets, or does the program search for some identifier? Thanks.
 Linda K. Muthen posted on Friday, May 04, 2007 - 12:32 pm
There is no identifier. The data set names are listed in a file and the file is accessed. See Example 12.13.
 Bruce A. Cooper posted on Sunday, May 06, 2007 - 4:26 pm
Hello -

I have found what might be a bug in your MI procedure, or at least a documentation problem. I have run a linear regression using 10 MI data sets produced in SAS. In one analysis, I used the names in the SAS file -- 7 of which have more than 8 characters. Mplus warns that the 7 names contain more than 8 characters and that only the 1st 8 will be used in the output, and gives the offending names. When I run the same inp file after correcting the names to be only 8 characters, I get different results. There are no other differences in the two analyses. The run with the short names corresponds closely to the SAS output. Here are the reduced outputs:

Run with 7 long varnames:
STAIX1 ON
CCMC 1.635 1.866
CCFC -3.035 1.408
CCMXCCF 0.340 1.505

Run with short names:
STAIX1 ON
CCMC -1.236 2.644
CCFC -2.632 2.836
CCMXCCF 0.077 1.439

Thanks,
bac
 Linda K. Muthen posted on Sunday, May 06, 2007 - 6:06 pm
I would need to see the input, data sets, output, and license number at support@statmodel.com. If you are not using Version 4.21, I suggest you run the analyses with Version 4.21 as a first step.
 Anonymous posted on Monday, May 07, 2007 - 5:17 am
In reference to response Linda K. Muthen posted on Friday, May 04, 2007 - 12:32 pm, I am seeking further clarification of how Mplus delineates the datasets. Thank you for your referral to the MPLUS User's guide. In example 12.13, the following is stated: "Each record of the file must contain one data set name. For
example, if five data sets are being analyzed, the contents of impute.dat
would be:
data1.dat
data2.dat
data3.dat
data4.dat
data5.dat
where data1.dat, data2.dat, data3.dat, data4.dat, and data5.dat are the
names of the five data sets created using multiple imputation."

After reviewing this example, my questions are:
Are the imputed datasets to be analyzed identified by their possession of the suffix .dat (or any other ASCII file format, such as .txt or .csv)? Does the program actively search for a variable containing this data? If so, is it correct to say that to analyze imputed data, one must create a character variable that distinguishes each data set when preparing the data for analysis using Mplus? I'm sorry to be a bother, but I'd like to understand how Mplus makes this distinction. Thanks again for your aid.
 Linda K. Muthen posted on Monday, May 07, 2007 - 7:11 am
The program looks for the file impute.dat. It then reads the data sets in the order found in the list of data set names. There is no required extension for the names of the data sets.
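As a minimal sketch of this setup (file and variable names hypothetical), following Example 12.13: the file named in the FILE option is just a text file listing one imputed data set per record, and each listed file holds one imputed data set.

```
contents of implist.dat:
  imp1.dat
  imp2.dat
  imp3.dat
  imp4.dat
  imp5.dat

input file:
  DATA:     FILE = implist.dat;
            TYPE = IMPUTATION;
  VARIABLE: NAMES = y1-y7;
```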
 Anonymous posted on Monday, May 07, 2007 - 12:29 pm
Okay. I think I understand now. The FILE command tells the program to search for other files whose names are listed in the impute file. The analysis is then based on results combined across the data sets. I had assumed the data sets were all combined into one larger file.
 Linda K. Muthen posted on Monday, May 07, 2007 - 5:16 pm
For Bruce Cooper:

The order of the variable names is different in the two inputs:

arsmamso arsmamao msosc maosc msxma ccmc ccfc ccmxccf

arsmamsos arsmamaos msosc maosc ccmc ccfc msxma ccmxccf
 E. Christopher Lloyd posted on Tuesday, May 22, 2007 - 12:38 pm
I have several questions regarding the output when type=imputation is used.

1. If a replication results in warnings (such as a warning about a singular matrix), is that replication's results still included in the output?

2. I cannot seem to find what the expected and observed "proportions" and "percentiles" columns mean in the tests of model fit section. Can you refer me to a page in the User's Guide or briefly explain them? I need enough detail to be able to address their meaning if a committee member should ask.

Thank you!

Chris Lloyd
 Linda K. Muthen posted on Tuesday, May 22, 2007 - 1:10 pm
1. Only runs that did not converge are not included in the results.

2. These are explained in Chapter 11 under Monte Carlo Output.
 Kantahyanee Murray posted on Wednesday, September 19, 2007 - 11:45 am
Hello,
I am using Mplus to perform SEM on a MI dataset for my dissertation. Any new information about how to obtain a chi-square representing the combined datasets?

In another post, you made reference to the p-value of a chi-square being valid. Does this mean that the average p-value is valid for determining statistical significance?

Thank you.
K.Murray
 Linda K. Muthen posted on Thursday, September 20, 2007 - 11:15 am
I don't think the average p-value would be valid.
 Kantahyanee Murray posted on Wednesday, September 26, 2007 - 4:20 pm
Thank you.
 Alison Riddle posted on Friday, March 14, 2008 - 9:57 am
Hi Linda and Bengt,

I have 10 imputed data sets created in Amelia with which I would like to do a complex EFA (weighted and clustered) with categorical data, and then a linear regression using the results of the EFA in a CFA framework, regressing them onto categorical outcomes and covariates. I am using Mplus v.5.

Can I use IMPUTATION with COMPLEX? Are there any other issues that I should be concerned about?

Thanks for your assistance.

Cheers,
Alison
 Alison Riddle posted on Friday, March 14, 2008 - 10:00 am
Hello again,

I just checked the User's Guide and it appears that I cannot use IMPUTATION with EFA - is that correct?

Cheers,
Alison
 Linda K. Muthen posted on Friday, March 14, 2008 - 10:22 am
You can use IMPUTATION with COMPLEX EFA with the PROMAX or VARIMAX rotations but not with the other rotations.
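A sketch of such an input (variable, weight, and cluster names are hypothetical, and the exact TYPE syntax should be checked against the User's Guide for your version):

```
DATA:     FILE = implist.dat;
          TYPE = IMPUTATION;
VARIABLE: NAMES = u1-u10 w clus;
          CATEGORICAL = u1-u10;
          WEIGHT = w;
          CLUSTER = clus;
ANALYSIS: TYPE = COMPLEX EFA 1 3;
          ROTATION = PROMAX;
```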
 Allison Holmes Tarkow posted on Sunday, April 27, 2008 - 6:51 pm
I'm using multiple imputation and am doing a latent growth curve model. I would like to do a multiple group analysis.
In a posting above it was stated that it is possible to use multiple imputation for multiple group analyses.

To clarify, is this true when the multiple imputation is performed on the whole sample, as opposed to separate imputation analyses for the groups of interest (e.g., boys and girls)?

This is probably simple, but I'm having trouble wrapping my head around how it would work to compare 2 groups (that you are testing to be non-equivalent) if the complete dataset is constructed based on the assumption that the missing data patterns are generated by one sample.

Thanks for your help!
 Linda K. Muthen posted on Monday, April 28, 2008 - 9:04 am
I would think you would base the imputation on the full sample unless missing data patterns vary for males and females. I would suggest seeing what the imputation literature says, however, as I have no evidence to support this opinion.
 Chun-Ju Chen posted on Sunday, June 01, 2008 - 6:46 am
Hi,
Can Mplus run an EFA when TYPE=IMPUTATION, or does imputation only work for analyses like Example 12.13?
(Our variables are ordinal.)

Sorry for disturbing again. Our institution wants to update our Mplus, but before that, we have to make sure that EFA works with TYPE=IMPUTATION.
Thanks for your help.
 Linda K. Muthen posted on Monday, June 02, 2008 - 9:46 am
TYPE=EFA and the IMPUTATION option cannot be used in combination.
 Daniel Oberski posted on Monday, June 09, 2008 - 7:11 am
Dear Drs Muthen

I have 20 multiply imputed correlation matrices, but not the imputed cases from which they were computed.

Can I use TYPE = IMPUTATION to estimate a model in Mplus from these 20 correlation matrices? Or does this option only work with separate cases?

Thank you very much in advance for your help,

Daniel
 Linda K. Muthen posted on Monday, June 09, 2008 - 8:10 am
TYPE=IMPUTATION requires raw data. It cannot be used with summary data.
 Andrea Dalton posted on Tuesday, July 22, 2008 - 3:16 pm
Hi,

My question is related to the post " Anonymous posted on Tuesday, March 15, 2005 - 7:20 am" and reply.

I am running an analysis using imputed data sets, and the output indicates that the number of replications "completed" is only 2, when I had 5 data sets originally.

I ran each set separately and there were no errors (i.e., "model estimation terminated normally"). So, why is it that I don't get all 5 sets included in the analysis?
 Linda K. Muthen posted on Tuesday, July 22, 2008 - 3:17 pm
Which version of Mplus are you using?
 Andrea Dalton posted on Tuesday, July 22, 2008 - 3:22 pm
Not the newest update - I think it's version 4 (bought it almost a year ago).
 Linda K. Muthen posted on Tuesday, July 22, 2008 - 4:46 pm
You should download Version 5.1.
 Andrea Dalton posted on Wednesday, July 23, 2008 - 10:49 am
That worked! Thanks.
 aprile benner posted on Thursday, August 07, 2008 - 6:43 am
Good morning -

I am conducting a path analysis with 10 imputed datasets. Is there a way to run MODEL CONSTRAINT with TYPE = IMPUTATION? I have tried and am not getting an error (but am also only getting truncated output).

Thanks,

aprile
 Linda K. Muthen posted on Thursday, August 07, 2008 - 7:00 am
MODEL CONSTRAINT is available with the IMPUTATION option. I was wrong about that. Please send your input, data, output, and license number to support@statmodel.com.
 Donna Ansara posted on Wednesday, November 05, 2008 - 4:00 pm
Hello,

I am running a latent class regression analysis using TYPE=IMPUTATION and am able to run this perfectly fine. I am interested in presenting confidence intervals for the regression coefficients for the covariates, and Mplus does not seem to provide this output when I specify the CINTERVAL option. Would it be appropriate to calculate them using the standard errors that are indicated for the regression coefficients in the usual manner (i.e., estimate +/- 1.96*SE)? Thank you for your assistance.
 Linda K. Muthen posted on Thursday, November 06, 2008 - 8:41 am
This would be correct.
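For example, with a hypothetical pooled estimate of 0.50 and standard error of 0.10:

```
0.50 - 1.96 * 0.10 = 0.304
0.50 + 1.96 * 0.10 = 0.696   ->  95% CI of (0.304, 0.696)
```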
 bob calie posted on Tuesday, May 12, 2009 - 8:21 am
Hi All,

I'm trying to impute missing data for a binary variable (say, gender: girl/boy). Since the data were collected from multiple schools and there are apparently distinct proportions of gender across schools, it seems that a 'stratified' imputation is more appropriate. Any ideas?

Thanks very much in advance.
 Linda K. Muthen posted on Tuesday, May 12, 2009 - 9:45 am
I don't know much about imputing data. I think you should pose this question to the developer of the software you would be using to impute the data.
 bob calie posted on Wednesday, May 13, 2009 - 1:24 pm
But Mplus can deal with missing data. What I was asking is if it's possible to impute missing with different probability across different clusters. This is not a research question and I was asking the developer of Mplus. Maybe I posted it in a wrong place or I shouldn't have used Mplus. Thanks anyway.
 Michael Spaeth posted on Thursday, May 14, 2009 - 12:25 am
If I were you I would ask Joseph Schafer or colleagues. He developed the freeware "Norm" which is a multiple imputation program. Mplus does not impute missing data. It handles missing data via a maximum likelihood approach (FIML).
 bob calie posted on Thursday, May 14, 2009 - 8:58 am
Thanks, Mike!
 Holmes Finch posted on Friday, June 05, 2009 - 5:23 am
Hi,

I am doing some simulations involving multiple imputation. I have imputed data for 100 replications using SAS and created 10 output datafiles for each replication. I would now like to use MPlus to conduct a LGCM for each of the 10 imputations for each of the 100 replications. I see how easy it is to read one set of 10 imputations in using the TYPE=IMPUTATION command, but I'm unsure of how to do this for my set of 100 replications, each with 10 imputations. Does this make sense? Thanks for any suggestions.

Holmes
 Linda K. Muthen posted on Friday, June 05, 2009 - 9:19 am
There is no way in Mplus to combine the ten imputation outputs. You would need to write a program to extract what you need from the output and combine it.
 Holmes Finch posted on Friday, June 05, 2009 - 9:28 am
Thanks, Linda, for the info.

Holmes
 Dorothee Durpoix posted on Sunday, June 07, 2009 - 2:13 pm
Dear Linda,

I've analyzed 13 imputed datasets, and the output says 6 of them completed. However, when I run them individually, only 3 actually converge. A few earlier posts asked the same question, but there was no indication of what the problem(s) may have been.
Could you please enlighten me as to what is happening?
Could you please enlighten me of what is happening?

Cheers.
 Dorothee Durpoix posted on Sunday, June 07, 2009 - 2:17 pm
Just to complete my post: I'm using the 5.2 version.

Cheers.
 Linda K. Muthen posted on Sunday, June 07, 2009 - 3:05 pm
If you add TECH9 to the output command, it should show the problem. If this does not help, please send the problem and your license number to support@statmodel.com.
 Anonymouse posted on Tuesday, August 25, 2009 - 11:20 am
Hello,

I am testing a path model which is composed of a series of quantitative variables predicting two binary dependent variables. I am using multiple imputations to handle missing values on the x variables. When I try to use the MODEL INDIRECT command, I get a message indicating that Mplus cannot perform MODEL INDIRECT for multiple imputations.

Is there any way to work around this? It seems that I have to use multiple imputations for the missing x values because otherwise listwise deletion is used...
 Linda K. Muthen posted on Tuesday, August 25, 2009 - 11:35 am
You can use MODEL CONSTRAINT to define the indirect effects.
 Maren Winkler posted on Monday, September 21, 2009 - 2:36 am
Hi,

I have data from 1187 subjects on 135 variables. There is missing data on one variable (appr. 11 %) which is the only variable that is not categorical. I've done Multiple Imputation with NORM, getting 20 datafiles for further analyses.

I have used the WLSMV-estimator for my SEM. Mplus suggests to use NOCHISQUARE and NOSERROR to reduce computation time. I've done that but get the following messages: "NOCHISQUARE option is not available with multiple imputation.
Request for NOCHISQUARE is ignored.
NOSERROR option is not available with multiple imputation.
Request for NOSERROR is ignored."

Why then does Mplus suggest this option if I can't use it?

Moreover, I have a question concerning the output file when using multiple imputation:
For the tests of model fit, there are not only the mean and SD for CFI, TLI, etc., but also expected and observed proportions and percentiles - what do these results tell me?

Thanks for your help!
 Linda K. Muthen posted on Tuesday, September 22, 2009 - 10:26 am
The option is recommended in general but can't be used with imputation.

See page 330 of the user's guide for a description of the expected and observed proportions and percentiles.
 Maren Winkler posted on Monday, September 28, 2009 - 12:52 am
Thanks for your advice.

However, my output is a bit different from the example you give.

I'm running SEM with multiple imputation (5 computations). For the chi square test I only get the following output:

"TESTS OF MODEL FIT

Number of Free Parameters 297

Chi-Square Test of Model Fit

Number of successful computations 5

Proportions Percentiles
Expected Observed Expected Observed
..."

So I actually don't have Mean, Std Dev for chi-square. Is there a command needed to ask Mplus for chi square?

Moreover, for the other fit indices (CFI, TLI, RMSEA,...), the Std Dev is zero and hence, percentiles expected and observed are always the same. How do I read the proportions expected and observed then?

Thank you very much for your help!
 Linda K. Muthen posted on Monday, September 28, 2009 - 6:58 am
I would need to see your full output to understand what you are seeing. Please send it and your license number to support@statmodel.com.
 Moh Yin Chang posted on Wednesday, September 30, 2009 - 3:21 pm
Hi,

I read your Example 12.13 but am still unsure how to set up the input data file. I tried stacking the imputed files in one data set, and apparently that doesn't work. May I know what the dataset should look like for TYPE=IMPUTATION?
Thanks
 Linda K. Muthen posted on Wednesday, September 30, 2009 - 3:59 pm
Each imputed data set should be in a separate file. The file specified using the FILE option should contain the names of the datasets. Please reread the example and also see page 424 of the user's guide.
 Moh Yin Chang posted on Thursday, October 01, 2009 - 9:21 am
Is there a way to perform model test with multiply-imputed data?
 Linda K. Muthen posted on Thursday, October 01, 2009 - 4:42 pm
MODEL TEST can be used with multiple imputation. Please send your problem and your license number to support if you are having a problem.
 Charles B. Fleming posted on Wednesday, October 21, 2009 - 1:17 pm
I am running logit models with 40 imputations using the ML estimator and would like to see if model fit improves when I add a block of variables. In other words, I would like to assess the significance of the change in the loglikelihood relative to the change in number of estimated parameters. My question is whether the values for the loglikelihoods given when using multiple imputations can be used in a straightforward way (i.e., computing twice the difference in the loglikelihoods for the nested models) or do I need to apply some sort of correction as is described in your technical report: “Chi-square statistics with multiple imputation”. I am not clear if the output I am getting (using 5.21) already contains the correction to the mean loglikelihood value described in that report.
 Tihomir Asparouhov posted on Wednesday, October 21, 2009 - 1:59 pm
The current Mplus version provides log-likelihood testing with imputation only for the SEM model with continuous variables (that would be the test of fit). As far as I know there is no simple way to construct likelihood correction factors that can be used easily to do general LRT tests for arbitrary nested models, i.e., even for the simple SEM with continuous variables you can only get the test of fit at this time. I would say LRT with imputation is still a tricky topic.

On the other hand, the Wald test is not - use MODEL TEST to conduct a test of multiple parameters. In addition, the SEs (and the t values, which are the same as univariate Wald tests) that are already in the output can be used to see if fit improves, i.e., to see if the predictors are significant.
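A sketch of the MODEL TEST setup for a joint Wald test of a block of predictors (the outcome, predictors, and labels b1 and b2 are hypothetical):

```
MODEL:      y ON x1 (b1)
                 x2 (b2)
                 x3 x4;
MODEL TEST: b1 = 0;
            b2 = 0;
```

MODEL TEST with several statements like these gives a single joint Wald test of all the listed constraints.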
 Paola posted on Friday, February 12, 2010 - 6:34 am
I have 1000 replications, and each replication contains 5 imputed datasets.
Is it possible to fit a random intercept model to all 1000 replications with both TYPE=MONTECARLO and TYPE=IMPUTATION?
If so, how?
 Linda K. Muthen posted on Friday, February 12, 2010 - 9:33 am
I think you want to combine TYPE=IMPUTATION and TYPE=MONTECARLO. This cannot be done with Mplus.
 Dylan K posted on Monday, February 22, 2010 - 7:32 am
Dear MPlus team,

I'm a complete novice with MPlus. I'm using it to hopefully produce a MIMIC LCA model. I performed Multiple Imputation using STATA as I had missing covariates. I'm okay with reading the imputed file into MPlus but where I'm getting stuck is in specifying an indicator for the imputed datasets within the single file. When I run the input file without this I get reams of output along the lines of:

*** ERROR in Data command
An error occurred while opening the file specified for the FILE option.
File: C:\Documents and Settings\user\Desktop\Dylan\SRA\OrigData\mi\***111415.

Hope you can help prevent me pulling my hair out any further!

Thanks and best wishes,

Dylan
 Linda K. Muthen posted on Monday, February 22, 2010 - 7:38 am
The data sets for multiple imputation must be in separate files.
 Dylan K posted on Monday, February 22, 2010 - 8:08 am
Hi,

Thanks for getting back to me so quickly. I'm still getting confused. I've split the imputed datasets into different files, each beginning with "data" (e.g., data1.dat).

I've called the file that these are stored in data as well. My input file reads:
DATA:
File is
"C:\Documents and Settings\user\Desktop\Dylan\SRA\data.dat";
TYPE=IMPUTATION;

I'm getting the following messages
*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: C:\Documents and Settings\user\Desktop\Dylan\SRA\data.dat

I get the same message whether I name the parent file just data or put the .dat extension on. I've also tried putting the data in csv format, but no luck.

Does it matter if my syntax file is stored in another folder?
Can you see where I'm going wrong?

(Apologies if this is a v basic question!)

Thanks in advance,

Dylan
 Linda K. Muthen posted on Monday, February 22, 2010 - 8:38 am
See Example 12.13 in the user's guide. If this does not help, please send your output file and license number to support@statmodel.com.
 Kelly P posted on Tuesday, March 02, 2010 - 7:35 am
Hi,

I am currently considering using multiple imputation due to missing data problems I am encountering with my dataset. However, the data has a large number of sibling pairs, for which I use the cluster command, and the model I am running has indirect effects. Would these options be available if I was using type=imputation? From the above posts it looks like there would be ways to compute the indirect effects, but I am also concerned about the cluster option.

Thanks!
 Linda K. Muthen posted on Tuesday, March 02, 2010 - 2:25 pm
The CLUSTER option is available for TYPE=IMPUTATION.
 Kelly P posted on Wednesday, March 03, 2010 - 4:35 am
Thanks Linda! Are there examples available anywhere showing how to use the CONSTRAINT command to compute indirect effects?
 Linda K. Muthen posted on Wednesday, March 03, 2010 - 5:41 am
No. An indirect effect is the product of the regression coefficients.
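A minimal sketch of computing an indirect effect with MODEL CONSTRAINT (variable names and labels hypothetical): label the two paths in the MODEL command, then define their product as a new parameter, which Mplus estimates and tests.

```
MODEL:            m ON x (a);
                  y ON m (b)
                       x;
MODEL CONSTRAINT: NEW(ind);
                  ind = a*b;
```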
 miriam gebauer posted on Tuesday, March 30, 2010 - 12:42 pm
hello,
reading the posts above leads me to the assumption that indirect effects and/or interaction modeling with latent variables shouldn't be done with multiply imputed data, because they violate the basic assumption of multiple imputation (linear relations among all variables within the imputed datasets)? Do I understand this right?
thanks for your help!
miriam
 Linda K. Muthen posted on Wednesday, March 31, 2010 - 10:44 am
Indirect effects are linear so your concern would not apply to them. For interactions, you may want to include the interactions in the set of variables used for imputation.
 Antti Kärnä posted on Monday, May 03, 2010 - 7:10 am
Hi,
is there a way to include a certain variable in the variable names list and still not to use it for imputation? Now in the User's guide it is stated: "Because the variable z is included in the NAMES list, it is also used to impute missing data for y1, y2, y3, y4, x1 and x2"(p.348). It is obvious that not all variables are useful for imputation, for example IDs.
 Bengt O. Muthen posted on Monday, May 03, 2010 - 7:38 am
In UG ex 11.5 you don't have a USEV statement, which means that the USEV variables are the same as the NAMES variables.

If you add a USEV statement - that excludes say ID of the NAMES list of variables - the variables on the USEV list are the ones that will be used for imputation.
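A sketch of this (hypothetical names), with the ID variable kept off the USEVARIABLES list so it is not used for imputation:

```
VARIABLE: NAMES = id y1-y4 x1 x2;
          USEVARIABLES = y1-y4 x1 x2;
          IDVARIABLE = id;    ! identifies records; not used to impute
```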
 craig wiernik posted on Monday, June 21, 2010 - 10:10 am
Hi all,
I have a general modeling question. I've used Stata to prep my data for use in Mplus. I've also needed to handle some missing data on my dependent variables. So, using ICE in Stata, I created a bunch of datasets in which I've imputed values for a select number of variables, leaving everything else alone. What this means is that when Mplus sees the data, I have a "complete" set of data for my dependent variables, but may still have missing values on the independent variables, covariates, and controls.

I thought Mplus handled missing data that was not on the dependent variable, but I'm finding that analysis on my imputed datasets using the
"Type = IMPUTATION" command still loses many cases, largely due to the covariates and control variables.

Am I doing something wrong in Mplus? Or, do I want to create datasets in which I've imputed everything, and just send Mplus complete data?

thank you!
craig
 Linda K. Muthen posted on Monday, June 21, 2010 - 11:22 am
Missing data theory is for dependent variables only. If you don't want observations with missing on the covariates to be excluded, you need to impute for the full data set. You can do this in Version 6 of Mplus using the DATA IMPUTATION command. See Example 11.5 in the Version 6 user's guide.
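Along the lines of Example 11.5, a sketch of imputing the full variable set and analyzing in one run (file name, variable names, missing-value flag, and number of data sets are hypothetical):

```
DATA:            FILE = data.dat;
VARIABLE:        NAMES = y1-y4 x1 x2;
                 MISSING = ALL (999);
DATA IMPUTATION: IMPUTE = y1-y4 x1 x2;
                 NDATASETS = 20;
                 SAVE = miimp*.dat;
MODEL:           y1 ON x1 x2;
```

The analysis results are then pooled across the 20 imputed data sets, and the SAVE option writes the imputed sets out for reuse.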
 craig wiernik posted on Monday, June 21, 2010 - 11:57 am
Hi Prof. Muthen,
We only have v5, so I'll impute everything I want to in Stata, and then use the files in Mplus.
Thank you very much for the prompt reply! :-)
craig
 John Mallett posted on Thursday, July 29, 2010 - 4:56 am
I am looking to use multiply imputed data sets to run a multiple regression model with a continuous outcome variable.

I have missingness on both my predictors and outcome variable, so I am wondering if it is necessary to omit the outcome variable from the imputation model when creating the MI data sets?

Is there a reference you can recommend that deals with this?

Thanks
 Jon Heron posted on Thursday, July 29, 2010 - 7:19 am
Hi John,

I know a reference that says the opposite, if that helps:

Missing Data Analysis: Making It Work in the Real World
John W. Graham
Annual Review of Psychology, Vol. 60: 549-576

In the section on dispelling the myths

"The fear is that including the DV in the imputation model might lead to bias in estimating the important relationships (e.g., the regression coefficient of a program variable predicting the DV). However, the opposite actually happens. When the DV is included in the model, all relevant parameter estimates are unbiased, but excluding the DV from the imputation model for the IVs and covariates can be shown to produce biased estimates. The problem with leaving the DV out of the imputation model is this: When any variable is omitted from the model, imputation is carried out under the assumption that the correlation is r = 0 between the omitted variable and variables included in the imputation model. Thus, when the DV is omitted, the correlations between it and the IVs (and covariates) included in the model are all suppressed (i.e., biased) toward 0."
 John Mallett posted on Thursday, July 29, 2010 - 7:59 am
Thank you Jon for your suggestions and reference. Much appreciated.

I was thinking about this question in the context of a planned missingness design (3-form; Graham, Hofer, & MacKinnon, 1996), where the DV construct and the predictors (if measured by multiple items) are systematically reduced in different versions of the form and the missingness is subsequently imputed using all available information from all 3 forms.

I hope this makes sense?
John
 Jon Heron posted on Thursday, July 29, 2010 - 8:21 am
Hmm, never come across that before. Is the missing data by design treated as MCAR then? depending on how they divide their sample I suppose.
 John Mallett posted on Thursday, July 29, 2010 - 10:06 am
"Hmm, never come across that before. Is the missing data by design treated as MCAR then? depending on how they divide their sample I suppose?"

Yes Jon
 David Bard posted on Monday, August 09, 2010 - 10:57 pm
Can you clarify the variable output from a twolevel MI procedure in version 6.0? It looks like variables with the same original names represent the observed and imputed values, variables appended by asterisks are thresholds or within-level latent response values, and variables prefaced by 'B_' represent random posterior draws from the between level (for variables modeled at both levels), but I couldn't find this documented (if it is documented, could you direct me to that segment of the manual in case other questions arise).

Is it possible to output latent response variable values for between-level-only variables? I'm not seeing a B_ variable for any of my between-only variables.

Also, I tried to save my subject ID variable as an auxiliary variable. A column for it appears in each imputation file, but each value is stored as 10 asterisks. Is there a limit on the size of these auxiliary variables? My IDs are 7 digits.

Thanks.
 Linda K. Muthen posted on Tuesday, August 10, 2010 - 12:18 pm
The IDVARIABLE option of the VARIABLE command should be used to identify the id variable not the AUXILIARY option. Please send the full output as an attachment and your license number to support@statmodel.com so I can see what is being saved.
 Sharon Ghazarian posted on Monday, August 23, 2010 - 11:03 am
Is there a limit to the number of variables that can be imputed at one time?
 Linda K. Muthen posted on Tuesday, August 24, 2010 - 8:17 am
There is no limit, but with a large number of variables the number of parameters in the imputation model may be large. Use only the analysis variables and missing data correlates to impute data. For example, don't use all of the variables in a data set.
 Sharon Ghazarian posted on Tuesday, August 31, 2010 - 9:56 am
Thanks Linda.

Another question - is there a way to put in variable-specific minimum and maximum constraints for multiple imputation? For example, multiple imputation oftentimes produces extreme values on some variables, so constraints are needed to tell the program that imputed values should fall only between, say, 1 and 4. Is there any place to do this in Mplus right now?
 Bengt O. Muthen posted on Tuesday, August 31, 2010 - 4:32 pm
There is currently not a way to do this for continuous outcomes.
 Bengt O. Muthen posted on Tuesday, August 31, 2010 - 5:00 pm
I was just reminded that you do have the option

VALUES =

If the number of values present in the data is relatively small (e.g., 1, 2, 3, 4), you just list those.

Otherwise you can use

1.0 1.1 1.2 .... 3.9 4.0

to get a rounding of the imputed value to the first decimal.
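To make this concrete, here is a minimal input sketch (the variable name y, the file names, and the missing-data flag are hypothetical; check the VALUES option of the DATA IMPUTATION command in the User's Guide for the exact syntax):

```
DATA:            FILE = mydata.dat;
VARIABLE:        NAMES = y x1 x2;
                 MISSING = ALL (-999);
DATA IMPUTATION: IMPUTE = y;
                 NDATASETS = 20;
                 SAVE = yimp*.dat;
                 VALUES = y (1 2 3 4);   ! imputed y restricted to the listed values
ANALYSIS:        TYPE = BASIC;
```

With the VALUES list, each imputed value of y is drawn from the listed values, which keeps imputations within the 1-4 range.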
 Sharon Ghazarian posted on Wednesday, September 01, 2010 - 8:43 am
Great - thank you!
 Maria Clara Barata posted on Tuesday, September 07, 2010 - 8:33 am
Hi!
I am trying to use the new multiple imputation feature in Mplus, but all I am getting are fatal error messages.

It seems to be reading the data in correctly, so I was wondering if there is a way to get a more detailed error message so that I can troubleshoot. I am using output: TECH8 as per example 11.5.

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION.

Thanks
 Bengt O. Muthen posted on Tuesday, September 07, 2010 - 11:59 am
Please send your input, output, data and license number to support@statmodel.com.
 Maria Clara Barata posted on Tuesday, September 07, 2010 - 1:57 pm
Thanks. Just did.
Clara
 Tom Booth posted on Monday, October 11, 2010 - 5:16 am
Hello,

I am trying to run multiple imputation on a set of mixed categorical and continuous variables (n = 972). I am using the default H1 imputation (sequential regression). From reading Asparouhov & Muthén (2010, 15th July), this seemed most appropriate.

I am getting an error that reads:

*** FATAL ERROR
THERE IS NOT ENOUGH SPACE TO RUN MPLUS ON THE CURRENT INPUT FILE....

I have no other programs running, and have installed the 32-bit version for which the machine I am running it on has plenty of capacity.

I am unsure what about the analysis I am running is causing this issue.

Regards,

Tom
 Linda K. Muthen posted on Monday, October 11, 2010 - 6:54 am
Please send your output file and license number to support@statmodel.com.
 Nicholas Bishop posted on Monday, October 11, 2010 - 12:38 pm
Hello,
I am receiving the same error message as Clara described above:

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION.

The imputed data sets are produced correctly, but the imputed list file contains no data. What is needed to correctly produce the list for the imputed data sets? Thanks for your help.

Nicholas
 Linda K. Muthen posted on Monday, October 11, 2010 - 2:13 pm
The list file should be generated automatically. Please send your output file and license number to support@statmodel.com.
 Dana Wood posted on Thursday, October 21, 2010 - 12:51 pm
Hello,

I am trying to use the MODEL TEST command with multiply imputed datasets. The model runs fine with the multiply imputed datasets, but when I add in the request for MODEL TEST, I don't get any output. The black MS-DOS screen appears for a brief flash and then nothing happens.
 Linda K. Muthen posted on Thursday, October 21, 2010 - 2:27 pm
If you are not using Version 6.1, try that. If you are, please send the input, data, and your license number to support@statmodel.com.
 Alexandre Morin posted on Friday, October 22, 2010 - 1:29 am
Hi,
I have a couple of questions related to examples 11.5 and 11.6 of the manual.
(1) Example 11.5:
You mention on page 348 that all variables in the NAMES list are used to impute data (on the variables listed under IMPUTED). Is there any way NOT to use some of the variables on the list in the imputation?
(2) Example 11.6: Will this be the same for plausible value imputation? Will all the variables on the NAMES list be used in generating the plausible values, or only those included in the MODEL section? If the former, is there again a way to not use some variables?
(3) Is it possible to generate multiple imputation data sets (5, 10, 20, etc.) including both imputed values for missing data on observed variables and plausible values in the same data sets?
(4) How can we include additional variables in the saved multiple imputation and/or plausible values data sets (let's say the z variables from example 11.5)? I do not necessarily want to impute these data or use them in the imputation algorithm, just to have them saved in the created data sets so as to be able to use them in subsequent analyses. Will the simple AUXILIARY function (without e-m-r) work?

Thank you !
 Bengt O. Muthen posted on Friday, October 22, 2010 - 10:14 am
I am glad you asked so that this can be clarified.

(1) and (4):

UG ex 11.5 is not as clear as it could be on this point. A user would typically work with not only the NAMES and IMPUTE lists of variables, but also a USEVARIABLES list and an AUXILIARY list. The NAMES list simply reads the variables in the original data set. The USEVARIABLE list is a smaller subset of variables from the NAMES list, just as in an ordinary analysis. The NAMES list variables are the variables used to create the imputations. In UG ex11.5, the USEVARIABLE list is absent and therefore defaults to the NAMES list. Typically you also want to save into the imputed data set other variables that are not to be used in the imputation and to do that you put those variables on the AUXILIARY list.

(2) Same thing.

(3) The SAVE = data set contains what you are asking for. The PLAUSIBLE = data set gives summary statistics for plausible values.
 Bengt O. Muthen posted on Friday, October 22, 2010 - 10:32 am
Correction - I should have said:

UG ex 11.5 is not as clear as it could be on this point. A user would
typically work with not only the NAMES and IMPUTE lists of variables,
but also a USEVARIABLES list and an AUXILIARY list. The NAMES list
simply reads the variables in the original data set. The USEVARIABLES
list is a smaller subset of variables from the NAMES list, just as in
an ordinary analysis. The USEVARIABLE list variables are the variables used
to create the imputations. In UG ex11.5, the USEVARIABLES list is
absent and therefore defaults to the NAMES list. Typically you also
want to save into the imputed data set other variables that are not to
be used in the imputation and to do that you put those variables on
the AUXILIARY list.
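In input terms, the setup described above might look like this sketch (variable and file names hypothetical; the IDVARIABLE line follows Linda's earlier advice about ID variables):

```
DATA:            FILE = mydata.dat;
VARIABLE:        NAMES = id y1-y4 x1 x2 z1 z2;   ! everything in the file
                 USEVARIABLES = y1-y4 x1 x2;     ! variables used to create the imputations
                 AUXILIARY = z1 z2;              ! saved with the imputed data, not used to impute
                 IDVARIABLE = id;
                 MISSING = ALL (-999);
DATA IMPUTATION: IMPUTE = y1-y4;                 ! variables whose missing values are imputed
                 NDATASETS = 20;
                 SAVE = imp*.dat;
ANALYSIS:        TYPE = BASIC;
```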
 Alexandre Morin posted on Friday, October 22, 2010 - 11:11 am
Thank you very much!
It is indeed clearer.
 Alexandre Morin posted on Saturday, October 23, 2010 - 10:43 am
Hi again,
Does the AUXILIARY (m) function work in the generation of plausible values (i.e., plausible values are generated from a model, but can we let variables NOT in the model influence the generation of plausible values)?
The question is based on the result you report in the "plausible value" paper that, when plausible values are to be used in secondary analyses, all of the variables to be used in the secondary analysis need to be part of the PV generation...
I am generating PVs from a complex ESEM-within-CFA model. I will use them in a secondary analysis with an additional variable. Yet, when I add this variable to the ESEM-within-CFA model and allow it to correlate with the factors, the model crashes. Or do I just need to get the variable into the model by estimating its variance without allowing it to correlate with the factors?
 Bengt O. Muthen posted on Saturday, October 23, 2010 - 5:42 pm
Aux(m) is only intended for ML estimation, not the Bayesian estimation used with plausible values.
 Nicholas Bishop posted on Monday, November 15, 2010 - 11:55 am
Hello,
I am currently using Mplus version 6.1 to perform multiple imputation. I am receiving the following warning message when I run my imputation model:

*** FATAL ERROR
PROBLEMS OCCURRED DURING THE DATA IMPUTATION.

THE PSI MATRIX IS NOT POSITIVE DEFINITE.

THE PROBLEM OCCURRED IN CHAIN 2.

All variables included in the impute list contain missing data, and I am using the PROCESSORS = 2 command to reduce computing time. I have also specified categorical and continuous variables. Can you suggest changes I can make to get the imputation working? Thanks for your help.

Nick
 Bengt O. Muthen posted on Monday, November 15, 2010 - 3:16 pm
There are two things you want to consider. One is the clarification of the UG imputation ex 11.5:

UG ex 11.5 is not as clear as it could be on this point. A user would
typically work with not only the NAMES and IMPUTE lists of variables,
but also a USEVARIABLES list and an AUXILIARY list. The NAMES list
simply reads the variables in the original data set. The USEVARIABLES
list is a smaller subset of variables from the NAMES list, just as in
an ordinary analysis. The USEVARIABLES list variables are the variables used
to create the imputations. In UG ex11.5, the USEVARIABLES list is
absent and therefore defaults to the NAMES list. Typically you also
want to save into the imputed data set other variables that are not to
be used in the imputation and to do that you put those variables on
the AUXILIARY list.

The other is the list of 14 suggestions in Section 4 of the Asparouhov-Muthen (2010) imputation paper on our website.
 Sofie Henschel posted on Wednesday, February 02, 2011 - 3:41 am
Dear Prof. Muthén,
I'm trying to run a SEM with an imputed data set. Mplus reads in all 5 data files but uses none of them; hence the number of free parameters is zero and the test of model fit is not executed. TECH9 indicates that there is no convergence for any replication. However, if I run the 5 data sets separately, imputations 1, 4, and 5 show a good fit while imputations 2 and 3 do not converge.
I am wondering why imputations 1, 4, and 5 are not used with TYPE=IMPUTATION although these models show good fit. Furthermore, is there anything I might investigate to see why imputations 2 and 3 do not converge? I am especially surprised by this since I have 11 variables in my data sets but only 3 were imputed, so the other 8 variables are the same across the imputation data sets.

Thank you very much for your support!
 Linda K. Muthen posted on Wednesday, February 02, 2011 - 6:46 am
If you are not using Version 6.1, you should do so. If you have this problem with Version 6.1, please send your input, data sets, output, and your license number to support@statmodel.com.
 Alain Girard posted on Wednesday, March 02, 2011 - 7:06 am
Hi
I have multiple imputed data sets and I want to perform a likelihood ratio test to compare 2 nested models. I read the technical appendix "Chi-Square with Multiple Imputation".

Is there a way in Mplus to compute this test?

Thanks
Alain Girard
University of Montreal
 Linda K. Muthen posted on Wednesday, March 02, 2011 - 9:14 am
This can be done with the ML estimator.
 Kätlin Peets posted on Monday, March 14, 2011 - 3:08 pm
Hi,

Is it possible that the order of the variables that are used to impute missing data has an effect on imputation?
 Tihomir Asparouhov posted on Monday, March 14, 2011 - 3:58 pm
Yes. Multiple imputation uses random number generation as a part of the MCMC estimation of the imputation model. When the variables are reordered different random bits will be used for different variables. This however should have minimal impact on any proper use of the imputed data sets.
 Kätlin Peets posted on Wednesday, March 16, 2011 - 7:22 am
I have another problem. When I try to impute data, I get the following error message:

FATAL ERROR
THE NUMBER OF CLUSTERS PLUS THE PRIOR DEGREES OF FREEDOM OF PSI
MUST BE GREATER THAN THE NUMBER OF LATENT VARIABLES.
USING MORE INFORMATIVE PRIOR FOR PSI CAN RESOLVE THIS PROBLEM

Usually, I have solved it by decreasing the number of variables used for imputation. What should I do?
 Tihomir Asparouhov posted on Wednesday, March 16, 2011 - 9:49 am
This happens because the number of variables in the imputation model is larger than the number of clusters in the data. You can either remove some of the variables from the imputation model or you can perform an H0 imputation. With an H0 imputation you can use a factor analysis model on the second level for imputation purposes, or you can use an unrestricted model with a different prior for the variance-covariance matrix.

Take a look at section 3.3 in http://statmodel.com/download/Imputations7.pdf
 Tihomir Asparouhov posted on Wednesday, March 16, 2011 - 9:52 am
Also I think you are using Mplus version 6. If you use version 6.1 you will not have that problem.
 Kätlin Peets posted on Wednesday, March 16, 2011 - 5:54 pm
Thank you. I now downloaded Mplus 6.1, and the problem was solved.

I still have another question. I center my data before imputation (as I also form interaction terms before imputation). However, I am not sure which mean value to use when I center my variables that have missing values. Should I subtract the mean values that are computed on the basis of the cases/clusters that have no missing values?

My second question concerns the variances of the parameter estimates. I understand that these are squared standard errors. But sometimes the variance estimates (in the TECH3 output) are not equal to the squared standard errors in my model output (e.g., SE = .104, variance estimate = .005). Is that due to rounding error or something else? What should I do in this case? I need these variance estimates for calculating simple slopes.
 Linda K. Muthen posted on Thursday, March 17, 2011 - 8:49 am
You should center after imputation.

Those values sound quite different even for rounding. Please send the output and your license number to support@statmodel.com so we can take a look at it.
 Kätlin Peets posted on Thursday, March 17, 2011 - 9:58 am
I have understood that it is advised to include all the necessary interaction terms in the imputation phase.

If I were to center after imputation, how can I create interaction terms between observed variables? It seems that I cannot use the DEFINE command.
 Linda K. Muthen posted on Thursday, March 17, 2011 - 10:59 am
You can save the imputed files. DEFINE works with TYPE=IMPUTATION.
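A sketch of the analysis step (variable and file names hypothetical): the saved imputed files are listed in a list file that Mplus writes out, and the interaction is then created within each imputed data set with DEFINE:

```
DATA:     FILE = implist.dat;      ! list file naming the imputed data sets
          TYPE = IMPUTATION;
VARIABLE: NAMES = y x m;
          USEVARIABLES = y x m xm; ! variables created in DEFINE go last
DEFINE:   xm = x*m;                ! interaction formed after imputation
MODEL:    y ON x m xm;
```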
 David Bard posted on Monday, March 21, 2011 - 11:41 pm
I'm having a hard time grasping how the default H1 sequential model is parameterized and estimated when there is a mixture of categorical and continuous imputation variables. The output seems to suggest that both a WLSMV and a Bayes estimator are being used at various points in time. Is the model first estimated with WLSMV and then somehow transitioned to a Bayesian analysis? When I try to create an H0 model that mimics sequential regression with a WLSMV estimator, I'm asked to use the Theta parameterization, but the default H1 model output claims to use Delta. When I try to use a Bayesian estimator for this H0 model, I am unable to reach convergence. Is it even possible to write the default H1 seq reg model as an M+ H0 model?

Thanks!
 Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 10:51 am
First let me say that unless you are using the older Mplus Version 6, the default imputation model is not sequential. Starting with Version 6.1, the default imputation model is COVARIANCE.

I think your confusion about what is happening stems from the fact that you have a model (and if you don't specify an estimator you are essentially using the default WLSMV estimator) and a data imputation statement. In this case, Mplus assumes that you want the WLSMV estimator for your estimation, but you want to deal with the missing data via multiple imputations. Therefore Mplus will perform Bayes estimation first to impute the missing data, then analyze the imputed data using the WLSMV estimator.

To simplify the methodology I would suggest that as a first step you perform the imputation and estimation separately. To get only imputation specify type=basic in the analysis command and remove the model statement. This will just generate imputed data, which you can later analyze as in the example on page 348 in the User's Guide.

Now if you are interested in H0 imputations, follow example 11.7, i.e., you have to specify estimator=Bayes, an imputation model, and the data imputation statement. To mimic the sequential regression imputation as an H0 model imputation the first thing to do is to specify the command
MEDIATOR = OBSERVED;
in the analysis command.
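Along the lines of UG example 11.7, an H0 imputation sketch might look like this (the one-factor model, variable names, and file names are hypothetical; the MEDIATOR = OBSERVED line is the setting mentioned above for mimicking sequential regression):

```
DATA:            FILE = mydata.dat;
VARIABLE:        NAMES = y1-y6;
                 MISSING = ALL (-999);
ANALYSIS:        ESTIMATOR = BAYES;
                 MEDIATOR = OBSERVED;
MODEL:           f BY y1-y6;          ! the H0 model used for imputation
DATA IMPUTATION: IMPUTE = y1-y6;
                 NDATASETS = 20;
                 SAVE = h0imp*.dat;
```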
 Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 10:55 am
Regarding parameterizations, any Bayes or imputation estimation is based on the Theta parameterization. With the WLSMV estimator, on the other hand, both the Theta and Delta parameterizations are generally available; however, for some models only the Theta parameterization is available, and the sequential model is sometimes such a model.
 David Bard posted on Tuesday, March 22, 2011 - 4:28 pm
You are right, I have not yet upgraded to 6.1, but will do so shortly. The WLSMV is listed as an Estimator in my file with or without a model statement (under 'Summary of Analysis' section of the output), but sounds like this simply reflects the default estimator were I to have included a model. Thanks for clarifying.

I do want the seq reg imputation in this instance. Any advice on getting my H0 version of this off the ground? Do I need to include fairly accurate starting values? I think van Buuren and Ibrahim have commented that the sequence of variable regressions can matter. Can you share the Mplus default for setting up these sequential regression equations when type=basic? Are the data restructured first to appear roughly like monotone missingness?
 Tihomir Asparouhov posted on Tuesday, March 22, 2011 - 4:56 pm
David

I am not very clear on why you want imputation with the sequential method. Version 6.1 already has a better method - the covariance model. Second, I am not sure why you are not using the H1 imputation, which is already preset for optimal performance: just use type=basic, estimator=Bayes, add the data imputation statement, and add model=sequential. Mplus does not reorder the variables; we use the order specified in the usevar command. We have not seen examples where the order of the variables is important.

Finally if you want to do an H0 imputation and you specified the MEDIATOR = OBSERVED; as well as the model and you are experiencing convergence problems I would suggest that you send it to support@statmodel.com

Tihomir
 Kätlin Peets posted on Monday, March 28, 2011 - 5:57 am
I would like to use observed classroom-level means in my analyses. However, some individuals (in some classrooms) have missing values, and thus classroom means would be calculated on the basis of those individuals who don't have missing values (these individual scores are imputed later on though).
Is that problematic? The problem is that I would like to create interaction terms between classroom-level means and include them in the model when imputing the rest of the data.
 Linda K. Muthen posted on Monday, March 28, 2011 - 10:20 am
This should not be a problem.
 Michael Green posted on Wednesday, April 06, 2011 - 1:56 am
Can the SAVEDATA command be used to control the output file format for the imputed files from DATA IMPUTATION?

Thanks, MG
 Kätlin Peets posted on Wednesday, April 06, 2011 - 7:54 am
I am conducting simple slope analyses (to follow up my interactions). I have imputed my data (20 data sets). I need the covariances and variances of my parameter estimates. Do I need to hand-calculate the averages of the variances and covariances across the 20 data sets (from the TECH3 output), or is there an easier solution? (I guess I could use the squared standard errors from the model output to get the variances of the parameter estimates.)
 Linda K. Muthen posted on Wednesday, April 06, 2011 - 12:10 pm
Michael:

The format of imputed data sets cannot be changed. If the original data is in fixed format, the data saving for imputation will use the format of the original data. But if it is free format, then it uses the default of F10.3.
 Linda K. Muthen posted on Wednesday, April 06, 2011 - 12:12 pm
Katlin:

TECH3 is available with TYPE=IMPUTATION.
 Kätlin Peets posted on Wednesday, April 06, 2011 - 12:25 pm
Yes, I do use Tech3, but I get 20 variance-covariance matrices (as I have 20 imputed data sets). So, for instance, when I need a covariance value between the intercept and moderator, do I need to calculate the average covariance across the data sets? I assume this is what I need to do as there isn't such a "summary" matrix across the data sets. Or am I wrong?
 Linda K. Muthen posted on Wednesday, April 06, 2011 - 1:49 pm
Averaging over TECH3 is not correct. You can square the standard error of the variance parameter but that does not get you everything you need. Perhaps you can use MODEL CONSTRAINT to do what you want.
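The reason simple averaging of TECH3 falls short can be seen from Rubin's rules for pooling: with $m$ imputations yielding estimates $\hat{Q}_i$ and within-imputation variances $U_i$,

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1+\frac{1}{m}\right)B .
```

The average of the within-imputation (co)variances, $\bar{U}$, omits the between-imputation component $(1+1/m)B$, so averaging the TECH3 matrices understates the total sampling variability $T$.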
 Kätlin Peets posted on Wednesday, April 06, 2011 - 2:09 pm
Could you be more specific (I would really appreciate your help)? How would I get covariance estimates when using MODEL CONSTRAINT?
 Linda K. Muthen posted on Wednesday, April 06, 2011 - 2:12 pm
I'm not saying you would get the covariance estimates from MODEL CONSTRAINT. Perhaps you can define whatever it is you want a standard error for in MODEL CONSTRAINT. You would then obtain a standard error. Other than that, I have no suggestions. See MODEL CONSTRAINT in the user's guide for further information.
 Yijie Wang posted on Friday, April 22, 2011 - 9:18 am
Hello,

I'm doing a multiple imputation and want to use the generated data for further analyses. Is there a way for mplus to combine all the imputed datasets and yield an averaged dataset? Thank you!
 Linda K. Muthen posted on Friday, April 22, 2011 - 10:55 am
No. We produce the individual data sets only.
 Michael Green posted on Wednesday, May 18, 2011 - 2:44 am
Hi,

I understand that interaction terms should be included in an imputation model.

When using the unrestricted H1 imputation option, should the interaction terms themselves be imputed along with the variables from which they are derived (which would lead to interaction terms that are not the exact product of their source variables), or should the interaction terms be used only as predictors in the imputation and then re-calculated from the imputed data during analysis?

Best, MG
 Linda K. Muthen posted on Wednesday, May 18, 2011 - 9:48 am
We would not include the interaction term in the imputation of the data. We would use it only in the subsequent analysis.
 Peren Ozturan posted on Friday, May 27, 2011 - 5:48 pm
Hi,
Do you recommend running latent factor interaction models on multiply imputed data?
Thanks.
 Bengt O. Muthen posted on Saturday, May 28, 2011 - 5:45 am
Do you mean creating factor scores via plausible values and then creating interactions? Or do you mean imputing missing values on observed variables and then doing XWITH? The former is an interesting idea that should be explored. The latter is straightforward.
 Peren Ozturan posted on Saturday, May 28, 2011 - 9:07 am
I was actually asking about the latter but was confused about the post dated March 30, 2010 - 12:42 pm and Linda's related answer.
1. My structural model is composed of latent factor interactions. Should I have included XWITH while running my imputation model? Or is it fine to run the imputation modelling only main effects and then specify XWITH while running the structural model on the imputed data?
2. Is there a way to go around the two-step approach regarding the use of multiple imputation (i.e. first impute data, then estimate structural model)? Can't we do them simultaneously?
3. Is multiple group analysis on imputed data straightforward, as well? If so, then I am able to test whether grouping improves model fit by comparing models' loglikelihoods, right? I am asking this because in output, we get message "the loglikelihood cannot be used directly for chi-square testing with imputed data" but post dated March 19, 2007 - 5:59 pm states they could be used.
 Bengt O. Muthen posted on Saturday, May 28, 2011 - 10:06 am
1. The imputation model does not have to be correct relative to the analysis model, but how large the deviation can be depends on the situation. So with l.v. interactions, your analysis model contains them, and the imputation may or may not use them. My current thinking is that the imputation is probably good enough without using them. I am not sure if our H1 (unrestricted) imputation has difficulty converging when having both the l.v.'s and their interactions.

2. Yes, you can do it in one run by specifying estimator = ML/WLSMV (but not Bayes).

3. Yes, multiple-group imputation can be done. We provide a chi-square test suitable for multiple imputed data - see the technical appendix Chi-Square Statistics with Multiple Imputation. I am not sure this can handle chi-square difference testing, however.

A new version of the Topic 9 handout including an expanded discussion of multiple imputation taught at UConn last week will be posted next week.
 Sofie Henschel posted on Monday, June 06, 2011 - 4:06 am
Dear Linda and Bengt,
I'm trying to run a multigroup SEM with imputed data. However, in order to get the model running in all imputed data sets, I need to restrict the residual variances of one of my variables to be equal. Running the model separately in each imputation data set does not require this restriction. I am now wondering why I need restrictions in the overall model when I don't need to restrict the residual variances in each of the single data sets. Is it right that, when using imputed data, Mplus runs each data set separately and then combines the results using Rubin's formula? And if so, why do I need special restrictions with the imputed data set?
Thanks in advance!
Sofie
 Linda K. Muthen posted on Monday, June 06, 2011 - 7:09 am
This does not make sense. If you are not using Version 6.11, try that. If you are and still have the problem, please send the files and your license number to support@statmodel.com.
 Eric Teman posted on Sunday, June 26, 2011 - 6:52 pm
I have read in several places that multiple imputation has no set rules in regard to pooling likelihood ratio chi-square values or adjunct fit indexes. Does Mplus have a special way of handling this?
 Linda K. Muthen posted on Sunday, June 26, 2011 - 7:10 pm
The following technical appendix on the website describes our ML chi-square for multiple imputation:

Chi-Square Statistics with Multiple Imputation

For all other fit statistics we give the average over imputations.
 Eric Teman posted on Monday, June 27, 2011 - 6:42 pm
Are you aware of any issues with taking the average over imputations for fit statistics? I'm just wondering whether Enders's concern that "no rules exist for combining fit indices from multiply imputed datasets" is warranted.
 Linda K. Muthen posted on Monday, June 27, 2011 - 8:08 pm
The averages are not correct. See the Technical Appendix to see the difference between the average and the correct chi-square.
 Eric Teman posted on Monday, June 27, 2011 - 8:12 pm
Sorry, I was referring to the adjunct fit indexes. Are there any known problems/issues with those averages?
 Linda K. Muthen posted on Tuesday, June 28, 2011 - 5:40 am
All of the averages for RMSEA etc. are simply averages. Only the ML chi-square is correct for imputation. Chi-square for weighted least squares is also given as an average and is not to be interpreted for model fit.
 Juanjo Medina posted on Wednesday, June 29, 2011 - 12:07 pm
Hiya,

I'm trying to run a multiple imputation model but experiencing some problems. Mplus stops at iteration 12500 because of lack of memory. I'm running the program on 32-bit Windows, with a dual-core 2.3 GHz processor (using PROCESSORS = 2), 2.2 GB of RAM, and all non-essential processes (even the antivirus) terminated. I found the 2010, version 2 Asparouhov & Muthén paper, where they recommend the use of the FBITER and THIN options (the latter is also suggested by my output). Yet when I try to use the FBITER option in my Version 6.11 of Mplus, I'm told this option is unrecognized. Any help with this is welcome.
 Linda K. Muthen posted on Wednesday, June 29, 2011 - 2:44 pm
Please send your files and license number to support@statmodel.com.
 Eric Teman posted on Wednesday, June 29, 2011 - 4:16 pm
Are you aware of any published research indicating taking the averages of adjunct fit statistics across imputations is not correct?
 Linda K. Muthen posted on Wednesday, June 29, 2011 - 4:49 pm
I don't know of anything specifically. You might look at Craig Enders's book and Joe Schafer's book. Both references should be in the Topic 9 course handout.
 Tihomir Asparouhov posted on Thursday, June 30, 2011 - 8:42 am
I am not sure what adjunct fit statistics are, but in general the chi-square statistics should not be combined directly. See

http://statmodel.com/download/MI7.pdf

for simulations and description of how Mplus does this. Also any approximate fit indices based on the correct chi-square statistic should be valid.
 Brondeel Ruben posted on Thursday, September 15, 2011 - 3:14 am
Hi,

I would like to use the variance-covariance matrix of the coefficients to make some plots. I can export the matrices for each imputation (TECH3). How can I summarize the 10 matrices into 1?
I understand that I could use the MODEL CONSTRAINT command, but for all the covariances this would mean a whole lot of input.
Are the means over the 10 matrices a good approximation? Or the median? It doesn't have to be 100% correct, since it's only for plots; it just can't be too sensitive to the particular data set used.

Greetings,
Ruben.
 Linda K. Muthen posted on Thursday, September 15, 2011 - 11:37 am
With TYPE=IMPUTATION, you will get a correct TECH3. This is what you should use.
 Sierra Bainter posted on Thursday, September 22, 2011 - 10:52 am
Hi,

I am imputing a single categorical variable using a number of completely observed variables in my data set. Do I have to include the completely observed categorical variables on the CATEGORICAL statement?
 Linda K. Muthen posted on Thursday, September 22, 2011 - 2:08 pm
Any categorical variable on the IMPUTE list should have (c) after it. The CATEGORICAL option is for the analysis not the imputation of the data. All dependent variables in the analysis should be on the CATEGORICAL list.
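A small sketch of this distinction (variable and file names hypothetical): the (c) flag on the IMPUTE list governs how a variable is imputed, while the CATEGORICAL list governs how it is treated in the analysis model.

```
VARIABLE:        NAMES = u1 u2 x1 x2;
                 CATEGORICAL = u1 u2;   ! treatment in the analysis model
                 MISSING = ALL (-999);
DATA IMPUTATION: IMPUTE = u1 (c);       ! only u1 has missing data; imputed as categorical
                 NDATASETS = 20;
                 SAVE = catimp*.dat;
```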
 Mauricio Garnier-Villarreal posted on Tuesday, November 22, 2011 - 8:56 pm
Hi,

I execute and save multiple imputations with Mplus, but when I analyze the list of data sets with

FILE IS TESIMPlist.dat;
TYPE = IMPUTATION;

Mplus doesn't estimate the fit indices and reports a negative variance in every data set. When I analyze each data set by itself, it works fine and doesn't report the negative variance.

Why is Mplus not working properly with the list?

thanks

Mauricio
 Linda K. Muthen posted on Wednesday, November 23, 2011 - 1:39 pm
If you are not using Version 6.12, do. If you are, please send the relevant files and your license number to support@statmodel.com.
 Eric Teman posted on Tuesday, January 17, 2012 - 1:36 pm
During the analysis phase of multiple imputation, is it possible for Mplus to save the averaged parameter estimates (and the corrected chi-square) as a data file? When I use SAVEDATA: RESULTS ARE results.dat, I get the un-averaged parameter estimates for the NDATASETS, which means no corrected chi-square is being saved.
 Linda K. Muthen posted on Wednesday, January 18, 2012 - 3:52 pm
We do not save the averaged results. We save the results from each imputation. The average results are given in the results section.
 Eric Teman posted on Friday, January 27, 2012 - 7:20 pm
When fixing the latent variances to one so that all factor loadings can be estimated, is it normal for WLSMV used with multiple imputation to produce negative factor loadings? Is this OK?
 Linda K. Muthen posted on Friday, January 27, 2012 - 7:50 pm
Factor loadings can be positive or negative. They are regression coefficients.
 Eric Teman posted on Friday, January 27, 2012 - 8:20 pm
Sorry, I should have been more clear. It is a simulation study where I have set the population values to be positive. But when I employ multiple imputation, the factor loadings are often negative when the latent variances are fixed to 1, but never negative when the latent variances are free. It seems a bit odd.
 Bengt O. Muthen posted on Saturday, January 28, 2012 - 11:20 am
Perhaps what you see is that all the factor loadings for a certain factor change sign to negative. That is ok and simply means that your factor is reversed (say from knowledge to ignorance). That gives the same fit. You often see this sign reversal in EFA. It is harmless.

When you set the metric by fixing a loading to 1 you effectively decide on the sign.
 Isabella Lanza posted on Thursday, February 02, 2012 - 1:55 pm
I have a question about imputing interactions. Initially I thought I should just impute my main variable, and then aggregate my imputed datasets to calculate interactions based on the main variable of interest. However, reading over the literature (Von Hippel 2009 - transform then impute) and how Mplus derives results from multiply imputed datasets, I realized that I should include interactions in the imputation procedure. OK, this all makes sense, but I am having trouble with model convergence. I've increased the iterations and deleted variables from the USEVARIABLES command, and it still hasn't solved the problem. Is there any way this problem could be related to the fact that I am asking for standardized interactions? Thanks for any input.
 Linda K. Muthen posted on Friday, February 03, 2012 - 8:59 am
Try imputing without the interactions and see if that works.
 Maarten Pinxten posted on Monday, February 06, 2012 - 4:42 am
Are variables included in the NAMES list automatically used to impute missing data or do I have to define them explicitly as auxiliary variables? Thank you very much!
 Linda K. Muthen posted on Monday, February 06, 2012 - 1:53 pm
If you do not have a USEVARIABLES list, all variables on the NAMES list are used to impute data for the variables on the IMPUTE list.
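A minimal sketch of how the lists interact (variable names here are hypothetical): everything on USEVARIABLES informs the imputation model, while only the IMPUTE variables receive imputed values:

```
VARIABLE: NAMES ARE id x1-x20 y1-y5;
USEVARIABLES ARE x1-x5 y1-y5;   ! only these inform the imputation
IDVARIABLE = id;
MISSING ARE ALL (-99);
DATA IMPUTATION:
IMPUTE = y1-y5;                 ! only these get imputed values
NDATASETS = 20;
SAVE = imp*.dat;
ANALYSIS: TYPE = BASIC;
```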
 Maarten Pinxten posted on Tuesday, February 07, 2012 - 1:14 am
Thanks for the reply! Using the VALUES option I specified the range of the imputed values for each variable (minimum and maximum), but an inspection of the imputed data sets shows that for some imputations the values exceed those restrictions. Any idea how this is possible? I would expect non-convergence if the MCMC algorithm can't find a value within the specified range after x iterations... Do I need to worry about this (percentage of missingness 17%)?
 Linda K. Muthen posted on Tuesday, February 07, 2012 - 7:17 am
Please send the output, a data set that shows this, and your license number to support@statmodel.com.
 Maarten Pinxten posted on Monday, February 20, 2012 - 1:35 am
An additional question: how can I let Mplus know that it should not use SCHOOLID as a covariate to impute the requested variables? And how can I include my SCHOOLID in the imputed datasets?


When I use the 'Cluster' option in the NAMES command (with TYPE=COMPLEX), Mplus computes all the requested datasets but in the output it shows the error message 'all variables are uncorrelated with all other variables'.

When I just run the same MI-model without SCHOOLID included in the input file, my model just runs perfectly.

When I ran the same input datafile but then with the SCHOOLID included (in combination with the USEVARIABLES command in the input syntax), my SCHOOLID is not shown in the imputed datasets.

I guess there's a simple solution but I can't figure it out. Thank you very much!
 Linda K. Muthen posted on Monday, February 20, 2012 - 8:26 am
Use the IDVARIABLE option of the VARIABLE command.
 Andre Plamondon posted on Wednesday, February 22, 2012 - 10:43 am
When using the "values=" option with multiple imputation, is it possible to specify a range of values in which negative values are possible?
 Linda K. Muthen posted on Wednesday, February 22, 2012 - 3:49 pm
We do not currently allow negative values but will do so in the next version. The workaround for this is to add a constant to your variable that makes all numbers positive, impute, and then subtract the constant.
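A sketch of that workaround (variable name y and the constant 100 are hypothetical; whether DEFINE is applied before the imputation step, and the exact VALUES syntax for a min-max range, should be checked against the current user's guide - alternatively, shift the variable in the raw data file before imputing):

```
DEFINE: y = y + 100;      ! shift so all values are positive
DATA IMPUTATION:
IMPUTE = y;
VALUES = y (0-200);       ! range on the shifted scale
NDATASETS = 20;
SAVE = shifted*.dat;
! after imputing, subtract 100 from y in each saved data set
```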
 Aurora Zhao posted on Thursday, March 08, 2012 - 5:44 pm
Hi Dr. Muthen,

I am a beginner at handling missing data with multiple imputation. I am looking at Example 11.5. I am wondering how to calculate the missing data correlate "z" from the original data and save it into the data set to do MI. Thank you very much!
 Linda K. Muthen posted on Friday, March 09, 2012 - 3:54 pm
Z is not a variable that you create. It is part of the dataset that is used to impute the variables on the IMPUTE list.
 Owis Eilayyan posted on Wednesday, April 04, 2012 - 6:13 am
Hello,
I tried to do multiple imputation using the following command, but I couldn't find the saved output file. Could you please tell me where the output file is saved?

TITLE: this is an example of multiple imputation
for a set of variables with missing values
DATA: FILE IS C:\Users\Admin\Desktop\MGH\Owis\path3.dat;
VARIABLE: NAMES ARE smo em sym act emo env pf ef sf re bmi age fev gender act5 actc;
missing = .;
DATA IMPUTATION:
IMPUTE = smo (c) em (c) sym pf ef sf re bmi age fev gender (c) act5 (c) actc (c);
NDATASETS = 10;
SAVE = C:\Users\Admin\Desktop\MGH\Owis\essra.dat;
ANALYSIS: TYPE = BASIC;
OUTPUT: TECH8
 Linda K. Muthen posted on Wednesday, April 04, 2012 - 6:29 am
The saved data set name should be essra*.dat. The asterisk is replaced by the number of the data set, for example, essra1.dat, essra2.dat, etc.
 Owis Eilayyan posted on Wednesday, April 04, 2012 - 6:34 am
I did that but i got this error message:

*** ERROR in Data command
The file specified for the FILE option cannot be found. Check that this
file exists: C:\Users\Admin\Desktop\MGH\Owis\essra1.dat
 Linda K. Muthen posted on Wednesday, April 04, 2012 - 8:14 am
Please send the relevant files and your license number to support@statmodel.com.
 finnigan posted on Friday, April 13, 2012 - 10:37 am
Linda

I have a longitudinal data set where indicators have 20-30% missing data across three waves.
Covariates have 1-5% missing data across three waves.

I will be testing measurement invariance using CFA and then estimating a multiple indicator growth model.
I am pursuing two solutions, FIML and multiple imputation, to handle the 20-30%.
For multiple imputation, should the model used to generate the data sets be the CFA or the growth model?



Thanks
 Linda K. Muthen posted on Saturday, April 14, 2012 - 9:15 am
I would impute according to the H1 model.
 Lindsay Bell posted on Tuesday, May 08, 2012 - 8:25 pm
Hi -

I have a few questions about multiple imputation with a multilevel model. First, I am finding high autocorrelations lasting for many iterations for the between-level parameters. Do you know if this is normal?

Second, is there a way to get Mplus to show me the autocorrelations for more than 30 lags?

Third, I don't quite understand where Mplus draws the imputed data sets. If I specify THIN=500 in the data imputation command, then is Mplus drawing the imputed values from every 500th iteration, beginning with the first iteration after burn-in (i.e., 1st data set has values from the 10,000th, 2nd from the 10,500th, and so on)?

Finally, in the imputation, I want Mplus to take into account the fact that certain values (i.e., family socioeconomic status) tend to be similar within schools. I have specified it like this:

CLUSTER=schoolid;

ANALYSIS:
TYPE = TWOLEVEL;

MODEL:
%WITHIN%
ses ON par_ed h_income;

%BETWEEN%
ses;

Does this accomplish my goal of accounting for the school-level clustering of values on that variable? If not, can you tell me how I can?

Thank you,
Lindsay
 Tihomir Asparouhov posted on Wednesday, May 09, 2012 - 10:56 am
1. It is normal to have high autocorrelations in two-level imputation, especially when the number of clusters is not far from the number of variables (this leads to a nearly singular variance-covariance matrix on the between level). We would basically recommend an H0 model imputation with a one-factor analysis model on the between level. Take a look at sections 3.3 and 3.4 in

http://statmodel.com/download/Imputations7.pdf

2. You cannot get more than 30 autocorrelations. You have to use the THIN option to discard MCMC draws - that lets you see how correlated more distant draws are. For example, if using THIN=50, the 50th autocorrelation will become the first.

3. The THIN option in the DATA IMPUTATION command works as you describe above.

4. First you should make sure that Mplus does what you think it does - look at slide 184 in
http://statmodel.com/download/Topic9-v52%20%5BCompatibility%20Mode%5D.pdf

By default, all variables in Mplus are present on both levels, within and between, which accounts for the similarity of SES within clusters. (The options that restrict that default are WITHIN= and BETWEEN=.)
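As a sketch of points 2 and 3 above (variable names, counts, and file name hypothetical), thinning the saved imputations looks like this:

```
DATA IMPUTATION:
IMPUTE = y1-y10;
NDATASETS = 20;
THIN = 500;               ! saved data sets come from draws 500 iterations apart
SAVE = twolevimp*.dat;
```

With THIN = 500, consecutive saved data sets are far enough apart in the chain that high autocorrelation between nearby draws matters less.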
 Lindsay Bell posted on Wednesday, May 09, 2012 - 12:35 pm
Thank you for your reply. Just to make sure I understand - if I specify

ANALYSIS:
TYPE = BASIC TWOLEVEL;

then the similarity of variables within clusters is accounted for, unless I list them as WITHIN.

If I want to specify an H1 model on the within level and an H0 model on the between level, do I just not specify any model for the within level? If variables are listed as part of the between-level H0 model, is their cluster-level similarity still accounted for?

Thank you,
Lindsay
 Lindsay Bell posted on Wednesday, May 09, 2012 - 3:46 pm
Sorry, a couple more questions - how do I evaluate the model when using TYPE = TWOLEVEL BASIC? The program isn't giving me the Bayesian plots, so I don't know how to assess whether the estimates reached a stable pattern or if there is an issue with autocorrelation.

Also, the model is converging and not giving me any error messages even when there are more between-level variables than there are clusters. Can I be comfortable with the results?

Thank you very much,
Lindsay
 istia posted on Thursday, May 10, 2012 - 9:06 am
What exactly is the role of the random chi-square value in the regression model, or in multiple imputation? Can anybody share some papers? Thanks in advance.
 Tihomir Asparouhov posted on Thursday, May 10, 2012 - 11:51 am
Lindsay

If you have more variables than clusters you should be using the H0 imputation method (rightmost path in the diagram on slide 184), like this:

TYPE = TWOLEVEL; estimator=bayes;

model:
%within%
y1-y100 with y1-y100;
%between%
y1-y100;

Add the data imputation command.

data imputation:
impute=y1-y100;
save=imputations*.dat;
 Linda K. Muthen posted on Thursday, May 10, 2012 - 1:21 pm
Istia:

What is "random value chi square"?
 Lindsay Bell posted on Thursday, May 10, 2012 - 4:25 pm
Tihomir - Thank you so much for your reply with the example syntax. That is very helpful. One follow-up question: does it make a difference that a few of the variables are observed at the between level and have no within-cluster variance? Would that change the imputation syntax at all?

Thank you,
Lindsay
 istia posted on Thursday, May 10, 2012 - 9:11 pm
Linda - Sorry if I was wrong or misunderstanding it. What I've read is from here:
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_multiple_imputation_univariate_linear.htm

it says there will be a random value 'u' drawn from a chi-square distribution. I just can't get what exactly it means.
What is the influence of using this chi-square value in producing the regression or imputation values?
 Tihomir Asparouhov posted on Friday, May 11, 2012 - 9:03 am
Lindsay

You should specify those variables on the between= list in the variable command.
 Tihomir Asparouhov posted on Friday, May 11, 2012 - 9:04 am
istia

Take a look at

http://statmodel.com/download/MI7.pdf
 Lindsay Bell posted on Friday, May 11, 2012 - 9:06 am
Ok, thank you, I will do that. Just to be clear, though, the model syntax:

model:
%within%
y1-y100 with y1-y100;
%between%
y1-y100;

will be exactly the same, even if variables y95-y100 are between? It just seems strange to me to have between-level variables appear in the within section of the model.

Thank you,
Lindsay
 Bengt O. Muthen posted on Friday, May 11, 2012 - 9:35 am
Istia -

Regarding "random value 'u' of Chi Square", I think you should ignore that. That's just the way they describe the chi-square testing. The way we describe the chi-square testing is in the document Tihomir pointed to.

If you want to learn more about multiple imputation, I would recommend the 2010 book by Craig Enders. It refers to Mplus.
 Lindsay Bell posted on Friday, May 11, 2012 - 9:51 am
As a follow-up, I just tried the syntax, and got the error message "between variables cannot be used on the within level."

Instead I tried:

model
%within%
y1-y94 with y1-y94;
%between%
y1-y100 with y1-y100;

But this model is not converging, I'm guessing because there are too many parameters relative to the number of clusters. Perhaps instead I should keep this within-level syntax and, on the between level, specify the analysis model with regressions on the between-level variables? i.e.,

model:
%within%
y1-y94 with y1-y94;

%between%
y1-y91;
y92 ON y95 y96 y97 y98 y99 y100;
y93 ON y95 y96 y97 y98 y99 y100;
y94 ON y95 y96 y97 y98 y99 y100;


I really appreciate your guidance.

Lindsay
 Lindsay Bell posted on Friday, May 11, 2012 - 10:16 am
I just tried the syntax with the between-level analysis model specified:

model:
%within%
y1-y94 with y1-y94;

%between%
y1-y91;
y92 ON y95 y96 y97 y98 y99 y100;
y93 ON y95 y96 y97 y98 y99 y100;
y94 ON y95 y96 y97 y98 y99 y100;

and it converged very well. Everything looks good except that the between-level variance parameters for all the variables except y92, y93, and y94 have very high autocorrelations. Does this indicate a problem with the imputation model, or do I just need to increase the thinning until the autocorrelation drops to near zero?

Thank you again,
Lindsay
 Tihomir Asparouhov posted on Friday, May 11, 2012 - 11:17 am
You don't need to have the autocorrelation drop to 0. Instead you can aim for the 30th autocorrelation to be below 0.2 or 0.3. Try THIN=10, or even 50 or 100.
 Laura Baams posted on Monday, May 21, 2012 - 1:16 pm
Hi,

We have a question about planned missingness and FIML. We have a large dataset (N = 1400) and have used a three-form design for a large part of a long questionnaire. This means we have a lot of missing data, but it is only MCAR.

We plan on using FIML to deal with the missing data, but have noticed that Mplus does not give all fit statistics. We do not get the RMSEA, CFI, TLI, or chi-square.

We were told that Mplus currently has no way of reliably estimating these fit statistics, and therefore does not give them. Is this indeed the case? Do you know of any papers that discuss this issue? And is there a way around this issue without using MI?

Thanks for any tips or advice!
 Mauricio Garnier-Villarreal posted on Monday, May 21, 2012 - 2:16 pm
Hi

I am running a multiple imputation in Mplus, but I have run into the problem that the data set is big (1119 variables, around 5000 subjects) and Mplus tells me that I cannot include more than 500 variables. I can get the imputation to run when I select some variables to be imputed with the USEVARIABLES command. But when I do this, am I excluding all the other variables from the imputation process?

How can I impute big data sets in Mplus?


thank you
 Bengt O. Muthen posted on Monday, May 21, 2012 - 8:42 pm
Laura,

You don't have to use MI (multiple imputation), which it sounds like you are doing given that you don't get all fit indices (they haven't been statistically developed yet for MI). With missingness by design you might instead want to use multiple-group ML analysis, with groups corresponding to the three forms.

A good applied source for missing data handling is the C. Enders 2010 book.
 Bengt O. Muthen posted on Monday, May 21, 2012 - 8:49 pm
Mauricio,

There are 3 lists of variables: the NAMES list, which describes the data (it can contain more than 500 variables); the USEVARIABLES list, which contains the variables that inform the imputations; and the IMPUTE list, which says which subset of variables we want imputed.

Typically, your USEVARIABLES list is much shorter than your NAMES list. You don't need all the NAMES list variables to inform the multiple imputations; usually a fairly short list of variables is enough. The IMPUTE list should contain the shorter list of variables for which you want to do a particular analysis. So 500 USEVARIABLES variables would seem to be more than enough.
 Lindsay Bell posted on Friday, June 01, 2012 - 7:45 am
Hi -

I am using 20 imputed data sets to do a two-level analysis. I've discovered that under certain circumstances, the program is not analyzing all of the data sets (the output says requested: 20, completed: 19).

One scenario in which this is happening is with a between-level dichotomous variable that was completely observed, and so is identical in every imputed data set. Can you help me figure out why that might be happening and what I can do to recover the 20th data set?

Thank you,
Lindsay
 Linda K. Muthen posted on Friday, June 01, 2012 - 9:57 am
Add TECH9 to the OUTPUT command to see the reason.
 Natalie Bohlmann posted on Wednesday, June 06, 2012 - 3:24 pm
Hello
I am having a similar issue to a previous post. I am using 10 imputed data sets in a single-level model. There are 177 cases in the data set, but the output says that the average number of observations is 141. The number of replications requested is 10, but only 8 completed. My output also does not provide sample statistics or a chi-square test result.

I added TECH9. It says that the model terminated normally but gives the warning: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX....PROBLEM INVOLVING PARAMETER 17.

Parameter 17 is the covariance of a variable with itself from the PSI matrix.

I examined each of the imputed files, all have complete data for all 177 cases. I've also confirmed that my implist.dat file lists all 10 data sets correctly named.

I am using Version 6.0. I have a friend with the 6.11 update and asked her to run the model. The result is the same; her output has an additional message above the estimates of model fit: "THE CHI-SQUARE COULD NOT BE COMPUTED. THIS MAY BE DUE TO AN INSUFFICIENT NUMBER OF IMPUTATIONS OR A LARGE AMOUNT OF MISSING DATA."

Her output does include sample statistics, but states that they are only for 8 data sets.

Please Advise. I appreciate your time and help.

Thank you,
Natalie
 Linda K. Muthen posted on Wednesday, June 06, 2012 - 4:22 pm
Please send the relevant files and your license number to support@statmodel.com.
 Mauricio Garnier-Villarreal posted on Tuesday, June 12, 2012 - 6:31 am
Hi

I have a question about how Mplus combines the results from multiply imputed data sets.

Does Mplus pool the unstandardized and standardized results separately? Or does it pool the unstandardized results and then standardize the pooled results?


thank you
 Linda K. Muthen posted on Tuesday, June 12, 2012 - 8:50 am
The average of the parameter estimates over the multiple imputations is standardized.
 Steven De Laet posted on Tuesday, June 19, 2012 - 2:13 am
Hi,

We have a question about fit statistics (CFI, etc.) when applying multiple imputation.
Because no formal pooling rules are currently available, Enders (2010) recommends that the 20 (or more) estimates of each fit index be used to create an empirical distribution.
I wonder how I'd best do that. Should I run my syntax on every imputed dataset and save each fit statistic manually, or is there a more efficient way (perhaps a summary of the fit statistics of each dataset in one output)?

Thank you very much
 Linda K. Muthen posted on Tuesday, June 19, 2012 - 1:58 pm
You would need to run your syntax on each imputed data set.
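As a sketch (file and variable names hypothetical): point DATA at one imputed file at a time, without TYPE=IMPUTATION, and record the fit indices from each output to build the empirical distribution Enders describes:

```
DATA: FILE IS imp1.dat;       ! repeat with imp2.dat, ..., imp20.dat
VARIABLE: NAMES ARE y1-y10;
MODEL: f BY y1-y10;
OUTPUT: STANDARDIZED;
```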
 Chris Blanchard posted on Friday, June 22, 2012 - 10:12 am
Hi,
I'm running a latent class growth analysis on 5 time points with missing data. I've been able to use the multiple imputation procedure nicely to establish that there are 2 classes. However, I want to save class membership (i.e., cprobabilities) from this analysis, and Mplus is telling me I can't. Is there a way I can get the class membership for each patient exported from Mplus when using the multiple imputation method? (Please note there are 5 data sets.)

chris
 Bengt O. Muthen posted on Friday, June 22, 2012 - 8:38 pm
For cprobs you need an estimated model and data for the subject in question. You have the estimated model (the average over the imps), but which of the 5 data sets should you use? That's the problem. You can create cprobs for each data set using the average estimates (fixed parameter values when running with each data set), and then average the cprobs. It is not clear, however, that this is the best approach - we are at the research frontier here.
 bkeller posted on Friday, July 27, 2012 - 1:25 pm
I'd like to determine the fraction of missing information (see, e.g., Snijders & Bosker, 2nd ed, p. 141) for the parameters in a TWOLEVEL RANDOM type analysis for a TYPE = IMPUTATION run. Is there a way to output the average within-data-set variance W.bar = 1/M*Sum(SE^2) and the between-imputation variance B = (1/(M-1))*Sum(theta.hat_m-theta.bar)^2 for each parameter so that I can calculate this?

Thank you!
 Tihomir Asparouhov posted on Friday, July 27, 2012 - 3:28 pm
This will be included in the output in the next Mplus version, but you can still compute it with the current version. If you have just a small number of imputed data sets, run them one at a time; you can then get all the parameter estimates and compute this by hand. If you have many imputed data sets, use the external Monte Carlo technique described in the User's Guide, Example 12.6, Step 2. This way you can get the between-imputation variance (column 3) and the average SE (column 4).
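Once W.bar and B are in hand, a commonly used large-sample approximation for the fraction of missing information per parameter (Rubin's combination rules; the small-sample degrees-of-freedom correction is omitted here) is:

```
FMI = (1 + 1/M) * B / ( W.bar + (1 + 1/M) * B )
```

where M is the number of imputed data sets, W.bar is the average within-imputation variance, and B is the between-imputation variance defined in the post above.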
 bkeller posted on Monday, August 06, 2012 - 1:39 pm
I used the MONTECARLO technique you described to get B and W.bar, thank you. I am further interested in saving as output an array which contains the actual estimated parameter values across imputations. I am using 25 imputed datasets so I would rather not run each one and compile by hand. Is there a way to ask Mplus to save the results (something similar to SAVEDATA: ESTIMATES = file.dat;) but for all 25?
 Tihomir Asparouhov posted on Tuesday, August 07, 2012 - 12:34 pm
Use
savedata: results=file.dat;
 EFried posted on Thursday, August 09, 2012 - 12:45 pm
I'm using multiple imputation and am running a multilevel growth model.

In my model results, baseline covariates are shown to be correlated by 0.000 (which is not actually the case in the dataset). The correlations between time-varying covariates are, in contrast, shown as expected.

All other model results look fine as well.

My question is whether the zero correlations have something to do with the way MPLUS deals with multiple imputation files – or whether something is wrong with the analysis?

Thank you
 Linda K. Muthen posted on Thursday, August 09, 2012 - 2:04 pm
Please send the output and your license number to support@statmodel.com so I can see what you are doing.
 Maren Formazin posted on Thursday, September 06, 2012 - 4:13 am
Hi,
I have ten multiply imputed data sets, which are my data for CFA. Since my data are categorical, I have used WLSMV and specified the variables as categorical.

For Chi-Square, there is the following information:

Chi-Square Test of Model Fit

Number of successful computations 10

Proportions Percentiles
Expected Observed Expected Observed
0.990 1.000 0.020 39.184
0.980 1.000 0.040 39.184
0.950 1.000 0.103 39.184
0.900 1.000 0.211 39.184
0.800 1.000 0.446 39.184
0.700 1.000 0.713 40.065
0.500 1.000 1.386 53.839
0.300 1.000 2.408 62.245
0.200 1.000 3.219 68.881
0.100 1.000 4.605 105.435
0.050 1.000 5.991 105.435
0.020 1.000 7.824 105.435
0.010 1.000 9.210 105.435

What does this output mean? With ML, there is just one Chi-square value - here, there is none.

Thank you very much for your help!
 Maren Formazin posted on Thursday, September 06, 2012 - 4:14 am
Hi,

and an additional question:

Are the mean of CFI / TLI / RMSEA / WRMR the pooled results over all 10 datasets? What do "proportions" and "percentiles" of these indices mean?

Thank you!
 Linda K. Muthen posted on Thursday, September 06, 2012 - 5:58 am
The only fit statistic that has been developed for multiple imputation is chi-square. For the others we give the average over the imputed data sets. The output is described on page 362 of the user's guide.
 Maren Formazin posted on Friday, September 07, 2012 - 12:44 am
Thanks for your reply, Linda.
I have checked my output again - there is NO mean or standard deviation, just the information I've posted above ("10 successful computations" and the table with expected and observed proportions and percentiles). Do I need to specify something more?
I do get means and SDs for CFI, RMSEA, TLI and WRMR though.

On the other hand, when using ML with the same imputed data, there is just one chi-square value (and CFI, RMSEA, etc.), but no table with expected and observed proportions and percentiles. Why does this happen?

Thanks!
 Linda K. Muthen posted on Friday, September 07, 2012 - 5:48 am
Please send the two outputs and your license number to support@statmodel.com so I can see what you are looking at.
 Caroline Vancraeyveldt posted on Friday, November 09, 2012 - 6:47 am
Dear Dr. Muthén,

I am doing multiple imputation for four variables in a latent growth model. When I request 20 replications, only 19 are completed. I requested TECH9, but apparently only the first replication did not converge because the number of iterations was exceeded (no reason why was stated in TECH9). Could it be that one of the variables has too much missing data for imputation? (Data for 83 out of 175 cases are missing on this variable.) If I do not include this variable in the missing-data analysis, then all the iterations are correctly completed.

Thank you for your response!
 Linda K. Muthen posted on Friday, November 09, 2012 - 11:26 am
Please send the output, data, and your license number to support@statmodel.com.
 Sofie Henschel posted on Tuesday, November 13, 2012 - 10:11 am
Hi,
I am running a multiple imputation with continuous variables. I used the ROUNDING option and requested 5 decimals for the imputed variables because my original values have 5 decimals too.

ROUNDING = sach lit (5);

Unfortunately, Mplus gives only the default of 3 decimals for the imputed data and reduces the non-imputed values to 3 decimals as well. Have you any idea what's wrong here?
thanks in advance
Sofie
 Tihomir Asparouhov posted on Tuesday, November 13, 2012 - 2:01 pm
Add

savedata: Format=F10.5;
 Sofie Henschel posted on Wednesday, November 14, 2012 - 1:20 am
Thanks for your help, but Mplus says that doesn't work for multiple imputation.

*** WARNING in SAVEDATA command
The FORMAT option for saving data is not available for TYPE=MONTECARLO or multiple imputation. The FORMAT option will be ignored.

Have you any other suggestion? Thanks
Sofie
 Kofan Lee posted on Wednesday, November 14, 2012 - 7:23 am
I am using multiple imputation in CFAs. I use CFA to check theoretical constructs and to prepare for item parceling. I have run into some questions, and I wonder if something is wrong with my syntax.
It seems that this combination (CFA with imputed data) brings some limitations. For instance, the survey data I have are non-normally distributed; however, I cannot use SAVE=MAHA and TECH13 to detect multivariate outliers and assess multivariate normality. Further, modification indices cannot be requested either. Also, the chi-square test using the MLR estimator does not generate significance-test output. Would you mind taking a look at my commands below?

Title: CFA motivation using imputated data;
Data: File = C:\Users\koflee\Desktop\111312\serious.imputelist.dat;
Type = imputation;
VARIABLE:
NAMES ARE m1-m19 s20-s37;
USEVARIABLES ARE m1 m2 m4 m6-m19;
ANALYSIS: ESTIMATOR = MLR;
MODEL: im by m4 m10 m15 m18;
id by m8 m14 m17;
intro by m2 m7 m13;
em by m1 m6 m11 m16;
am by m9 m12 m19;
SAVEDATA: save=maha;
OUTPUT: TECH4 tech13 STANDARDIZED RESIDUAL MODINDICES (0);

Thank you so much
Kofan
 Tihomir Asparouhov posted on Wednesday, November 14, 2012 - 9:04 am
Sofie

It should have worked. Are you using type=basic? Send your example to support@statmodel.com

Tihomir
 Linda K. Muthen posted on Wednesday, November 14, 2012 - 12:56 pm
Kofan:

Some things have not yet been developed for multiple imputation; for example, only the ML estimator has been developed for the chi-square test. For other estimators we give the mean and other information.
 Kofan Lee posted on Thursday, November 15, 2012 - 6:11 am
Linda,

Thanks for the response. I actually have very few missing responses and decided to delete those cases. I have a question about Mahalanobis distance. I try to run a CFA (ML estimation) with the following code:
SAVEDATA: save=maha;
However, this command is ignored by Mplus. Should I add something to this syntax?

Thank you and have a wonderful day

Kofan
 Linda K. Muthen posted on Thursday, November 15, 2012 - 9:05 am
You also need to name a file using the FILE option.
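A sketch combining the two options (the file name is hypothetical, and I assume the full keyword is MAHALANOBIS; the "maha" shorthand from the post above may also be accepted):

```
SAVEDATA: FILE IS mahares.dat;   ! destination for the saved records
SAVE = MAHALANOBIS;              ! append Mahalanobis distance per observation
```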
 Kofan Lee posted on Saturday, November 17, 2012 - 6:02 am
Linda,

That works. Thank you
 Jan Zirk posted on Thursday, November 29, 2012 - 5:13 pm
Dear Bengt or Linda,
I would like to ask you about imputation in the context of Bayesian plausible values. In a dataset with a big sample size (n > 10000) there are many ordered-categorical variables from 5 instruments plus demographic measures. To decrease the computational demand, I would like to transform the ordered-categorical variables into continuous plausible-value measures. Is it better to do this via one big "H1 model" (i.e., "** WITH **" where ** means all variables in the dataset), or would it be better to run 5 separate H1 models (one for each instrument's categorical variables)?

Best wishes,
Jan
 Jan Zirk posted on Thursday, November 29, 2012 - 5:15 pm
P.S. I would like to next run SEMs with all measures entered.
 Bengt O. Muthen posted on Friday, November 30, 2012 - 7:55 am
It sounds like you are putting a factor behind each ordered-categorical variable. If you can do it with all the variables from all 5 instruments, that would be best, assuming they are at least moderately correlated. But if that gives you too many variables, then, assuming you have enough variables within each instrument, doing it instrument-wise would seem OK too. I guess any combination of the different sets of plausible values for the 5 instruments is equally valid.
 Jan Zirk posted on Friday, November 30, 2012 - 8:08 am
Thank you Bengt! It seems then like a topic worth an article or short note. "It sounds like you are putting a factor behind each ordered-categorical variable" - exactly, I wanted to extract them with LRESPONSES.
One more question, to best reflect the original underlying data structure: if I run such an H1 model on all the available variables and see in its output that, e.g., a few links are non-significant (ns), do you think it would be worth the effort to trim such links and, in the next step, extract plausible values from the backwards-deleted/trimmed version of the H1 model? Or rather extract them regardless of the ns connections?
 Bengt O. Muthen posted on Friday, November 30, 2012 - 8:31 am
My guess is that it wouldn't be worth the effort.
 Jan Zirk posted on Friday, November 30, 2012 - 9:09 am
Yes, thank you; this is what I thought.
Best wishes,
 Kätlin Peets posted on Friday, January 04, 2013 - 3:07 am
I use multiple imputation to handle missing data. Can I interpret model fit indices as usual? (in the analysis phase)
Thank you!
 Linda K. Muthen posted on Friday, January 04, 2013 - 6:25 am
No, except for ML. For the other fit statistics, which have not yet been developed for multiple imputation, we report the average over the imputations.
 Kätlin Peets posted on Friday, January 04, 2013 - 8:54 am
Thank you. Could you please clarify? Do you mean that when I use the ML estimator, the fit indices are interpretable, or are they still averages?
 Linda K. Muthen posted on Friday, January 04, 2013 - 9:51 am
With ML, the chi-square value is interpretable, but not the other fit statistics. The other fit statistics are averages.
 Lies Missotten posted on Thursday, January 24, 2013 - 3:51 am
Dear Linda/Bengt,
I conducted multiple imputation analyses in a (relatively small) sample of 175 persons.
Next, I estimated a longitudinal path model without problems. However, when I explicitly model a correlation between the predictors, I receive the following error message: "THE BASELINE CHI-SQUARE COULD NOT BE COMPUTED. THIS MAY BE DUE TO AN INSUFFICIENT NUMBER OF IMPUTATIONS OR A LARGE AMOUNT OF MISSING DATA."
However, when I increased the number of imputations, I still received the same error message. I do not receive this error message if I do not explicitly model the correlation between the predictors. What could be the reason for that, please?
Thank you in advance!
 Linda K. Muthen posted on Thursday, January 24, 2013 - 11:58 am
Please send the two outputs (with and without the correlation) and your license number to support@statmodel.com.
 Shin, Tacksoo posted on Sunday, March 03, 2013 - 5:16 pm
Dear Linda,

I have questions related to MI with Bayesian method.

What combining rules does Mplus use, especially with random effect models (latent curve models)? For fixed effects, Rubin (1987) presented the method for combining results from a data analysis performed m times. Or are alternatives used (e.g., a jackknife variance estimator or fractionally weighted imputation)?

Can these combination formulas be used with nonlinear models?

How about the posterior predictive p-value? Is it the same as the combining rules for the likelihood ratio test or Wald test, which Asparouhov and Muthen (July 27, 2010) explained in "Chi-Square Statistics with Multiple Imputation: Version 2"?
 Linda K. Muthen posted on Sunday, March 03, 2013 - 5:55 pm
Parameter estimates are averaged over the set of analyses. Standard errors are computed using the average of the squared standard errors over the set of analyses and the between analysis parameter estimate variation (Rubin, 1987; Schafer, 1997). A chi-square test of overall model fit is provided (Asparouhov & Muthén, 2008c; Enders, 2010).

All other values are averaged over the set of analyses.
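In standard notation (Rubin, 1987), with m imputations these pooling rules are:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2, \qquad
T = \bar{U} + \left(1+\frac{1}{m}\right)B,
```

where Q-hat_i is the parameter estimate and U_i the squared standard error from imputed data set i; the reported standard error is the square root of the total variance T.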
 Shin, Tacksoo posted on Sunday, March 03, 2013 - 6:14 pm
Dear Linda,

Thank you for your quick reply.

You mean the PPP is also averaged over the set of analyses? If so, is there any possibility of bias?
 Linda K. Muthen posted on Sunday, March 03, 2013 - 8:13 pm
Yes, that is also an average. I don't believe there is any theory as to how this should be combined.
 Shin, Tacksoo posted on Sunday, March 03, 2013 - 9:16 pm
Dear Linda,

Deeply appreciate your help.
 Eric Deemer posted on Wednesday, March 27, 2013 - 1:25 pm
Hello,
I ran an analysis on 5 imputed data sets but got no output. I used "type = imputation" in the DATA command as well. The computation window flashes on the screen and that is all. Is there something I am doing wrong?

Thanks,
Eric
 Linda K. Muthen posted on Wednesday, March 27, 2013 - 2:32 pm
Please send the files and your license number to support@statmodel.com.
 Maren Formazin posted on Thursday, March 28, 2013 - 8:54 am
Dear Linda & Bengt,

when using TYPE = IMPUTATION with 10 imputed datasets, the output provides only residual covariances. Is there an option to get residual correlations as well? If so, which command would I have to use?

Thanks for your help!
 Bengt O. Muthen posted on Thursday, March 28, 2013 - 4:14 pm
There is not an option for also getting the residual correlations.
 Ping Li posted on Thursday, April 04, 2013 - 1:39 pm
Hi Linda,

I use imputed data sets that I have created to do an analysis. When I run the syntax, just as Eric Deemer mentioned above, the computation window flashes on the screen but no output file is produced.
Could you help me see what is wrong with the syntax:
TITLE:
Public Administration;

DATA:
FILE=imputelist.dat;
TYPE=imputation;

VARIABLE:
NAMES ARE
ciserq1-ciserq8 ceserq1-ceserq5
agency gen age gender cltype clgend clpart clint
educ ethnic tenure jobpos toint car1-car4 hr1-hr11 lmx1-lmx7
tiserq1-tiserq8 teserq1-teserq5;

USEVARIABLE
ciserq1-ciserq8 ceserq1-ceserq5
gen
hr1-hr11;

ANALYSIS:
ESTIMATOR=ML;

MODEL:

ciserq BY ciserq1* ciserq2-ciserq8;
ciserq@1;

ceserq BY ceserq1* ceserq2-ceserq5;
ceserq@1;


hr BY hr1* hr2-hr11;
hr@1;

ciserq ceserq on hr gen;


OUTPUT: STANDARDIZED(stdyx);

Thanks very much!
 Linda K. Muthen posted on Thursday, April 04, 2013 - 1:59 pm
Please send the files and your license number to support@statmodel.com.
 Maren Formazin posted on Friday, April 12, 2013 - 4:41 am
Hi,

my dataset contains missing values. I have completed multiple imputation with m = 10 and use Mplus to estimate structural models over the 10 imputed datasets.

Additionally, I have used the original data with MISSING = BLANK to estimate the same models.

All model fit indices based on analyses with the dataset that still contains missing data are substantially better than those based on the 10 imputed datasets. I've been wondering which algorithm Mplus uses when analyzing data with missing values that explains these differences.

Thank you.
 Linda K. Muthen posted on Friday, April 12, 2013 - 9:07 am
The only fit statistic that has been developed for multiple imputation is chi-square for maximum likelihood. The means are reported for all other fit statistics.
 Yalcin Acikgoz posted on Sunday, April 14, 2013 - 1:25 pm
Dr. Muthen,

I am working on a multiple imputation procedure, and in accordance with your Asparouhov & Muthen (2010) paper, which states

"The missing data is imputed after the MCMC sequence has converged",

I am trying to run an MCMC sequence.

Even though I am using Mplus v7 (which should be able to run Bayesian statistics), I am getting

" Unrecognized setting for ESTIMATOR option:
BAYES".

Why do you think this is happening?
 Linda K. Muthen posted on Sunday, April 14, 2013 - 4:15 pm
I think you are not using Version 7. Check at the top of the output where it shows which version you are using.
 Yalcin Acikgoz posted on Monday, April 15, 2013 - 6:51 am
Dr. Muthen,

Thank you very much for your prompt response. I checked and you are right; it shows version 5.1. But this is strange because I can see the little Mplus 7 icon in the top left corner of my window when I am using the program. And when I check the About Mplus section, it says Mplus Version 7. Do you think this has something to do with settings?
 Linda K. Muthen posted on Monday, April 15, 2013 - 8:38 am
You must have more than one Mplus.exe on your hard drive. So do a search and delete all but the most recent.
 Yalcin Acikgoz posted on Monday, April 15, 2013 - 9:37 am
Thank you Dr. Muthen, that worked.

Yalcin
 Yalcin Acikgoz posted on Sunday, April 21, 2013 - 1:50 pm
Dr. Muthen,

I have one other issue that I have been coming across. Please see the line of syntax below:

missing are Q51A-Q51F (-100) Q52A-Q52E (-99 98) Q53A-Q53D (98 -99);

I am doing multiple imputation, and the syntax above is where I define which values need to be imputed. When I use the syntax above, for the Q52 variables it does not recognize -99 as a missing value flag. However, if I change the order such that -99 comes first and 98 comes after, it does what it is supposed to do. The same thing happens for other variable series as well. Somehow it does not function properly when I write 98 before -99.

Do you have any idea why this might be happening?

Thanks in advance!
 Linda K. Muthen posted on Sunday, April 21, 2013 - 3:09 pm
Yes, you need to put the negative number first. If it comes second, the minus sign is read as the dash in a list, so "98 -99" is interpreted as the range 98 to 99.
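A minimal sketch of the corrected specification, using the variable names from the post above:

```
! Negative flags listed first so the minus sign is not read as a
! list dash ("98 -99" would be parsed as the range 98 to 99).
VARIABLE:
MISSING ARE Q51A-Q51F (-100) Q52A-Q52E (-99 98) Q53A-Q53D (-99 98);
```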
 Maren Formazin posted on Friday, April 26, 2013 - 5:13 am
Dear Linda,

getting back to my post from April 12th - is it possible to get the separate results for CFI, RMSEA, etc. for the different imputations?
Why would the means of CFI and RMSEA indicate better fit in models with FIML than in models with imputed data? How does Mplus estimate parameters when there is missing data?

Thank you very much.
 Bengt O. Muthen posted on Friday, April 26, 2013 - 2:19 pm
Q1. You would have to run each imputation as a separate data set.

Q2. Don't know off-hand. It's a research question; a dissertation topic for someone? I would not trust the average CFI or RMSEA from imputations unless it has been researched, because those measures "don't know" that the analysis of each imputed data set is based on imputed data. When you compute CFI/RMSEA by FIML, they "know" that data are missing, because the chi-square on which they are based knows.

Q3. FIML is done the regular way of ML assuming MAR. Bayesian imputation assumes the same thing.
 Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 12:09 pm
Dr. Muthen,

I am imputing the data with MPlus. Examining the imputed data, I found that even though the original data is on a 1-5 scale, imputed data included values smaller than 0 and bigger than 6. Should I be concerned or can I simply use the highest (or lowest) possible value for those out-of-range cells?

Thanks!
 Linda K. Muthen posted on Tuesday, April 30, 2013 - 1:10 pm
See the VALUES option of DATA IMPUTATION. I think this may be what you want.
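A sketch of what this might look like; variable names are hypothetical, and the exact syntax of the VALUES setting should be checked against the DATA IMPUTATION chapter of the User's Guide:

```
! Hedged sketch: restrict imputed values to the observed 1-5 scale.
DATA IMPUTATION:
IMPUTE = q1-q10;             ! hypothetical variable names
VALUES = q1-q10 (1-5);       ! allowed values for the imputed data
```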
 Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 1:52 pm
This was helpful, thanks!
 Yalcin Acikgoz posted on Tuesday, April 30, 2013 - 9:28 pm
Dr. Muthen,

I am new to both Mplus and MI, and this is why I have so many questions. There is another issue I am struggling with. I just discovered that even though my original sample size is 3000+, the imputed data come with a sample size of 1882. I reviewed the help contents, this forum, and the user guide, but I couldn't find any explanation of why this is happening. My hunch is that this might be a memory issue, but I don't know. Can you tell why this might be happening?

Thanks!
 Linda K. Muthen posted on Wednesday, May 01, 2013 - 7:45 am
You are probably reading your original data incorrectly by having blanks in the data set and using free format or by having too many variable names in the NAMES list. If this does not help, send the data, output, and your license number to support@statmodel.com.
 Maren Formazin posted on Tuesday, June 11, 2013 - 7:19 am
Dear Linda,

for my model, I use the following command:

TYPE = IMPUTATION;

I have 10 imputed datasets (with no missing values).
When estimating a model with four latent factors, model estimation proceeds normally. However, when trying to establish a 2nd order factor (F2 BY F11* F12 F13 F14; F2@1), I get the following message:

"The chi-square could not be computed. This may be due to an insufficient number of imputations or a large amount of missing data."

There definitely is no missing data. Why would 10 imputations not suffice? All other models worked well - as did the same model with a different dataset.

Thanks for your help!
 Linda K. Muthen posted on Tuesday, June 11, 2013 - 11:42 am
You can try increasing the number of imputations. If that does not help, send the output and your license number to support@statmodel.com.
 Mijke Rhemtulla posted on Wednesday, June 12, 2013 - 8:40 am
Multiple Imputation in 7.1 produces a new column of results called "rate of missing". Can you tell me what this refers to and how it's computed? I was hoping it was fraction of missing information, but the values don't match my hand calculations and I can't find it in the Guide. Thanks much!
 Bengt O. Muthen posted on Wednesday, June 12, 2013 - 9:28 am
It is the same as fraction of missing information. How do you calculate it?
 Mijke Rhemtulla posted on Friday, June 14, 2013 - 8:39 am
I use FMI = Vb/Vt, where Vb is between-imputation parameter variance, and Vt is total parameter variance (this definition is in Bodner 2008, Schafer 1997, Enders 2010). Does Mplus use Schafer's rate of missing information (defined here: http://sites.stat.psu.edu/~jls/mifaq.html#minf)?
 Tihomir Asparouhov posted on Friday, June 14, 2013 - 9:00 am
Yes
 Alvin Wee posted on Thursday, June 27, 2013 - 6:02 am
Hi, the new column "rate of missing" in Version 7 is FMI, right? How do we make use of it (i.e., how do we use it to judge the quality of the imputations)? Is there a range or cut-off?
 Alvin Wee posted on Thursday, June 27, 2013 - 8:32 am
Also, in the output the unstandardized estimates equal the values in the "rate of missing" column. What does this mean?
 Linda K. Muthen posted on Thursday, June 27, 2013 - 11:19 am
Please send the output and your license number to support@statmodel.com.
 Alysia Blandon posted on Friday, July 05, 2013 - 10:50 am
I am planning on running an H1 imputation model to save the imputed data sets. I have data clustered within families and was going to use TYPE = BASIC TWOLEVEL. What I am having trouble figuring out is whether the data need to be in wide or long format (they are currently one row per family).
 Linda K. Muthen posted on Saturday, July 06, 2013 - 6:39 am
You would not need TWOLEVEL if your data are in wide format and the cluster variable is family. You would just need TYPE=BASIC. Multivariate analysis takes care of any nonindependence of observations.
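A minimal sketch of such a wide-format imputation run, with hypothetical file and variable names (one row per family, so no CLUSTER option is needed):

```
! Hedged sketch of H1 imputation in wide format.
DATA:     FILE = family_wide.dat;     ! hypothetical file
VARIABLE: NAMES = m1 m2 c1 c2;        ! family members' measures, wide
          MISSING = ALL (-999);
ANALYSIS: TYPE = BASIC;
DATA IMPUTATION:
          IMPUTE = m1 m2 c1 c2;
          NDATASETS = 20;
          SAVE = famimp*.dat;
```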
 Yan Liu posted on Thursday, July 11, 2013 - 6:11 pm
Dear Dr. Muthen,

I did multiple imputation and conducted a mediation analysis with 10 imputed data sets. However, I can see that only 9 data sets were analyzed, and the model fit indices were also not given in the output. I wonder if there is anything wrong with my Mplus code.

Here is my imputation code:
USEVARIABLES ARE Y1t1-M1t2;
AUXILIARY = schid tchid stdid x1 x2 sex age;
MISSING = blank;
ANALYSIS:
type = basic;
DATA IMPUTATION:
impute = Y1t1-M1t2;
ndatasets = 10;
save = imput*.dat;
OUTPUT: TECH8;

This is my mediation analysis code

USEVARIABLES ARE x1 x2 Y1t1-M1t2 M_d Y1_d Y2_d;

DEFINE: M_d = M1t2-M1t1; ! Difference score of mediator between time 1 and time 2
Y1_d = Y1t2-Y1t1; ! Difference score of Y1
Y2_d = Y2t2-Y2t1; ! Difference score of Y2

MODEL:
M_d ON x1 (a1);
M_d ON x2 (a2);
Y1_d ON M_d (b1);
Y2_d ON M_d (b2);
Y1_d ON x1 x2;
Y2_d ON x1 x2;

MODEL CONSTRAINT:
NEW(indb1 indb2 indb3 indb4);
indb1=a1*b1;
indb2=a1*b2;
indb3=a2*b1;
indb4=a2*b2;

Here is what I saw from the output of my analysis:

SUMMARY OF ANALYSIS

Average number of observations                 228

Number of replications
    Requested                                   10
    Completed                                    9
 Bengt O. Muthen posted on Friday, July 12, 2013 - 2:00 pm
Run each separately and see if one has a problem.
 Yan Liu posted on Monday, July 29, 2013 - 2:25 pm
Dear Dr. Muthen,

Thanks for your suggestion! I tried analyses for each imputed data separately and found all the model data fit was really poor. Is this the reason why I cannot finish running all the imputed data sets?

The data were collected at 2 time points. The total sample size is 228. There is no missing data at time 1, but about 16% missing at time 2. The outcome variables are percentages of time, derived from counts. The mediator is continuous. The independent variables are two dummy variables (2 interventions vs. control). The distributions of the 2 outcome variables are bi-modal for two of the groups at both time points; for the third group it is not that obvious.

I tried two ways to model my data: (1) Using difference scores (time 2 minus time 1); the distributions of the difference scores are not bi-modal, though a little bit skewed. The model fit was very bad for each imputed data set! Then I took a further look at the distributions for each group; two of them are still bi-modal.

My question is: Should I not use difference scores, or not use the ML estimator? The outcome variables are overdispersed, and the bi-modality is still a problem for two of the groups.

(2) Modeling the variables at time 2 with the time-1 variables included as covariates. Given the bi-modality problem and the missing data issue, what models would you suggest I use?

Thanks!
Yan
 Bengt O. Muthen posted on Tuesday, July 30, 2013 - 8:22 am
So are you saying that the multiple imputation run had only 9 of the 10 converging, but when you ran them separately all 10 converged? Which version of Mplus are you using?

Turning to your key question, poor fit can be due to using ML instead of MLR, but the bi-modality is likely to not be resolved by MLR. I would suggest investigating the cause of the bi-modality. Perhaps you want to simply use a binary variable instead of the bi-modal one?
 Yan Liu posted on Wednesday, July 31, 2013 - 3:45 pm
Dear Dr. Muthen,

Thanks for your suggestion! I think using a binary variable will be the easier way to solve the problem.

To dichotomize the continuous outcomes, I am thinking of doing it in three ways: (1) choose a cut-off that separates the two modes (one small and one large), (2) ask for experts' opinions, or (3) run a latent class regression analysis with the outcome, predictors, and mediator (constrained to 2 classes) and then save the class membership.

Will the third option work and be better?
Oh, I used Mplus 6.11. Is there any difference between the versions?
Best regards,
Yan
 Bengt O. Muthen posted on Thursday, August 01, 2013 - 8:30 am
The choice between 1-2-3 has to be made by the researcher.

I would recommend always using the latest version of Mplus, which currently is 7.11.
 Yan Liu posted on Friday, August 02, 2013 - 7:51 am
Dear Dr. Muthen,

Thanks a lot! Should I dichotomize outcome variables first and then impute missing data or the other way around?

One more question. When imputing continuous outcome variables (should be zero or positive), I found that some imputed values are negative. Is there a way to constrain the imputed value not to be negative?

Best regards,
Yan
 Bengt O. Muthen posted on Friday, August 02, 2013 - 10:40 am
Personally, I would use maximum information for imputation and therefore not dichotomize first, but this is your choice.

See the VALUES option on page 518 of the Version 7 UG.
 Lauren Mitchell posted on Tuesday, August 20, 2013 - 1:14 pm
Hello! I am trying to run latent growth curve models with longitudinal data. I have 10-15 waves of data, but unfortunately roughly 30% of participants are missing on my predictor. From my understanding, MI is the best method for handling the missing data on these x-values - does that sound right? I was able to create the imputed data sets, but was not able to take the next step and run the model using the imputed data. The MS-DOS window appeared briefly and disappeared, and no output was produced. If you have any advice, I'd really appreciate it!

Thanks,
Lauren
 Linda K. Muthen posted on Tuesday, August 20, 2013 - 1:25 pm
Please send the input and data sets to support@statmodel.com. If you are not using Version 7.11, try that first.
 Ragnhild Sørensen Høifødt posted on Thursday, September 26, 2013 - 4:53 am
Hello,
I'm new to Mplus, and I'm doing a growth mixture model. As I have some missing data on the covariates, I wanted to do multiple imputation (TYPE = IMPUTATION). I see from older posts that I should use starting values to avoid label switching. Does this still apply, or does the program automatically use the estimates from the first data set as starting values for the subsequent data sets? I'm using Version 7.11.

Thanks,

Ragnhild
 Linda K. Muthen posted on Thursday, September 26, 2013 - 12:14 pm
Instead of multiple imputation I would include the covariates in the model by mentioning their variances in the overall MODEL command and use FIML. In this way distributional assumptions are made about them but cases with missing on one or more covariates are not excluded from the analysis.
 Ragnhild Sørensen Høifødt posted on Friday, September 27, 2013 - 3:38 am
Thanks for the advice! I will try that.
 Michael T Weaver posted on Thursday, October 03, 2013 - 5:59 am
Linda & Bengt:

I appreciate the guidance you've provided for my journey into imputation.

In trying to understand what is going on, and the tests involved, I analyzed a data set of about 300 observations (all continuous scales) in a couple of ways:
(1) FIML using ML, MLR, and Bayes estimators.
(2) Bayes estimation of factor scores, creating 50 imputed data sets, then ML estimation of the latent model.

The FIML analyses all showed poor model fit (no surprise). The ML chi-square results from the imputed data, however, showed good fit (using the chi-square test in the output). That surprised me, given the poor fit using FIML - I was expecting consistent (though not exactly the same) results.

Is the imputation-based ML test testing something different, or am I missing an additional step? (I read the multiple imputation technical paper, Version 2, 07/27/2010 - I assume that the chi-square in the output is the appropriate test of fit.)

I don't want to publish an incorrect interpretation - appreciate guidance to help me avoid that!

Thanks!

Michael
 Linda K. Muthen posted on Thursday, October 03, 2013 - 11:53 am
I believe you are looking at an average chi-square value and a standard error in the multiple imputation output. Is this the case? If so, this is not the true imputation chi-square.
 Michael T Weaver posted on Monday, October 07, 2013 - 2:23 pm
Linda:

If that is what is provided in the output, then yes.

There is a note about average over 50 data sets, but that appears after SAMPLE STATISTICS heading, so I thought it referred only to those.

Here is excerpt from my MPLUS Imputation output:

MODEL FIT INFORMATION

Number of Free Parameters 37

Loglikelihood

H0 Value -619.847
H1 Value -391.050

Chi-Square Test of Model Fit
Value 14.280
Degrees of Freedom 23
P-Value 0.9186

Chi-Square Test of Model Fit for the Baseline Model
Value 168.670
Degrees of Freedom 44
P-Value 0.0000

SRMR (Standardized Root Mean Square Residual)
Value 0.063

I had assumed the ML estimator Chi-squared results produced would reflect the information in the Technical Appendix "Chi-Square Statistics With Multiple Imputation" Version 2.

Do I need to "hand calculate" the appropriate chi squared statistic using these averages?

Thanks.

Michael
 Linda K. Muthen posted on Tuesday, October 08, 2013 - 9:50 am
The values above are not averages. They are the values described in the technical appendix "Chi-Square Statistics With Multiple Imputation". They are available only for ML, not, for example, for MLR. Please send the two outputs, imputation and FIML, along with your license number to support@statmodel.com.
 Deborah Bandalos posted on Wednesday, October 09, 2013 - 8:03 am
My question is about the autocorrelation plots obtained using multiple imputation. I'm not sure what is being plotted on the horizontal axis. No matter how many iterations I specify, the axis runs from 1-30. Are iterations "binned" somehow to create the horizontal axis? If so, is the binning achieved by just dividing the total number of iterations by 30? I tried to change the axis range, but Mplus shut down, so I'm guessing that's not an option.

Thanks,

Debbi
 Linda K. Muthen posted on Wednesday, October 09, 2013 - 10:20 am
Please send the output, graph file, and your license number to support@statmodel.com.
 Maren Schulze posted on Monday, November 18, 2013 - 2:35 am
With my data, I have computed the same SEM twice:

- once with some missing data, using ML
- once with 10 imputed datasets

The chi-square value for the MI-datasets is smaller than the value for the dataset with some missing information (with the same df and N); however, CFI for the MI-dataset is lower than the one for the dataset with some missing information.

Why would this happen?

Thanks for your help!
 Maren Schulze posted on Monday, November 18, 2013 - 4:01 am
In addition to my previous post:

For simulation purposes, I have also used mean imputation and single stochastic regression imputation. Both methods have led to very similar results as ML with missing data.

So it is only the analyses with 10 imputed datasets where chi-square is much lower (around 1275 compared to around 1450 for the three other options - with df= 98 and N = 2326) and CFI is lower, too (.910 compared to .916).

Thanks.
 Bengt O. Muthen posted on Monday, November 18, 2013 - 8:37 am
For your MI approach, are you referring to the chi-squares for each imputed data set, or to the one chi-square that summarizes all the imputed data sets? I am referring to the techniques discussed on slides 212 onward of the 6/1/11 Topic 9 handout.
 Maren Schulze posted on Tuesday, November 19, 2013 - 4:39 am
In my output file, there is only one chi-square value which I presume is the one that summarizes all imputed data sets.

(I'm using Mplus 7 with "TYPE = IMPUTATION;" and ten imputed datasets; according to the output, all ten requested replications are completed.)
 Bengt O. Muthen posted on Tuesday, November 19, 2013 - 8:41 am
Please send the output for the 2 runs that you compare to Support.
 Maren Schulze posted on Wednesday, November 20, 2013 - 8:13 am
Thanks for your suggestion, I've done so this morning.

An additional question: I have compared the chi square values of the baseline model between multiple imputation and ML (with missing data) - they differ quite a lot; whereas the differences between chi square baseline model ML (with missing data), mean imputation and single stochastic regression imputation are comparably small. Why would that happen?
 Bengt O. Muthen posted on Wednesday, November 20, 2013 - 6:33 pm
Please send those 2 baseline outputs and data to Support.
 Anonymous posted on Monday, January 06, 2014 - 2:24 am
Hello!
I have calculated CFA with categorical variables with 5 MI datasets (TYPE = IMPUTATION). I ran two CFAs with different models. In order to compare them, can I simply calculate the Chi-square difference test by subtracting the Chi-square values provided in the MODEL FIT part of the outputs?

Thanks in advance!
 Linda K. Muthen posted on Monday, January 06, 2014 - 7:12 am
The chi-square with multiple imputation cannot be used for difference testing. You should use FIML if this is important to your study.
 Lucy Morgan posted on Thursday, January 23, 2014 - 1:23 am
Hi

I am trying to run a fully latent path model (N = 199) with a dataset that is complex (data collected from care assistants, clustered by nursing homes), non-normally distributed, and has missing data (< 5%). Data are missing on both exogenous and endogenous variables. I understand that FIML can account for missing data on endogenous variables only; thus the number of observations is reduced to 167 when I run the model. I have a couple of questions I would be very grateful if you could answer:

1) Should I use multiple imputation to compute missing data for ALL variables, and then run the model based on the imputed datasets? Or should I only impute data for the exogenous variables and then run the model with imputed datasets AND FIML? (I did try to impute only exogenous variables, but missing data on the endogenous variables was replaced with * and the model would not run....)

2) When running the multiple imputation, would it be ok to run a straightforward imputation (TYPE = BASIC) as per ex 11.5 in the Version 6.0 handbook or should I be running the multiple imputation with the model that reflects my dataset (TYPE = COMPLEX) similar to ex 11.7?

3) When I run a TYPE = COMPLEX model, I cannot also use MLM (which I need to use to account for non-normal data). Can I simply substitute MLR?

Many (many!) thanks
Lucy
 Linda K. Muthen posted on Thursday, January 23, 2014 - 10:28 am
In your case, I would not use multiple imputation. I would use COMPLEX and MLR and include the variances of all observed exogenous covariates in the MODEL command. They will be treated as dependent variables and distributional assumptions will be made about them. Missing data theory will then be used for them. This is asymptotically the same as doing multiple imputation.
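A minimal sketch of this approach, with hypothetical variable names; mentioning the covariates' variances in MODEL brings them into the likelihood, so cases with missing covariate values are retained:

```
! Hedged sketch, not the actual model from the post.
VARIABLE: CLUSTER = home;           ! nursing home ID
ANALYSIS: TYPE = COMPLEX;
          ESTIMATOR = MLR;
MODEL:    f1 BY y1-y4;              ! latent outcome
          f1 ON x1 x2;              ! observed covariates
          x1 x2;                    ! mentioning the variances makes
                                    ! missing-data theory cover x1, x2
```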
 Jason Edgerton posted on Monday, February 24, 2014 - 11:16 am
Hello,
I have missing data for a 4-wave LGC model I am running. I can't use MLR as an estimator because 2 of my variables are non-normal, and I can't use MLM because I have missing values. So I chose to impute 10 data sets (in Mplus 7) and then use MLM estimation. I understand the chi-square and fit statistics are just averages and not accurate evaluations of fit (unless you're using ML), but does MLM still produce robust SEs and parameter estimates with multiply imputed data sets? That is, is it appropriate to use MLM estimation with multiply imputed data if I'm not planning on comparing nested models?
 Linda K. Muthen posted on Tuesday, February 25, 2014 - 6:33 am
Why can't you use MLR? MLR is robust to non-normality of continuous variables. What do you mean by non-normal?
 Jason Edgerton posted on Tuesday, February 25, 2014 - 1:45 pm
Sorry, my mistake re: MLR (by non-normal I mean one continuous predictor and the continuous outcome variable are both highly leptokurtic). When I previously estimated the unconditional LGC model with MLM on the imputed data, the fit indices RMSEA, CFI, and SRMR all indicated adequate fit, but with MLR on the original data (with missingness) these indices all indicate inadequate fit (the parameter estimates are quite similar). I assume I have to put more stock in the MLR-estimated fit statistics and conclude that my model has poor fit - correct?

Thanks
 Linda K. Muthen posted on Tuesday, February 25, 2014 - 2:53 pm
You can't assess fit in multiple imputation using the means of the fit statistics. How well or poorly these means represent fit has not been studied. So yes, it seems your model does not fit.
 Patricia Schultz posted on Monday, March 03, 2014 - 6:09 am
Hi Dr. Muthen,

I'm using multiple imputation (Amelia) and running 10 imputations. How do I pool the model fit indices (CMIN/df, CFI, RMSEA, SRMR)? An earlier post (from 2006) says to just calculate a simple average of the values (as there is no specific theory on this); I was wondering if this has changed or if that is still the practice.


Thank you.
 Linda K. Muthen posted on Monday, March 03, 2014 - 10:17 am
This is still a research question.
 Shin, Tacksoo posted on Thursday, March 06, 2014 - 12:18 am
Dear Linda,

I have a question about "the concept of multiple imputation using Bayesian estimation".

When imputations are created under Bayesian arguments, MI has a natural interpretation as an approximate Bayesian inference. In addition, I thought that this missing data technique uses Bayesian estimation when obtaining parameter estimates. So I wrote the syntax below,
--------------------------------------
ANALYSIS:
ESTIMATOR = BAYES;
MODEL:
i s | Y1@0 Y2@1 Y3@2 Y4@3;
[i] (a); [s] (b);
i (c); s (d);
i WITH s (e);
MODEL PRIORS:
a ~ N(190, 20); b ~ N(7, 3);
c ~ IW(616, 5); d ~ IW(8, 5);
e ~ IW(-28, 5);
DATA IMPUTATION:
IMPUTE = Y1-Y4;
NDATASETS = 15;
SAVE = C:\*.DAT;
---------------------------------------

By the way, some people are confused about whether "MI using Bayesian" indicates Bayesian estimation or simply means the MI proposed by Rubin. I thought it is the former. Am I correct?

Thank you
 Bengt O. Muthen posted on Friday, March 07, 2014 - 1:57 pm
See page 516 of our UG for an overview of how imputation works in Mplus. See also the imputation examples in Chapter 11. You can do "H1 imputation" or "H0 imputation". Your setup is an H0 imputation example in line with UG ex 11.7.

Rubin proposed MI using Bayes, so these are one and the same.

I recommend the Craig Enders missing data book.
 Shin, Tacksoo posted on Friday, March 07, 2014 - 4:48 pm
Dear Bengt,

Thank you for your information.

You explained that "the data can be imputed from any other model that can be estimated in Mplus with the Bayesian estimator (H0)". Then the imputed data sets are analyzed with Bayesian estimation. Does that mean parameter estimates from the Bayesian posterior distributions (one for each imputed data set) are obtained and then combined? Am I correct?

If so, does setting informative priors (MODEL PRIORS) affect the imputation step, the estimation process, or both?
 Bengt O. Muthen posted on Friday, March 07, 2014 - 4:54 pm
Q1. Multiple draws were generated from the Bayesian posterior distribution of the H0 model's estimated parameters, and for each draw data were generated.

Q2. The priors only influence the first H0 model estimation step.
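[Editor's note: the H0 imputation scheme Bengt describes -- draw parameters from the posterior, then generate data for each draw -- can be illustrated with a toy univariate normal model. The model, prior, and numbers below are hypothetical stand-ins, not the Mplus algorithm:]

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "H0 model": y ~ N(mu, sigma2) with sigma2 known, flat prior on mu.
# Some values are missing (np.nan).
y = np.array([4.8, 5.1, np.nan, 4.9, np.nan, 5.3])
obs = y[~np.isnan(y)]
sigma2 = 0.25

# Posterior of mu under a flat prior: N(mean(obs), sigma2 / n_obs)
post_mean, post_var = obs.mean(), sigma2 / len(obs)

m = 15  # analogous to NDATASETS = 15
imputed_sets = []
for _ in range(m):
    # Step 1: draw the parameter from its posterior
    mu_draw = rng.normal(post_mean, np.sqrt(post_var))
    # Step 2: generate data for the missing entries given that draw
    yc = y.copy()
    miss = np.isnan(yc)
    yc[miss] = rng.normal(mu_draw, np.sqrt(sigma2), miss.sum())
    imputed_sets.append(yc)
```

Because each completed data set comes from a different posterior draw of mu, the between-set variability reflects parameter uncertainty, which is what distinguishes proper multiple imputation from filling in a single best guess.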
 Shin, Tacksoo posted on Friday, March 07, 2014 - 5:50 pm
Dear Bengt,

Deeply appreciate your quick reply.

Here is one last question.

When estimating parameters (in the estimation step), does Mplus simply use noninformative priors?

Thank you.
 Linda K. Muthen posted on Monday, March 10, 2014 - 8:33 am
Yes.
 Lindsay Bell posted on Tuesday, March 18, 2014 - 10:34 am
Hello -

I am doing multiple imputation with a Bayesian estimator. I have a few parameters that still show autocorrelation at lag 30. How can I see the autocorrelations for lags greater than 30? Both the output and the plot only go up to 30.

Also, how can I get the fraction of missing information for each parameter?

Thank you,
Lindsay
 Tihomir Asparouhov posted on Tuesday, March 18, 2014 - 2:39 pm
Using the THIN option of the ANALYSIS command can give you bigger lags: if you use THIN = 10, the autocorrelations you see are effectively at lags 10, 20, 30, ..., 300, i.e., multiplied by 10. Alternatively, use the BPARAMETERS option of the SAVEDATA command to save all parameter values and compute the desired autocorrelations yourself, for example in Excel.

The fraction of missing information for each parameter is obtained after the imputations are done as in example 13.13 where the desired model is specified.
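[Editor's note: the BPARAMETERS route amounts to computing sample autocorrelations of a saved parameter chain at whatever lags you like. A Python sketch, where the simulated AR(1) chain is a stand-in for one column of a BPARAMETERS file (one column per parameter, one row per draw):]

```python
import numpy as np

def autocorr(chain, lag):
    """Sample autocorrelation of a parameter chain at a given lag >= 1."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Simulated stand-in for a slowly mixing parameter chain
rng = np.random.default_rng(0)
chain = np.empty(5000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

# Lags beyond the 30 shown in the Mplus output and plots
acfs = [autocorr(chain, lag) for lag in (10, 50, 100)]
```

For a real run you would load the saved BPARAMETERS file (it is plain text) and apply `autocorr` to the column of interest instead of the simulated chain.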
 Suzan Doornwaard posted on Tuesday, April 08, 2014 - 4:15 am
Hello,

I am working with a multiple imputed dataset (5 imputed sets) because we employed a 3-form planned missingness design in a large questionnaire. My most important variables are non-normally distributed, so I usually use the MLR estimator for my models.

As I conclude from reading all the information here, the MLR fit statistics (chi-square, RMSEA, CFI) are:
1) all averages over the imputed sets;
2) these averages are not reliable because Mplus does not "realize" they come from imputed data;
3) for the same reason, the fit statistics of the separate imputed sets are not reliable either.

Now some questions that arise are:
-How do I assess the fit of my model? Should I run it with ML and see if the fit statistics are similar to the averages I get with MLR? That is, only as an indication -- I don't think actually reporting results of ML models is a good idea given the non-normal distribution of my variables.
-What do I report in a manuscript when I want to refer to model fit (reviewers ask for it)?
-Is there ANY way to test nested models using MLR (constrained vs. unconstrained to test for moderation by gender)? Or any other way to test moderation in this case?

Thank you,
Suzan
 Linda K. Muthen posted on Tuesday, April 08, 2014 - 9:57 am
If you have planned missingness, use FIML not multiple imputation. Then you have fit statistics and can test nested models.
 Suzan Doornwaard posted on Tuesday, April 08, 2014 - 10:21 am
Dear Linda,

Thank you for your reply. Unfortunately we do have to work with the imputed sets.

Could you tell me how I should report on the model fit using TYPE=IMPUTATION in a manuscript? Are there other ways to assess the fit?

And can I use the Wald-test instead for moderation purposes?

Thank you,
Suzan
 Linda K. Muthen posted on Tuesday, April 08, 2014 - 11:22 am
You can use MODEL TEST with multiple imputation, but no difference testing can be done. The only absolute fit statistic available is the chi-square for maximum likelihood with continuous outcomes. See the following paper on the website under Bayesian Analysis:

Asparouhov, T. & Muthén, B. (2010). Bayesian analysis of latent variable models using Mplus. Technical Report. Version 4.

As far as what others report, I don't know. You might want to ask that on a general discussion forum like SEMNET.
 Linda K. Muthen posted on Tuesday, April 08, 2014 - 11:39 am
I'm sorry. This is the paper I meant:

Asparouhov, T. & Muthén, B. (2010). Multiple imputation with Mplus. Technical Report. Version 2.
 Masha Pavlovic posted on Wednesday, April 09, 2014 - 5:56 am
Hello,

I am trying to impute missing data before running a ULSMV analysis.

I'm getting an error message
"PROBLEM INVOLVING VARIABLES  AND xC‡ .
REMOVING ONE OF THESE VARIABLES FROM THE IMPUTATION RUN CAN RESOLVE THE PROBLEM."

The problem is that strange characters appear instead of the variable names, so I cannot figure out which variables I should remove.

Thanks in advance for the help!
 Linda K. Muthen posted on Wednesday, April 09, 2014 - 10:58 am
Please send the input, data, output, and your license number to support@statmodel.com.