Multiple Imputation in Mplus 6

Jon Elhai posted on Tuesday, April 27, 2010 - 6:27 pm

A couple questions about Mplus 6's ability to generate imputed datasets using MI...
1) What types of variables can be used in Mplus 6 for generating imputed datasets? Continuous and categorical variables? What about count variables?
2) Is the Bayes estimator robust to non-normality? Or must I normalize variables before generating imputed datasets in Mplus 6?

Linda K. Muthen posted on Wednesday, April 28, 2010 - 8:24 am

1. For now imputation is available for continuous and categorical variables.
2. No. You should not normalize the variables. Instead you can use two-part modeling, mixture modeling, or treat them as categorical if they have no more than 10 values.
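
To make the answer above concrete, here is a minimal sketch of an imputation run using the DATA IMPUTATION command in Mplus 6. The file name, the variable names (y1-y4, x1, x2), and the missing-value code are hypothetical. The (c) flag marks a variable to be imputed as categorical; unflagged variables are imputed as continuous.

```mplus
TITLE: sketch of multiple imputation (hypothetical data and names)
DATA: FILE = mydata.dat;
VARIABLE:
  NAMES = y1-y4 x1 x2;
  MISSING = ALL (-999);           ! assumed missing-value code
DATA IMPUTATION:
  IMPUTE = y1-y4 x1 (c) x2 (c);   ! (c) = impute as categorical
  NDATASETS = 20;                 ! number of imputed data sets
  SAVE = myimp*.dat;              ! * is replaced by the data set number
ANALYSIS: TYPE = BASIC;
```

Following the advice above, a non-normal variable with no more than 10 distinct values could be flagged with (c) in this way rather than being normalized first.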

Thessa Wong posted on Thursday, May 20, 2010 - 2:42 am

I have tried to do a multiple imputation in Mplus 6 in which I impute 2 categorical variables. Everything seems to go well, but when I want to do further analyses with the imputed data, I get the following message for each data file:

"Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp1.dat:

*** ERROR
Unexpected end of file reached in data file.

Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp2.dat:

*** ERROR
Unexpected end of file reached in data file.

etc.."

What am I doing wrong?

Thank you very much in advance!

Thessa Wong posted on Thursday, May 20, 2010 - 6:22 am

I just figured out what the problem was: I had defined too many variables in the command.
However, that gives me a new problem. I want to use the imputed data in analyses that also include other variables, but I do not want those other variables to influence the imputation. It looks as though including them in the imputation is the only way to get them into the new data set with the imputed data. In other words, how do I create a data set that combines the imputed variables with other variables?

Linda K. Muthen posted on Thursday, May 20, 2010 - 11:32 am

See the AUXILIARY option in the user's guide.
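
A hedged sketch of how the AUXILIARY option might be used for this: per the user's guide, variables on the AUXILIARY list are saved with the imputed data sets but are not used in the imputation itself. All file and variable names here (u1, u2, x1-x3, z1, z2) and the missing-value code are hypothetical.

```mplus
DATA: FILE = mydata.dat;
VARIABLE:
  NAMES = u1 u2 x1-x3 z1 z2;
  USEVARIABLES = u1 u2 x1-x3;   ! variables in the imputation model
  AUXILIARY = z1 z2;            ! saved with the imputed data only
  MISSING = ALL (-999);         ! assumed missing-value code
DATA IMPUTATION:
  IMPUTE = u1 (c) u2 (c);       ! the two categorical variables
  NDATASETS = 10;
  SAVE = myimp*.dat;
ANALYSIS: TYPE = BASIC;
```

The follow-up analysis would then read the list file (here myimplist.dat) with TYPE = IMPUTATION. Its NAMES list has to match the variables actually saved in each file; listing too many is what produces the "Unexpected end of file" error reported above.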

Jon Heron posted on Thursday, May 20, 2010 - 11:56 pm

Hi Thessa,

you might want to read this:

Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576.

"The truth is that all variables in the analysis model must be included in the imputation model. The fear is that including the DV in the imputation model might lead to bias in estimating the important relationships (e.g., the regression coefficient of a program variable predicting the DV). However, the opposite actually happens. When the DV is included in the model, all relevant parameter estimates are unbiased, but excluding the DV from the imputation model for the IVs and covariates can be shown to produce biased estimates. The problem with leaving the DV out of the imputation model is this: When any variable is omitted from the model, imputation is carried out under the assumption that the correlation is r = 0 between the omitted variable and variables included in the imputation model. Thus, when the DV is omitted, the correlations between it and the IVs (and covariates) included in the model are all suppressed (i.e., biased) toward 0."

Thessa Wong posted on Friday, May 21, 2010 - 2:31 am

That makes a lot of sense. Thank you very much for your thorough response!

I have another question: is there any way to check the quality of the imputation? In the Mplus output, you get the averaged estimates from all the created data sets. I was wondering whether it is possible to get output for each of the data sets. That way I could see whether the estimates vary a lot across data sets. If so, I guess the average estimate is not that reliable. Of course I can do the analyses separately for all the data sets, but there is probably a way that Mplus can do that for me.

Jon Heron posted on Friday, May 21, 2010 - 4:11 am

Good question!
One for Bengt I think

I've been calling Mplus from Stata and running models on individual datasets that way. It looks like you may be able to do this in R now too.

Linda K. Muthen posted on Friday, May 21, 2010 - 8:39 am

The way to know how good the average parameter estimates are is to look at the standard errors of the parameter estimates. With imputation, these are computed using the average of the squared standard errors over the set of analyses and the between analysis parameter estimate variation (Rubin, 1987; Schafer, 1997).
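
Written out, the pooling described here is usually given in the following form (Rubin's rules). The symbols are notation introduced for illustration: m is the number of imputed data sets, and \hat{\theta}_i and SE_i are the estimate and standard error from data set i. The (1 + 1/m) factor is the standard correction for a finite number of imputations from Rubin (1987); the post itself names only the two variance components.

```latex
\begin{align*}
\bar{\theta} &= \frac{1}{m}\sum_{i=1}^{m}\hat{\theta}_i
  && \text{(pooled estimate)} \\
W &= \frac{1}{m}\sum_{i=1}^{m} SE_i^{2}
  && \text{(average squared standard error)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{\theta}_i-\bar{\theta}\bigr)^{2}
  && \text{(between-analysis variation)} \\
SE_{\text{pooled}} &= \sqrt{W + \Bigl(1+\frac{1}{m}\Bigr)B}
\end{align*}
```

A large B relative to W means the estimates differ substantially across imputed data sets, which is the concern raised in the question, and it inflates the pooled standard error accordingly.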

Alvin Wee posted on Wednesday, June 26, 2013 - 10:46 pm

Hi Linda,

I am using Mplus 6 and did some analyses with multiply imputed data. I understand that I should be able to get Mplus to produce between-imputation variances which, if my understanding is correct, will appear as a "% missing" column next to the parameters in the output. However, no matter what I tried, this does not show up. The command that I use is as follows:

TITLE:
Base Model
Data:
File = C:\Users\Alvin\Desktop\MI\Mplus.50implist.dat;
TYPE = IMPUTATION ;
VARIABLE:
NAMES ARE TP1DEP TP2DEP TP3DEP TP1AN TP2AN TP3AN TP1STR TP2STR TP3STR TP1FDep TP2FDep
TP3FDep TP1Sleep TP3Sleep TP1SupT ;
USEVARIABLES ARE TP1DEP - TP1SupT ;
ANALYSIS:
TYPE = general ;
MODEL:
TP2DEP TP2AN TP2STR ON TP1DEP TP1AN TP1STR ;
TP3DEP TP3AN TP3STR ON TP2DEP TP2AN TP2STR ;
TP2DEP WITH TP2AN TP2STR ;
TP2AN WITH TP2STR ;
TP3DEP WITH TP3AN TP3STR ;
TP3AN WITH TP3STR ;
TP2DEP TP2AN TP2STR ON TP1FDep TP1Sleep TP1SupT TP2FDep ;
TP3DEP TP3AN TP3STR ON TP3FDep TP3Sleep ;
OUTPUT:
STDYX ;

Many thanks,
Alvin

Alvin Wee posted on Thursday, June 27, 2013 - 6:05 am

Got it. I upgraded to version 7.

Declan French posted on Monday, September 05, 2016 - 4:23 am

I have data from 1993-2015 from a rotating panel in wide format. I am only interested in estimating on respondents who are in for 3 consecutive waves, so I've set up estimation as a series of 3-wave equations with parameters constrained across all groups of 3 waves. The problem is that very few respondents are in for most of the waves, so if I try to estimate the model my number of observations becomes very small. Mplus cannot impute the missing data as there are too many missing values. Is there any way I can get Mplus to estimate each 3-wave model on the available data for those years and constrain estimates across all these models?
Instead, I have rearranged the data in long format and constructed Yt, Yt-1, Yt-2 and the X's similarly. Is there any reason why stacking the data long-wise and estimating the model is not correct?

Here's my 'wide' model :
!-------2015---------------

Y15 ON P14 (h1)
S13 (h2)
Y14 (h3)
X1_15(h4)

;

S15 ON S14 (s1) ;

P15 ON P14 (p1)
S14 (p2)
X1_15 (p3)

;

!-------2014---------------

Y14 ON P13 (h1)
S12 (h2)
Y13 (h3)
X1_14 (h4)

;

S14 ON S13 (s1) ;

P14 ON P13 (p1)
S13 (p2)
X1_14 (p3)

;

and so on until 1995.

Bengt O. Muthen posted on Monday, September 05, 2016 - 3:56 pm

Sounds like you could approach it as in UG ex 6.18, that is, as a multiple group, multiple cohort analysis. You have created the cohorts as these triplets.

Declan French posted on Tuesday, September 06, 2016 - 8:17 am

Thank you for your advice. Grouping sounds like the way to go. I've organised the data long-wise and estimated by grouping (hopefully this is what you meant). This is causing problems, but I think I can sort them out. Is there any way of incorporating random individual effects into this set-up? ATM I'm ignoring that the same people are sometimes in more than one 'cohort'.

My new code :

Grouping = year (2015=2015 2014=2014 etc. );

Model :

Y ON lagP (h1)
lag2S (h2)
lagY (h3)
X1 (h4)

;

lagS ON lag2S (s1) ;

lagP ON lag2P (p1)
lag2S (p2)
lagX1 (p3)

;

Model 2014 :

Y ON lagP (h1)
lag2S (h2)
lagY (h3)
X1 (h4)

;

lagS ON lag2S (s1) ;

lagP ON lag2P (p1)
lag2S (p2)
lagX1 (p3)

;

and so on to 1995

Bengt O. Muthen posted on Tuesday, September 06, 2016 - 3:29 pm

ex 6.18 uses the single-level, wide approach. Not the two-level, long approach.

Declan French posted on Wednesday, September 07, 2016 - 12:46 am

Thank you for your reply. However, I still do not see how I can take into account that the same people are sometimes in more than one 'cohort'. Sorry - new to Mplus! The only way I can see to do it is by basing 'cohort' on missing-data pattern - there are hundreds of these.

My 'wide' code :

Grouping = year (2015=2015 2014=2014 etc. );

Model :

Y15 ON P14 (h1)
S13 (h2)
Y14 (h3)
X1_15 (h4)

;

S14 ON S13 (s1) ;

P14 ON P13 (p1)
S13 (p2)
X1_14 (p3)

;

Model 2014 :

Y14 ON P13 (h1)
S12 (h2)
Y13 (h3)
X1_14 (h4)

;

S13 ON S12 (s1) ;

P13 ON P12 (p1)
S12 (p2)
X1_13 (p3)

;

and so on to 1995

Bengt O. Muthen posted on Thursday, September 08, 2016 - 1:28 pm

Don't know what to say without knowing more - which we can't get into - but perhaps you can create cohorts (and multiple groups) corresponding to the 13 different waves.