
Jon Elhai posted on Tuesday, April 27, 2010 - 6:27 pm



A couple questions about Mplus 6's ability to generate imputed datasets using MI... 1) What types of variables can be used in Mplus 6 for generating imputed datasets? Continuous and categorical variables? What about count variables? 2) Is the Bayes estimator robust to nonnormality? Or must I normalize variables before generating imputed datasets in Mplus 6? 


1. For now, imputation is available for continuous and categorical variables. 2. No. You should not normalize the variables. Instead you can use two-part modeling, mixture modeling, or treat them as categorical if they have no more than 10 values.


I have tried to do a multiple imputation in Mplus 6 in which I impute 2 categorical variables. Everything seems to go well, but when I want to do further analyses with the imputed data, I get the following message for each data file: "Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp1.dat: *** ERROR Unexpected end of file reached in data file. Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp2.dat: *** ERROR Unexpected end of file reached in data file. etc.." What am I doing wrong? Thank you very much in advance! 


I just figured out what the problem was. I had defined too many variables in the command. However, that gives me a new problem: I want to use the imputations for analyses in which I also take into account other variables. I do not want these other variables to influence my data imputation, but it looks like including them in the data imputation is the only way to get them into the new dataset with imputed data. In other words, how do I create a dataset with the imputed variables combined with other variables?


See the AUXILIARY option in the user's guide. 

Jon Heron posted on Thursday, May 20, 2010 - 11:56 pm



Hi Thessa, you might want to read this: Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576. "The truth is that all variables in the analysis model must be included in the imputation model. The fear is that including the DV in the imputation model might lead to bias in estimating the important relationships (e.g., the regression coefficient of a program variable predicting the DV). However, the opposite actually happens. When the DV is included in the model, all relevant parameter estimates are unbiased, but excluding the DV from the imputation model for the IVs and covariates can be shown to produce biased estimates. The problem with leaving the DV out of the imputation model is this: When any variable is omitted from the model, imputation is carried out under the assumption that the correlation is r = 0 between the omitted variable and variables included in the imputation model. Thus, when the DV is omitted, the correlations between it and the IVs (and covariates) included in the model are all suppressed (i.e., biased) toward 0."


That makes a lot of sense. Thank you very much for your thorough response! I have another question: is there any way to check the quality of the imputation? In the Mplus output, you get the averaged estimates from all the created data sets. I was wondering whether it is possible to get output for each of the data sets. That way I could see whether the estimates vary a lot across data sets. If they do, I guess the average estimate is not that reliable. Of course I can do the analyses separately for all the data sets, but there is probably a way that Mplus can do that for me.

Jon Heron posted on Friday, May 21, 2010 - 4:11 am



Good question! One for Bengt, I think. I've been calling Mplus from Stata and running models on individual datasets that way. It looks like you may be able to do this in R now too.


The way to know how good the average parameter estimates are is to look at the standard errors of the parameter estimates. With imputation, these are computed using the average of the squared standard errors over the set of analyses and the between analysis parameter estimate variation (Rubin, 1987; Schafer, 1997). 
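For readers who want to see the arithmetic behind that description, here is a minimal Python sketch of the usual combining rules (Rubin, 1987) for a single parameter. It is not Mplus code and is not taken from the Mplus output; the function name pool_rubin and the example numbers are hypothetical, and the per-dataset estimates and standard errors are assumed to come from running the same model on each imputed data set.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one parameter over m imputed-data analyses using Rubin's rules.

    estimates  : the parameter estimate from each imputed data set
    std_errors : the matching standard error from each imputed data set
    Returns (pooled estimate, pooled standard error, between-imputation variance).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two analyses and one SE per estimate")

    # Pooled point estimate: the average over the m analyses.
    pooled = statistics.mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = statistics.mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates (divisor m - 1).
    between = statistics.variance(estimates)
    # Total variance adds the between part, inflated by (1 + 1/m) for finite m.
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total), between


if __name__ == "__main__":
    # Hypothetical results for one regression slope from five imputed data sets.
    est = [0.50, 0.54, 0.46, 0.52, 0.48]
    se = [0.10, 0.10, 0.10, 0.10, 0.10]
    pooled, pooled_se, between = pool_rubin(est, se)
    print(f"pooled estimate = {pooled:.3f}, pooled SE = {pooled_se:.4f}, "
          f"between-imputation variance = {between:.4f}")
```

A large between-imputation variance relative to the average squared standard error shows up as a pooled standard error noticeably larger than any single-dataset standard error, which is the sense in which the standard errors already reflect how much the estimates move across data sets.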

Alvin Wee posted on Wednesday, June 26, 2013 - 10:46 pm



Hi Linda, I am using Mplus 6 and did some analyses with multiply imputed data. I understand that I should be able to get Mplus to produce between-imputation variances which, if my understanding is correct, will appear as a "% missing" column next to the parameters in the output. However, no matter what I tried, this does not show up. The command that I use is as follows:

TITLE: Base Model
Data: File = C:\Users\Alvin\Desktop\MI\Mplus.50implist.dat;
  TYPE = IMPUTATION ;
VARIABLE: NAMES ARE TP1DEP TP2DEP TP3DEP TP1AN TP2AN TP3AN
  TP1STR TP2STR TP3STR TP1FDep TP2FDep TP3FDep
  TP1Sleep TP3Sleep TP1SupT ;
  USEVARIABLES ARE TP1DEP-TP1SupT ;
ANALYSIS: TYPE = general ;
MODEL:
  TP2DEP TP2AN TP2STR ON TP1DEP TP1AN TP1STR ;
  TP3DEP TP3AN TP3STR ON TP2DEP TP2AN TP2STR ;
  TP2DEP WITH TP2AN TP2STR ;
  TP2AN WITH TP2STR ;
  TP3DEP WITH TP3AN TP3STR ;
  TP3AN WITH TP3STR ;
  TP2DEP TP2AN TP2STR ON TP1FDep TP1Sleep TP1SupT TP2FDep ;
  TP3DEP TP3AN TP3STR ON TP3FDep TP3Sleep ;
OUTPUT: STDYX

Many thanks, Alvin

Alvin Wee posted on Thursday, June 27, 2013 - 6:05 am



Got it. I upgraded to version 7. 


I have data for 1993-2015 from a rotating panel in wide format. I am only interested in estimating on respondents who are in for 3 consecutive waves. So I've set up estimation with a series of 3-wave equations with parameters constrained across all groups of 3 waves. The problem is that very few respondents are in for most waves. If I try to estimate the model, my number of observations becomes very small. MPlus cannot impute the missing data as there are too many missings. Is there any way I can get MPlus to estimate each 3-wave model on the available data for those years and constrain estimates across all these models? Instead, I have rearranged the data in long format and constructed Yt, Yt-1, Yt-2 and the X's similarly. Is there any reason why stacking the data longwise and estimating the model is not correct? Here's my 'wide' model:

!2015
Y15 ON P14 (h1) S13 (h2) Y14 (h3) X1_15 (h4) ;
S15 ON S14 (s1) ;
P15 ON P14 (p1) S14 (p2) X1_15 (p3) ;
!14
Y14 ON P13 (h1) S12 (h2) Y13 (h3) X1_14 (h4) ;
S14 ON S13 (s1) ;
P14 ON P13 (p1) S13 (p2) X1_14 (p3) ;

and so on until 1995.


Sounds like you could approach it as in UG ex 6.18, that is, as a multiple group, multiple cohort analysis. You have created the cohorts as these triplets. 


Thank you for your advice. Grouping sounds like the way to go. I've organised data longwise and estimated by grouping (hopefully this is what you meant). This is causing probs but I think I can sort them out. Is there any way of incorporating random individual effects into this setup? ATM I'm ignoring that the same people are sometimes in more than one 'cohort'. My new code:

Grouping = year (2015=2015 2014=2014 etc. );
Model :
Y ON lagP (h1) lag2S (h2) lagY (h3) X1 (h4) ;
lagS ON lag2S (s1) ;
lagP ON lag2P (p1) lag2S (p2) lagX1 (p3) ;
Model 2014 :
Y ON lagP (h1) lag2S (h2) lagY (h3) X1 (h4) ;
lagS ON lag2S (s1) ;
lagP ON lag2P (p1) lag2S (p2) lagX1 (p3) ;

and so on to 1995


Ex 6.18 uses the single-level, wide approach, not the two-level, long approach.


Thank you for your reply. However, I still do not see how I can take into consideration that the same people are sometimes in more than one 'cohort'. Sorry - new to MPlus! The only way I can see to do it is by basing 'cohort' on missing pattern - there are 100's of these. My 'wide' code:

Grouping = year (2015=2015 2014=2014 etc. );
Model :
Y15 ON P14 (h1) S13 (h2) Y14 (h3) X15 (h4) ;
S14 ON S13 (s1) ;
P14 ON P13 (p1) S13 (p2) X1_14 (p3) ;
Model 2014 :
Y14 ON P13 (h1) S12 (h2) Y13 (h3) X1_14 (h4) ;
S13 ON S12 (s1) ;
P13 ON P12 (p1) S12 (p2) X1_13 (p3) ;

and so on to 1995


Don't know what to say without knowing more - which we can't get into - but perhaps you can create cohorts (and multiple groups) corresponding to the 13 different waves.
