Jon Elhai posted on Tuesday, April 27, 2010 - 6:27 pm
A couple questions about Mplus 6's ability to generate imputed datasets using MI... 1) What types of variables can be used in Mplus 6 for generating imputed datasets? Continuous and categorical variables? What about count variables? 2) Is the Bayes estimator robust to non-normality? Or must I normalize variables before generating imputed datasets in Mplus 6?
1. For now imputation is available for continuous and categorical variables. 2. No. You should not normalize the variables. Instead you can use two-part modeling, mixture modeling, or treat them as categorical if they have no more than 10 values.
I have tried to do a multiple imputation in Mplus 6 in which I impute 2 categorical variables. Everything seems to go well, but when I want to do further analyses with the imputed data, I get the following message for each data file:
"Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp1.dat:
*** ERROR Unexpected end of file reached in data file.
Errors for replication with data file P:\Data\Radar\Eigen\SPSS\zdaadn23imp2.dat:
*** ERROR Unexpected end of file reached in data file."
I just figured out what the problem was: I had defined too many variables in the command. However, that gives me a new problem. I want to use the imputations for analyses in which I also take other variables into account, but I do not want these other variables to influence my data imputation. It looks like including these variables in the data imputation is the only way to get them into the new dataset with the imputed data. In other words, how do I create a dataset with the imputed variables combined with other variables?
Jon Heron posted on Thursday, May 20, 2010 - 11:56 pm
you might want to read this:
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576.
"The truth is that all variables in the analysis model must be included in the imputation model. The fear is that including the DV in the imputation model might lead to bias in estimating the important relationships (e.g., the regression coefficient of a program variable predicting the DV). However, the opposite actually happens. When the DV is included in the model, all relevant parameter estimates are unbiased, but excluding the DV from the imputation model for the IVs and covariates can be shown to produce biased estimates. The problem with leaving the DV out of the imputation model is this: When any variable is omitted from the model, imputation is carried out under the assumption that the correlation is r = 0 between the omitted variable and variables included in the imputation model. Thus, when the DV is omitted, the correlations between it and the IVs (and covariates) included in the model are all suppressed (i.e., biased) toward 0."
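The attenuation Graham describes can be seen in a small simulation. What follows is only a sketch under simplifying assumptions: one predictor x and one DV y, half of x missing completely at random, and a single stochastic imputation rather than the Bayesian multiple imputation Mplus performs. The function names, sample size, and true slope of 0.6 are made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=0.6, seed=1):
    """Return (slope when y is omitted from the imputation model,
    slope when y is included). All settings are illustrative."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 0.8) for xi in x]
    # Half of x is missing completely at random.
    missing = [rng.random() < 0.5 for _ in range(n)]
    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # Imputation model A omits y: draw x from its observed marginal
    # distribution, which implicitly assumes r = 0 between x and y.
    mu = mean(x_obs)
    sd = (sum((xi - mu) ** 2 for xi in x_obs) / (len(x_obs) - 1)) ** 0.5
    x_omit = [rng.gauss(mu, sd) if m else xi for xi, m in zip(x, missing)]

    # Imputation model B includes y: regress x on y in the complete
    # cases, then draw x from the fitted value plus residual noise.
    b = slope(y_obs, x_obs)  # slope of x on y
    a = mean(x_obs) - b * mean(y_obs)
    resid = [xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)]
    res_sd = (sum(r ** 2 for r in resid) / (len(resid) - 2)) ** 0.5
    x_incl = [
        rng.gauss(a + b * yi, res_sd) if m else xi
        for xi, yi, m in zip(x, y, missing)
    ]

    return slope(x_omit, y), slope(x_incl, y)


if __name__ == "__main__":
    omitted, included = simulate()
    print("slope of y on x, DV omitted from imputation: ", round(omitted, 3))
    print("slope of y on x, DV included in imputation:", round(included, 3))
```

With half of x imputed independently of y, the estimated slope should fall to roughly half of the true value, while the version that uses y in the imputation model should stay close to it.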
That makes a lot of sense. Thank you very much for your thorough response!
I have another question: is there any way to check the quality of the imputation? In the Mplus output, you get the averaged estimates from all the created data sets. I was wondering whether it is possible to get output for each of the data sets separately. That way I could see whether the estimates vary a lot across data sets. If they do, I guess the average estimate is not that reliable. Of course I can run the analyses separately for all the data sets, but there is probably a way that Mplus can do that for me.
Jon Heron posted on Friday, May 21, 2010 - 4:11 am
Good question! One for Bengt I think
I've been calling Mplus from Stata and running models on individual datasets that way. It looks like you may be able to do this in R now too.
The way to know how good the average parameter estimates are is to look at the standard errors of the parameter estimates. With imputation, these are computed from the average of the squared standard errors across the analyses plus the between-analysis variation in the parameter estimates (Rubin, 1987; Schafer, 1997).
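For anyone who wants to check the pooling by hand, here is a minimal sketch of the standard combining rules from Rubin (1987) for a single parameter. It assumes you have already collected the estimate and standard error from each imputed data set; the function name pool_rubin and the numbers in the usage lines are hypothetical, and this is not Mplus's own code.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets (Rubin, 1987)."""
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and SEs from >= 2 data sets")

    # Pooled point estimate: the plain average over imputations.
    q_bar = sum(estimates) / m

    # Within-imputation variance: average of the squared standard errors.
    within = sum(se ** 2 for se in standard_errors) / m

    # Between-imputation variance: sample variance of the estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Total variance adds the between part, inflated by (1 + 1/m).
    total = within + (1 + 1 / m) * between

    return {
        "estimate": q_bar,
        "within": within,
        "between": between,
        "total": total,
        "se": math.sqrt(total),
        # Share of the total variance that comes from variation
        # between imputations.
        "between_share": (1 + 1 / m) * between / total,
    }


if __name__ == "__main__":
    # Made-up estimates and SEs from three imputed data sets.
    result = pool_rubin([0.50, 0.54, 0.46], [0.10, 0.10, 0.10])
    for name, value in result.items():
        print(name, round(value, 5))
```

A large between-imputation share means the estimates differ a lot from one imputed data set to the next, and the pooled standard error grows accordingly.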
Alvin Wee posted on Wednesday, June 26, 2013 - 10:46 pm
I am using Mplus 6 and did some analyses with multiply imputed data. I understand that I should be able to get Mplus to produce between-imputation variances, which, if my understanding is correct, will appear as a "% missing" column next to the parameters in the output. However, no matter what I try, this does not show up. The command that I use is as follows:
TITLE: Base Model
Data: File = C:\Users\Alvin\Desktop\MI\Mplus.50implist.dat;
  TYPE = IMPUTATION ;
VARIABLE: NAMES ARE TP1DEP TP2DEP TP3DEP TP1AN TP2AN TP3AN
    TP1STR TP2STR TP3STR TP1FDep TP2FDep TP3FDep
    TP1Sleep TP3Sleep TP1SupT ;
  USEVARIABLES ARE TP1DEP - TP1SupT ;
ANALYSIS: TYPE = general ;
MODEL: TP2DEP TP2AN TP2STR ON TP1DEP TP1AN TP1STR ;
  TP3DEP TP3AN TP3STR ON TP2DEP TP2AN TP2STR ;
  TP2DEP WITH TP2AN TP2STR ;
  TP2AN WITH TP2STR ;
  TP3DEP WITH TP3AN TP3STR ;
  TP3AN WITH TP3STR ;
  TP2DEP TP2AN TP2STR ON TP1FDep TP1Sleep TP1SupT TP2FDep ;
  TP3DEP TP3AN TP3STR ON TP3FDep TP3Sleep ;
OUTPUT: STDYX
Many thanks, Alvin
Alvin Wee posted on Thursday, June 27, 2013 - 6:05 am
I have data for 1993-2015 from a rotating panel in wide format. I am only interested in estimating on respondents who are in for 3 consecutive waves, so I've set up estimation as a series of 3-wave equations with parameters constrained across all groups of 3 waves. The problem is that very few respondents are in for most waves. If I try to estimate the model, my number of observations becomes very small. MPlus cannot impute the missing data as there are too many missing values. Is there any way I can get MPlus to estimate each 3-wave model on the available data for those years and constrain estimates across all these models? As an alternative, I have rearranged the data in long format and constructed Yt, Yt-1, Yt-2, and the X's similarly. Is there any reason why stacking the data long-wise and estimating the model is not correct?
Here's my 'wide' model : !-------2015---------------
Thank you for your advice. Grouping sounds like the way to go. I've organised data long-wise and estimated by grouping (hopefully this is what you meant). This is causing probs but I think I can sort out. Is there any way of incorporating random individual effects into this set-up ? ATM I'm ignoring that the same people are sometimes in more than one 'cohort'.
Thank you for your reply. However, I still do not see how I can take into consideration the same people are sometimes in more than one 'cohort'. Sorry - new to MPlus ! The only way I can see to do it is by basing 'cohort' on missing pattern - there are 100's of these.