Could you please answer the following 3 questions?
I am running Mplus version 3.13 for a CFA analysis and specifying Type = Missing H1.
1. After reading the User's Guide and examining posts on this discussion site, it appears to me that Mplus does NOT actually impute ("fill in") missing data values when Type = Missing H1 is specified. Is this correct?
2. Based on what I've read, it sounds like Mplus uses the EM algorithm to estimate means, variances, and covariances (the sufficient statistics) using all the factor indicator variables. The estimated means, variances, and covariances from EM are then used to derive the model parameter estimates. Is this right? If not, could you please clarify?
3. If one specifies Type = Imputation to read multiple datasets created by some multiple imputation process outside of Mplus (Example 12.13), how is EM used, if at all?
Thanks so much for answering posted questions! I'm sure this is quite time-consuming, but your comments are greatly valued and keep us users on the right path.
bmuthen posted on Saturday, October 08, 2005 - 8:15 pm
1. That's right. Standard ML estimation is used instead. The missing values can however be produced.
2. Not quite. EM is used in the ML estimation of the unrestricted mean vector and covariance matrix (the H1 model) - but these results are only used to be able to compute a chi-square test of the H0 model against H1. For the H0 parameter estimation, all available raw data are used in the computations using ML. EM is not used in the H0 computations.
3. EM is not used here. Type = Imputation is intended for use when another program has been first used to impute missing data (e.g. freeware such as NORM). These multiple data sets are then sent to Mplus and analyzed by Mplus, followed by an Mplus summary of the parameter estimates and the computation of the SEs.
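With Type = Imputation, the estimates from the separate data sets have to be combined. The usual way to do this is Rubin's rules: average the point estimates, then combine the average within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m)B, where m is the number of data sets. Below is a minimal Python sketch for one parameter. It assumes these standard rules, and the function name is hypothetical. It is not Mplus's own code.

```python
import math
import statistics

def pool_rubin(estimates, std_errors):
    """Pool one parameter over m >= 2 imputed data sets with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                   # average point estimate
    w = statistics.fmean([se ** 2 for se in std_errors])  # within-imputation variance
    b = statistics.variance(estimates)                    # between-imputation variance (n - 1 divisor)
    t = w + (1 + 1 / m) * b                               # total variance
    return q_bar, math.sqrt(t)
```

For example, `pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])` gives a pooled estimate of 2.0 and a standard error of sqrt(7/3). The between-imputation term is what makes the pooled SE larger than any single data set's SE.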
Just to add to this - one can use EM to generate imputed data. In the EM algorithm the missing data are computed for each person. This type of output is however not currently available in Mplus.
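For the unrestricted H1 model, EM alternates between two steps:

- The E-step computes each person's conditional expectation of the missing values, and of their squares and cross-products, given that person's observed values and the current estimates.
- The M-step recomputes the means and covariances from these expected sufficient statistics.

The per-person E-step expectations are the "computed" missing values mentioned above. Below is a minimal Python sketch for two normally distributed variables. It assumes every case has at least one observed value (cases missing on both are dropped), and the function name is hypothetical. This is textbook EM, not Mplus's implementation.

```python
def em_bivariate(data, iterations=200):
    """EM estimates of the unrestricted means and covariance matrix of (y1, y2).

    data: list of (y1, y2) pairs, None marking a missing value.
    Returns (mu, sigma, filled); filled holds each person's conditional
    expectations (y1, y2) from the final E-step.
    """
    data = [row for row in data if not (row[0] is None and row[1] is None)]
    n = len(data)
    obs1 = [a for a, _ in data if a is not None]
    obs2 = [b for _, b in data if b is not None]
    # Start values: observed-data means and variances, zero covariance.
    mu1, mu2 = sum(obs1) / len(obs1), sum(obs2) / len(obs2)
    s11 = sum((a - mu1) ** 2 for a in obs1) / len(obs1)
    s22 = sum((b - mu2) ** 2 for b in obs2) / len(obs2)
    s12 = 0.0
    filled = []
    for _ in range(iterations):
        # E-step: expected sufficient statistics given each person's observed values.
        t1 = t2 = t11 = t22 = t12 = 0.0
        filled = []
        for y1, y2 in data:
            if y1 is None:
                e2 = y2
                e1 = mu1 + s12 / s22 * (y2 - mu2)  # regression of y1 on y2
                resid = s11 - s12 ** 2 / s22        # conditional variance of y1
                e11, e22 = e1 * e1 + resid, e2 * e2
            elif y2 is None:
                e1 = y1
                e2 = mu2 + s12 / s11 * (y1 - mu1)  # regression of y2 on y1
                resid = s22 - s12 ** 2 / s11        # conditional variance of y2
                e11, e22 = e1 * e1, e2 * e2 + resid
            else:
                e1, e2 = y1, y2
                e11, e22 = y1 * y1, y2 * y2
            filled.append((e1, e2))
            t1 += e1
            t2 += e2
            t11 += e11
            t22 += e22
            t12 += e1 * e2  # E[y1*y2] = e1*e2 when at most one value is missing
        # M-step: ML means and covariances (divisor n).
        mu1, mu2 = t1 / n, t2 / n
        s11 = t11 / n - mu1 ** 2
        s22 = t22 / n - mu2 ** 2
        s12 = t12 / n - mu1 * mu2
    return (mu1, mu2), ((s11, s12), (s12, s22)), filled
```

The fixed iteration count keeps the sketch short. A real implementation would stop when the change in the log-likelihood or in the parameters falls below a tolerance.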
I'm using Mplus 6, and I'd like to understand how missing data on the outcomes are treated in latent growth models when using the ML/MLR estimators.
1) I think I've read that missing data are assumed to be MAR. Does this mean that, in a longitudinal context, missingness on an indicator at one time point can depend on the values of the same indicator at other time points?
2) Is there a technical article that explains the algorithm Mplus uses to handle MAR missing data within ML estimation? Or a general reference?
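Under MAR, the probability that a value is missing may depend on observed values, including the same outcome measured at an earlier wave. It must not depend on the missing value itself once the observed values are taken into account.

The Python simulation below is a hypothetical illustration in which y2 is missing whenever the observed y1 is high. The complete-case mean of y2 is biased. The ML estimate, which uses all y1 values, recovers the true mean of 0. For this monotone two-variable pattern, ML reduces to regressing y2 on y1 among complete cases and evaluating that regression at the mean of all y1 values. This shortcut is a special case chosen for illustration, not the general algorithm Mplus uses.

```python
import random
import statistics

def simulate_mar(n=20000, seed=1):
    """Two waves; y2 is set to missing whenever the observed y1 exceeds 0.5 (MAR)."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 0.8 * a + rng.gauss(0.0, 0.6)  # true mean of y2 is 0
        y1.append(a)
        y2.append(None if a > 0.5 else b)   # missingness depends only on observed y1
    return y1, y2

def mean_y2_estimates(y1, y2):
    """Return (complete-case mean, ML mean) of y2."""
    xs = [a for a, b in zip(y1, y2) if b is not None]
    ys = [b for b in y2 if b is not None]
    cc_mean = statistics.fmean(ys)
    # ML for this monotone pattern: regress y2 on y1 among complete cases,
    # then evaluate the regression at the mean of ALL y1 values.
    mx, my = statistics.fmean(xs), cc_mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    ml_mean = intercept + slope * statistics.fmean(y1)
    return cc_mean, ml_mean
```

In this setup the complete-case mean lands near -0.4, while the ML estimate stays close to 0.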
The results should be identical if you set the models up correctly. You may be missing the fact that the residual variances need to be held equal over time for the Mplus model to be the same as the multilevel model. Both assume MAR.
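For a linear growth model y_t = i + s*x_t + e_t, the implied covariance between waves t and u is var(i) + (x_t + x_u)cov(i, s) + x_t*x_u*var(s), plus the residual variance when t = u. A standard two-level random-coefficients model has a single level-1 residual variance. So, as the answer says, the two setups coincide only when the growth model's residual variances are held equal over time. Below is a small Python sketch of this implied matrix. The function and argument names are hypothetical.

```python
def growth_implied_cov(times, var_i, var_s, cov_is, resid_var):
    """Model-implied covariance matrix of repeated measures in a linear growth model.

    times: time scores x_t; var_i, var_s, cov_is: growth-factor (co)variances;
    resid_var: one residual variance shared by all time points.
    """
    k = len(times)
    sigma = [[0.0] * k for _ in range(k)]
    for a, xt in enumerate(times):
        for b, xu in enumerate(times):
            sigma[a][b] = var_i + (xt + xu) * cov_is + xt * xu * var_s
            if a == b:
                sigma[a][b] += resid_var  # same value on every diagonal element
    return sigma
```

Replacing `resid_var` with a list of time-specific values would give the less restricted growth model, which no longer matches the multilevel specification.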
Ok, your answer made me realize there was a glitch in my data transformation routine. With the correct datasets, the estimates are now completely identical, using the latent growth or the multilevel specification.
There is still something I don't get. I'm fitting the same model with a latent growth curve or a multilevel specification, and I get exactly the same results.
I don't understand how this is possible, given that some of the time-varying covariates were imputed regardless of whether the individual was still being followed.
In the multilevel specification, with data in long format, this has no consequences, because the rows where the individual was no longer in the study are deleted. But in the wide format used by the latent growth specification, these imputed values are still in the data, so I expected them to affect the estimation... which is not the case.
I hope I made myself clear. Do you have an explanation for this behaviour? I could send you my models if necessary.
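The two formats handle a wave without an outcome differently:

- In the long format, the wave simply has no row, so its covariate value disappears.
- In the wide format, the value stays in the data. However, the model is estimated conditional on the covariates, so a covariate that predicts only the outcome at its own wave enters the likelihood only through that outcome. When that outcome is missing, it is integrated out, and the covariate value drops out with it.

This reasoning assumes the covariate does not also predict the growth factors. The hypothetical Python helper below mimics the long-format construction; the column names are made up for illustration.

```python
def wide_to_long(rows, outcome_cols, covariate_cols):
    """Reshape wide records to long format, one row per observed wave.

    rows: list of dicts with an "id" key plus per-wave outcome/covariate columns.
    outcome_cols, covariate_cols: column names for waves 0, 1, 2, ... in order.
    """
    long_rows = []
    for row in rows:
        for t, (ycol, xcol) in enumerate(zip(outcome_cols, covariate_cols)):
            if row[ycol] is None:
                # Wave not observed: no long-format row is created, so the
                # (possibly imputed) covariate value for this wave is discarded.
                continue
            long_rows.append({"id": row["id"], "time": t, "y": row[ycol], "x": row[xcol]})
    return long_rows
```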
In Mplus the model is estimated conditioned on the covariates. Cases with missing data on observed exogenous variables are deleted from the analysis as they are in HLM. With dependent variables, values are not imputed. All available information is used as in HLM.
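"All available information is used" refers to casewise (full-information) ML. Each person contributes the normal log-likelihood of only the outcomes observed for that person. That contribution is evaluated with the matching sub-vector of the model-implied means and sub-matrix of the model-implied covariance matrix, and nothing is filled in.

The Python sketch below shows this for two outcomes. The function names are hypothetical, and it illustrates the technique rather than reproducing Mplus's internal code. It also includes the likelihood-ratio chi-square 2(logL_H1 - logL_H0), which, per the 2005 answer above, is the reason the unrestricted H1 model is estimated at all.

```python
import math

def _univariate(y, m, v):
    """Normal log-density of one observed value with mean m and variance v."""
    return -0.5 * (math.log(2 * math.pi) + math.log(v) + (y - m) ** 2 / v)

def casewise_loglik(data, mu, sigma):
    """Sum of per-case log-likelihoods using only each case's observed values.

    data: list of (y1, y2) pairs with None for missing; mu: (mu1, mu2);
    sigma: ((s11, s12), (s12, s22)) model-implied covariance matrix.
    """
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    total = 0.0
    for y1, y2 in data:
        if y1 is not None and y2 is not None:
            det = s11 * s22 - s12 * s12
            d1, d2 = y1 - mu[0], y2 - mu[1]
            # Quadratic form d' Sigma^{-1} d via the closed-form 2x2 inverse.
            q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
            total += -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + q)
        elif y1 is not None:
            total += _univariate(y1, mu[0], s11)  # only y1 observed
        elif y2 is not None:
            total += _univariate(y2, mu[1], s22)  # only y2 observed
        # A case with both values missing contributes nothing.
    return total

def lr_chi_square(loglik_h0, loglik_h1):
    """Likelihood-ratio chi-square of the H0 model against the unrestricted H1 model."""
    return 2.0 * (loglik_h1 - loglik_h0)
```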
Soz posted on Wednesday, August 10, 2011 - 1:07 pm
Dear Linda and Bengt,
I am running an SEM with 5 data sets in which missing values were imputed in PASW. I also created the file implist.dat, which lists the five data sets. I used the following specification:
DATA: FILE IS Implist.dat;
VARIABLE: NAMES ARE Imputation_ VPCode country RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2 Plan_M6 CHBdiet9 ;
USEVARIABLES ARE RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2 Plan_M6 CHBdiet9;
ANALYSIS: TYPE is Imputation; ESTIMATOR IS ML; ITERATIONS = 10000; CONVERGENCE = 0.00005;
Model: Inten_M5 on RiskP_M3 OE_M6 SE_M6 CHBdiet9 ; Plan_M2 on Inten_M5 ; Plan_M6 on Plan_M2;
OUTPUT: tech1; standardized;
However, I got this error message:
*** ERROR in ANALYSIS command Unrecognized setting for TYPE option: IMPUTATION
What did I do wrong? How can I fix the problem? Many thanks in advance.
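The error arises because, in Mplus, TYPE = IMPUTATION is an option of the DATA command, not of the ANALYSIS command: DATA: FILE IS Implist.dat; TYPE = IMPUTATION; with the ANALYSIS TYPE left at its default. Mplus also expects each imputed data set in its own file, with the list file naming one file per line.

If the PASW export is instead a single stacked file indexed by Imputation_, it has to be split first. In PASW's convention, Imputation_ = 0 holds the original, unimputed data. The Python helper below does the split under stated assumptions: a headerless, whitespace-delimited text file whose first column is Imputation_, and hypothetical output file names. Because the Imputation_ column is dropped from the output files, the NAMES list would then start with VPCode.

```python
import os

def split_stacked(path, out_dir, list_name="implist.dat"):
    """Split a stacked export into one file per imputation plus a list file.

    Assumes a headerless, whitespace-delimited file whose first column is
    Imputation_; rows with Imputation_ = 0 (original data) are skipped and the
    Imputation_ column is dropped. Returns the written data file names.
    """
    groups = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            imp = int(float(fields[0]))
            if imp == 0:
                continue  # original, unimputed data
            groups.setdefault(imp, []).append(" ".join(fields[1:]))
    names = []
    for imp in sorted(groups):
        name = f"imp{imp}.dat"  # hypothetical naming scheme
        with open(os.path.join(out_dir, name), "w") as out:
            out.write("\n".join(groups[imp]) + "\n")
        names.append(name)
    # One data file name per line, as the list file for TYPE = IMPUTATION.
    with open(os.path.join(out_dir, list_name), "w") as out:
        out.write("\n".join(names) + "\n")
    return names
```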
"A basic identifiability requirement for the imputa- tion model is that for (A) each variable in the imputation the number of observations should be at least as many as the number of (B) variables in the imputation model."
I don't understand the difference between the variables I marked with (A) and (B). Does (A) refer to the variables in the usevariables list, and (B) to the variables in the "impute" list?
So in theory, 793 values out of 800 could be missing? That doesn't sound like something I could report in a paper ;)
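One possible reading of the quoted condition is that (A) and (B) refer to the same set: the variables that enter the imputation model. This interpretation is not confirmed in the thread. Under it, the check is a per-variable count, where every variable needs at least as many observed values as there are variables in the model.

For example, with 7 variables and 800 cases, a variable with only 7 observed values (793 missing) would formally pass. The bound is a minimal identifiability requirement, not a guide to what is reasonable. Hypothetical Python helper:

```python
def meets_identifiability_bound(columns):
    """columns: dict mapping variable name -> list of values (None = missing).

    Returns (ok, counts): ok is True when every variable has at least as many
    observed values as there are variables in the imputation model.
    """
    n_vars = len(columns)
    counts = {name: sum(v is not None for v in values) for name, values in columns.items()}
    return all(c >= n_vars for c in counts.values()), counts
```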
We have a large number of missing values on one crucial covariate, which Mplus cannot handle in a multilevel analysis if we just add it to the model as x1 (non-convergence).
And since Mplus uses listwise deletion for missing values on time-varying covariates in multilevel models (even if only one measurement point is missing; why is that?), we were thinking of imputing. The imputation converges, but we are not sure whether imputation with 40% missing data is feasible.
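The deletion follows from the 2011 answer above. The model is estimated conditional on the covariates, so a covariate value is treated as given rather than as a variable with its own distribution, and a case lacking it cannot be evaluated. Missing outcomes, by contrast, are handled by using each case's observed outcomes. In long-format multilevel data, each row (measurement occasion) counts as a case.

Below is a hypothetical Python sketch of the resulting selection rule; the names are illustrative. It also assumes cases with no observed outcome are excluded, since they carry no information for the likelihood.

```python
def analysis_sample(cases, covariates, outcomes):
    """Select the cases that enter an ML analysis estimated conditional on covariates.

    cases: list of dicts; None marks a missing value.
    """
    kept = []
    for case in cases:
        if any(case[x] is None for x in covariates):
            continue  # missing an exogenous covariate: the case cannot be used
        if all(case[y] is None for y in outcomes):
            continue  # no observed outcome: nothing to contribute to the likelihood
        kept.append(case)  # partially missing outcomes are fine
    return kept
```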
Dear Dr. Muthen, I am using Mplus version 6 with survey data from three waves of a panel design, and the dependent variable is measured only at time point 3. My sample is 207 at time point 1 (T1), drops to 176 individuals at time point 2 (T2), and finally to 137 at time point 3 (T3). The calculations, however, are performed with N = 176. So I assume the missing data at T3 are imputed, right?
Is there a critical proportion of missing data where imputation is problematic? Do you know any paper that discusses this topic?