Calvin Croy posted on Wednesday, October 05, 2005 - 12:19 pm
Could you please answer the following 3 questions?
I am running Mplus version 3.13 for a CFA analysis and specifying Type = Missing H1.
1. After reading the User's Guide and examining posts on this discussion site, it appears to me that Mplus does NOT actually impute ("fill in") missing data values when Type = Missing H1 is specified. Is this correct?
2. Based on what I've read, it sounds like Mplus uses the EM algorithm to estimate means, variances, and covariances (the sufficient statistics) using all the factor indicator variables. Then the estimated means, variances, and covariances resulting from EM are used to derive the model parameter estimates. Is this right? If not, could you please clarify?
3. If one specifies Type = Imputation to read multiple datasets created by some multiple imputation process outside of Mplus (Example 12.13), how is EM used, if at all?
Thanks so much for answering posted questions! While I'm sure this is quite time consuming, your comments are valued greatly and keep us users on the right path.
bmuthen posted on Saturday, October 08, 2005 - 2:15 pm
1. That's right. Standard ML estimation is used instead. The missing values can however be produced.
2. Not quite. EM is used in the ML estimation of the unrestricted mean vector and covariance matrix (the H1 model) - but these results are only used to be able to compute a chi-square test of the H0 model against H1. For the H0 parameter estimation, all available raw data are used in the computations using ML. EM is not used in the H0 computations.
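To make "all available raw data are used in the computations using ML" concrete, here is a minimal sketch of a casewise observed-data log-likelihood for two normally distributed variables. It is not Mplus code. The function names and the fixed means, variances, and covariance are assumptions for illustration; in an actual H0 model these moments would be functions of the model parameters and would be maximized over.

```python
import math

def univariate_loglik(x, mean, var):
    """Log density of one observed value under a normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bivariate_loglik(x1, x2, m1, m2, v1, v2, c12):
    """Log density of a fully observed pair under a bivariate normal."""
    det = v1 * v2 - c12 ** 2
    d1, d2 = x1 - m1, x2 - m2
    quad = (v2 * d1 * d1 - 2 * c12 * d1 * d2 + v1 * d2 * d2) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

def observed_data_loglik(rows, m1, m2, v1, v2, c12):
    """Sum each case's contribution using only the variables it has.

    Missing values are marked with None. Nothing is filled in: a case
    with one missing variable contributes the marginal density of the
    variable that was observed.
    """
    total = 0.0
    for x1, x2 in rows:
        if x1 is not None and x2 is not None:
            total += bivariate_loglik(x1, x2, m1, m2, v1, v2, c12)
        elif x1 is not None:
            total += univariate_loglik(x1, m1, v1)
        elif x2 is not None:
            total += univariate_loglik(x2, m2, v2)
        # a case with no observed values contributes nothing
    return total

rows = [(1.0, 2.0), (0.5, None), (None, 1.5)]
print(observed_data_loglik(rows, 0.0, 1.0, 1.0, 2.0, 0.3))
```

The point of the sketch is that every case enters the likelihood with whatever it has, so no imputation step is needed before estimation.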
3. EM is not used here. Type = Imputation is intended for use when another program has been first used to impute missing data (e.g. freeware such as NORM). These multiple data sets are then sent to Mplus and analyzed by Mplus, followed by an Mplus summary of the parameter estimates and the computation of the SEs.
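For readers wondering what the summary of estimates and SEs across imputed data sets might look like, here is a small sketch of the standard pooling rules for multiple imputation (Rubin's rules). The post above does not spell out the formula, so treating it as Rubin's rules is an assumption here, and the function name and the numbers are made up for illustration.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed data sets.

    estimates: the parameter estimate from each data set
    standard_errors: the matching standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # average sampling variance within the data sets
    within = mean([se ** 2 for se in standard_errors])
    # variance of the estimates between data sets (divisor m - 1)
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# hypothetical results for one regression slope from 5 imputed data sets
est, se = pool_rubin([0.42, 0.45, 0.40, 0.47, 0.43],
                     [0.10, 0.11, 0.10, 0.12, 0.10])
print(est, se)
```

The pooled SE is larger than the average single-data-set SE whenever the estimates differ across imputations, which is how the uncertainty due to the missing data gets carried into the final result.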
Just to add to this - one can use EM to generate imputed data. In the EM algorithm the missing data are computed for each person. This type of output is however not currently available in Mplus.
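As an illustration of the statement that EM computes the missing data for each person, here is a rough sketch of EM for two normal variables where x1 is always observed and x2 is sometimes missing. This is not what Mplus runs internally. The function name, the starting values, and the restriction to a bivariate case with complete x1 are simplifying assumptions.

```python
def em_bivariate(rows, n_iter=500):
    """EM estimates of the mean and covariance of (x1, x2).

    rows: list of (x1, x2) pairs; x2 may be None, x1 is assumed complete.
    Returns (estimates dict, list of x2 values with missing ones replaced
    by their conditional expectation at the final estimates).
    """
    n = len(rows)
    x1 = [a for a, _ in rows]
    m1 = sum(x1) / n
    v1 = sum((a - m1) ** 2 for a in x1) / n
    seen = [b for _, b in rows if b is not None]
    # starting values: observed-case moments for x2, zero covariance
    m2 = sum(seen) / len(seen)
    v2 = sum((b - m2) ** 2 for b in seen) / len(seen)
    c12 = 0.0
    for _ in range(n_iter):
        slope = c12 / v1
        resid_var = v2 - c12 ** 2 / v1
        s2 = s22 = s12 = 0.0
        for a, b in rows:
            if b is None:
                # E-step: expected x2 and x2 squared given x1
                e2 = m2 + slope * (a - m1)
                e22 = e2 ** 2 + resid_var
            else:
                e2, e22 = b, b ** 2
            s2 += e2
            s22 += e22
            s12 += a * e2
        # M-step: update moments from the expected sufficient statistics
        m2 = s2 / n
        v2 = s22 / n - m2 ** 2
        c12 = s12 / n - m1 * m2
    slope = c12 / v1
    filled = [b if b is not None else m2 + slope * (a - m1) for a, b in rows]
    est = {"m1": m1, "m2": m2, "v1": v1, "v2": v2, "c12": c12}
    return est, filled

rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, None), (5.0, None)]
estimates, filled_x2 = em_bivariate(rows)
print(estimates)
print(filled_x2)
```

Note that the E-step adds the residual variance to the expected square of a missing value. Filling in only the conditional means and then computing an ordinary covariance matrix would understate the variance of x2.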
I'm using MPlus 6, and I'd like to understand how missing data on the outcomes are treated in latent growth models when using ML/MLR estimators.
1) I think I've read that missing data are considered MAR. Does it mean that, in a longitudinal context, missingness for one indicator in time can be conditional on the values of the same indicator at other time points?
2) Is there a technical article where the algorithm used in MPlus for MAR treatment of missing data within a ML estimation is explained? Or a general reference?
The results should be identical if you set the models up correctly. You may be missing the fact that the residual variances need to be held equal over time for the Mplus model to be the same as the multilevel model. Both assume MAR.
Ok, your answer made me realize there was a glitch in my data transformation routine. With the correct datasets, the estimates are now completely identical, using the latent growth or the multilevel specification.
There is still something I don't get. I'm fitting the same model with a latent growth curve or a multilevel specification, and I get exactly the same results.
I don't understand how this is possible, given that some of the time-varying covariates have been imputed regardless of whether the individual was still being followed.
In the multilevel spec, with a long data format, it has no consequences, because the lines where the individual was not in the study are deleted. But in the wide format used by the latent growth specification, these imputations are still in the data, so I thought they should be impacting the estimation... which is not the case.
I hope I made myself clear. Do you have an explanation for this behaviour? I could send you my models if necessary.
In Mplus the model is estimated conditioned on the covariates. Cases with missing data on observed exogenous variables are deleted from the analysis as they are in HLM. With dependent variables, values are not imputed. All available information is used as in HLM.
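One way to picture why the two specifications can agree is to reshape wide data to long and drop the occasions with no outcome. The sketch below assumes a growth model in which each time-varying covariate predicts only the outcome at the same wave; the record layout and the function name are hypothetical.

```python
def wide_to_long(wide_rows):
    """Turn one record per person into one record per observed occasion.

    wide_rows: list of dicts with "id", "y" (outcomes per wave, None if
    missing) and "x" (time-varying covariate per wave).
    Occasions with a missing outcome are dropped, as in the long format
    used for the multilevel specification.
    """
    long_rows = []
    for person in wide_rows:
        for wave, y in enumerate(person["y"]):
            if y is None:
                continue  # no outcome at this wave, so nothing to predict
            long_rows.append({"id": person["id"], "time": wave,
                              "y": y, "x": person["x"][wave]})
    return long_rows

wide = [
    {"id": 1, "y": [1.0, None, 3.0], "x": [0.1, 0.2, 0.3]},
    # person 2 left after wave 2; the wave-3 covariate 9.9 was imputed
    {"id": 2, "y": [2.0, 2.5, None], "x": [0.4, 0.5, 9.9]},
]
for row in wide_to_long(wide):
    print(row)
```

Under that assumption, a covariate value at a wave with a missing outcome has no outcome to explain, so it does not enter the likelihood in either format. The imputed 9.9 is present in the wide file but never pairs with an observed y.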
Soz posted on Wednesday, August 10, 2011 - 7:07 am
Dear Linda and Bengt,
I am running a SEM with 5 imputed data sets for missing values within PASW. I also created the implist.dat where the five datasets are identified. I used the following specification:
FILE IS Implist.dat;
VARIABLE: NAMES ARE Imputation_ VPCode country RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2 Plan_M6 CHBdiet9 ;
USEVARIABLES ARE RiskP_M3 OE_M6 Inten_M5 SE_M6 Plan_M2 Plan_M6 CHBdiet9;
ANALYSIS: TYPE is Imputation; ESTIMATOR IS ML; ITERATIONS = 10000; CONVERGENCE = 0.00005;
Model: Inten_M5 on RiskP_M3 OE_M6 SE_M6 CHBdiet9 ; Plan_M2 on Inten_M5 ; Plan_M6 on Plan_M2;
OUTPUT: tech1; standardized;
Nevertheless, I got this error message:
*** ERROR in ANALYSIS command Unrecognized setting for TYPE option: IMPUTATION
What did I do wrong? How can I fix the problem? Many thanks in advance.
"A basic identifiability requirement for the imputation model is that for (A) each variable in the imputation the number of observations should be at least as many as the number of (B) variables in the imputation model."
I don't understand the difference between the variables I marked with (A) and (B). Does (A) refer to the variables in the usevariables list, and (B) to the variables in the "impute" list?
So in theory, 793 values out of 800 could be missing? That doesn't sound like something I could report in a paper ;)
We have a large number of missing values on one crucial covariate, which MPLUS cannot handle in a multilevel analysis if we just add it to the model as x1; (non-convergence).
And since MPLUS does listwise deletion for missing time-varying covariates in multilevel models (even if only one measurement point is missing - why is that?), we were thinking of imputing. The imputation converges, but we are not sure whether imputation with 40% missing is feasible.
Dear Dr. Muthen, I am using MPlus Vers6 & survey data of three waves in a panel design, having solely the dependent variable at time point 3. My sample population is 207 at time point 1 (T1), then reduced to 176 individuals at time point 2 (T2) and finally reduced to 137 at time point 3 (T3). The calculations, however, are performed with N=176. Thus, I assume the missing data at T3 is imputed, right?
Is there a critical proportion of missing data where imputation is problematic? Do you know any paper that discusses this topic?
See our Multiple Imputation examples in the Version 8 User's Guide on our website, Chapter 11, starting with ex11.5.
Pia Kreijkes posted on Wednesday, January 16, 2019 - 2:21 pm
Thank you for this. Is the Bayesian approach equivalent to expectation maximisation for imputing missing values? So I could simply impute a single data set instead of multiple and would get imputed values similar to those that single imputation using the EM algorithm would give me?
If you want the latter, the former is not necessary. ML under MAR ("FIML") or Bayes can handle the latter. EM is an ML algorithm that can be used for FIML. One model is the covariance matrix - EM is often used to estimate that with missing data. But if you want to use the covariance matrix to estimate parameters for a model describing the covariance matrix - such as a factor model - you don't need to first estimate the covariance matrix.
Pia Kreijkes posted on Thursday, January 17, 2019 - 12:40 am
I would like to get values for the missing data. I considered many different methods for handling missing data such as FIML but concluded that it will be best for me to first fill in my missing values and then start my analyses from there. Multiple Imputation would not work as I have analyses along the way that won't be able to deal with multiple data sets and combining results of many single data sets would be a big hassle. I have only about 2 percent of responses missing, so I feel a single imputation method would not be a big problem. So imputing a single data set using BAYES in the MI framework of MPLUS would be OK only that one would be concerned with the underestimation of error I assume? That should be similar to single imputation with EM?
I need to use WLSMV as an estimator in my models, which is why FIML does not work for me. Hence, I'm looking for a way to impute a single data set in MPLUS, preferably using EM or a valid alternative. Can I do that with the BAYES estimator, setting the number of imputed data sets to 1? Thanks
Bayes is great when ML needs too heavy numerical integration (I assume this is why you used WLSMV). You can do 1 imputation using Bayes. I would, however, instead recommend handling the missing data by just using Bayes directly to estimate your model instead of first imputing data. Bayes is "FIML-like" in that it also uses all available data and gives correct estimates under MAR.
I used WLSMV because I have ordinal data which is quite skewed, and when I used MLR and specified the data as categorical for my models, no fit indices were provided. Thank you very much for your help and recommendations!
In the two-level regression, the output shows that both the Expectation-Maximization (EM) algorithm and steepest descent iterations are used. As I understand it, the EM algorithm can perform both ascent and descent iterations to reach a local optimum. If the EM algorithm already performs descent iterations, what is the reason for adding steepest descent iterations alongside it?
In the two-level regression, the EM algorithm is used to analyze the H1 model, whereas MLR is used to analyze the H0 model. Can't we use the EM algorithm to analyze both the H1 model and the H0 model?