Anonymous posted on Thursday, September 30, 2004 - 2:39 am
What should be done in an EFA or CFA in the ANALYSIS command: TYPE = MISSING or not? The loadings are not that different either way. Would it not be more valid to use only complete observations, so that the better choice would be not to use TYPE = MISSING?
It is probably best to use TYPE = MISSING because you have more power.
Anonymous posted on Monday, May 16, 2005 - 11:35 am
I am doing CFA and CFA with covariates (MIMIC) with continuous factor indicators. The data have missing values, so the statement "ANALYSIS: TYPE = MISSING" seems appropriate. However, if "H1" is not included in this statement, the chi-square test of model fit is not obtained. There is a sentence in the User's Guide that reads, "H1 allows the estimation of an unrestricted mean and covariance model with TYPE=MISSING." Would you mind explaining this in more detail, and telling me whether or not the "H1" option should be included? Thanks a lot.
The unrestricted mean and covariance model is needed to compute chi-square. We make its computation optional because it can take a long time to estimate. If you want chi-square, then you should include H1.
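For example, in the Mplus versions discussed in this thread, an input requesting the H1 model so that chi-square is computed might look like the following sketch (the file name, variable names, and missing value flag are placeholders):

```
TITLE:    CFA with missing data and chi-square;
DATA:     FILE = mydata.dat;
VARIABLE: NAMES = y1-y6;
          MISSING = ALL (-999);
ANALYSIS: TYPE = MISSING H1;
          ESTIMATOR = ML;
MODEL:    f1 BY y1-y3;
          f2 BY y4-y6;
```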
Anonymous posted on Monday, May 16, 2005 - 12:29 pm
Thank you very much for your quick reply; I really appreciate it. I have another question: What method does Mplus use to deal with missing values in CFA/MIMIC? Is it full information maximum likelihood? And if so, is it the same thing as the EM algorithm under a different name?
I am modeling 14 categorical indicators and wanted to account for the random missing values in my dataset. I also wanted to use WLSMV as my estimator because that is preferred for categorical data modeling.
The code runs and I get successful output, but statistically speaking, what does ML estimation under MAR mean for me? What is it estimating: the values of each of those missing data points?
Appreciate any help you can give me, Thanks
bmuthen posted on Wednesday, August 31, 2005 - 9:04 am
ML under MAR does not estimate the values of the missing data points, but it uses all available data to estimate the model parameters. In contrast, WLSMV (assuming a model with no covariates, only factor indicators) considers all available data for each pair of variables when estimating the sample statistics to which the model is fit. So for example, if a person has data on variable 3 but not on variables 1 and 2, this person's data will not be used when estimating the sample statistics for variables 1 and 2. This is a loss of information since variable 3 is correlated with variables 1 and 2. And, if the missingness on variable 3 is not random, it can cause a certain amount of bias in the estimation for variables 1 and 2 since the sample used for variables 1 and 2 is then selective. So this is the disadvantage of using an estimator based on limited information, pair-wise information. ML under MAR is better, but is heavy when there are many factors.
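To make the pairwise-present idea concrete, here is a small illustration in Python (an analogy using pandas, not Mplus internals): with more than one missing data pattern, each pair of variables ends up with its own sample size, and a case like the one described above still contributes to every pair it is observed on.

```python
import numpy as np
import pandas as pd

# Simulate three correlated indicators.
rng = np.random.default_rng(1)
cov = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
df = pd.DataFrame(rng.multivariate_normal([0, 0, 0], cov, size=200),
                  columns=["y1", "y2", "y3"])

# Two missing data patterns: 50 cases missing y1, 30 cases missing y2.
df.loc[:49, "y1"] = np.nan
df.loc[50:79, "y2"] = np.nan

# Listwise deletion keeps only the 120 complete cases; pairwise present
# keeps a different sample size for each pair of variables.
n_listwise = len(df.dropna())
n_12 = len(df[["y1", "y2"]].dropna())
n_13 = len(df[["y1", "y3"]].dropna())
n_23 = len(df[["y2", "y3"]].dropna())
print(n_listwise, n_12, n_13, n_23)
```

Note that pandas' `DataFrame.cov()` excludes missing values pairwise in exactly this sense, so its per-pair sample sizes differ; the sample statistics WLSMV fits to behave analogously.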
David Bard posted on Thursday, February 15, 2007 - 5:19 pm
1. Type=Missing in an EFA always leads to pairwise deletion? 2. ML EFA with type=missing does not model complete data vectors, per se, but rather uses the pairwise-constructed covariance matrix in estimation? 3. ML EFA with type=missing should not be used in situations of MAR missingness? 4. The answers to questions above are true for both continuous and categorical outcomes?
TYPE=MISSING EFA; with maximum likelihood for continuous outcomes provides maximum likelihood estimation under MCAR and MAR. It does not do pairwise deletion. ML is not available for EFA with categorical outcomes. TYPE=MISSING EFA; with weighted least squares estimation uses pairwise present.
My understanding is that pairwise deletion has many limitations and is not well regarded as an approach for dealing with missing data. Most notably, it is said to produce biased SEs and test statistics in conventional software because of problems computing accurate sample sizes. I also read that it is inappropriate for data that are missing at random.
Can you please explain why this is the approach used with WLSMV? Is Mplus able to address the limitations faced by conventional software? I am trying to justify the use of pairwise deletion in a categorical EFA that I have conducted in Mplus, and also to capture any limitations this approach may produce. Any references you could provide would be very much appreciated.
The pairwise present approach is not optimal under the MAR assumption, only under MCAR (see the Little & Rubin book). WLSMV will have problems with point estimates and everything else if the missingness is only MAR, not MCAR. If the missingness is MCAR (or mild deviations of that) there will be no problems. For a study of this wrt SEM, see
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.
at my UCLA web site. If data are not even MAR, the ML-MAR approach does not always win over pairwise present or listwise present.
In Mplus there is not a problem of accurate sample size calculations, but this MAR-MCAR distinction is the key issue.
It does not seem possible to generalize the weighted least squares approach to cover MAR, precisely because its simplicity comes from using only bivariate information. If you try to cover MAR, you lose that simplicity.
Note that you can also do categorical EFA now in V5 with the ML estimator even with MAR missingness (but the method is computationally intensive for more than 3 factors) - see chapter 4 of the UG.
My data are very likely MNAR and listwise deletion results in a loss of 24% of the data. I have a large sample, 16,707 cases, and 302 missing data patterns. I am doing a categorical EFA with 17 binary variables as part of my Masters thesis. I do not plan to model the missingness simply because it is beyond the scope of a Masters level work.
Thus my question is: Given that the MAR assumption is likely violated, is there any benefit to using ML over WLSMV? Should I perhaps do a sensitivity analysis to compare the results from the two estimators?
Thank you. I really appreciate all of your assistance.
That is how weighted least squares becomes a simpler/faster estimator than ML. It uses "limited information" from pairs of variables and therefore uses all individuals with observations on each pair. If it considered other variables not included in the pair (which ML does), it would do better in terms of both information and missing data handling, but then it would no longer be a simple estimator.
Tracy Witte posted on Thursday, April 24, 2008 - 8:22 am
What if you are using ULS as an estimator (with categorical indicators)? Is the missing data handled in the same way as WLSMV?
Tracy Witte posted on Wednesday, November 12, 2008 - 11:21 am
I am doing a CFA with categorical indicators. There are some missing data, which I think are MAR rather than MCAR. From the above discussion, I gather that the ML estimator would be better than WLSMV because it allows data to be MAR. What about the MLM estimator? Is that better than ML for categorical data?
Another issue I've run into is that I don't get CFI, TLI, or RMSEA fit indices when I run my model using ML. Why is this? Is there a way to request these fit indices?
I am trying to understand how WLSMV estimates standard latent variable models with missing data, such as factor analytic models and structural equation models with only factor indicators. The Mplus manual states that such models are estimated under missing data theory using all available data. Bengt's August 31, 2005 response above states the same but calls this "limited pairwise information." My questions are: Are there any references for this approach, as I feel somewhat blind when using it and mentioning it in studies with missing data? And are there any references or studies that have compared WLSMV and ML in the context of missing data? I know the simulation literature recommends WLSMV for ordinal data, but is there a point of diminishing returns with WLSMV in the presence of missing data, or a point where ML should be used?
If you don't have covariates in the model, the answer is simple. WLSMV works if the missing data are MCAR, while ML, Bayes, and multiple imputation all support the more general MAR. Note that listwise deletion also supports MCAR, but it uses only observations with full records, i.e., it doesn't use all the data. The difference is that missing data patterns are independent of the observed data under MCAR, while they may depend on the observed data under MAR. If there are covariates in the model, the answer is more complicated.
If the amount of missing data is not large and the MCAR assumption is reasonable then WLSMV should work well.
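The MCAR-versus-MAR distinction above can be seen in a toy simulation (a Python sketch, not Mplus): when y is deleted completely at random, the complete-case mean of y stays near its true value of zero, but when deletion depends on an observed covariate x that is correlated with y, the complete-case mean is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)  # true mean of y is 0

# MCAR: drop 30% of y values completely at random.
mcar_keep = rng.random(n) > 0.3
mean_mcar = y[mcar_keep].mean()

# MAR: drop y whenever the observed x exceeds 1. Missingness depends
# only on observed data, yet complete-case estimates of y are biased
# because the retained sample is selective on x.
mar_keep = x < 1.0
mean_mar = y[mar_keep].mean()

print(round(mean_mcar, 3), round(mean_mar, 3))
```

ML under MAR recovers the right answer in the second scenario by using the observed x for cases whose y is missing; pairwise-present and listwise approaches do not.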
Thank you for the detailed response. Yes, I would appreciate it if you could send me that technical appendix.
Jon Elhai posted on Thursday, August 19, 2010 - 2:24 pm
Tihomir, I'm wondering if it's possible to briefly describe: 1) what is meant by "pairwise present" regarding how missing data are handled in WLSMV when there are no covariates,
2) in a nutshell how missing data are handled in this situation when there ARE covariates.
3) does the covariates situation in #2 above also apply if the covariates are not observed variables, but latent factors. In my case, I've got two measurement models, where a given CFA model's factors are allowed to correlate with the other CFA model's factors.
Pairwise present means that all available observations are used to estimate each correlation, that is, the sample size can vary for each correlation.
A technical appendix will be posted in about a week that describes missing data estimation with weighted least squares analysis.
ywang posted on Wednesday, February 09, 2011 - 12:37 pm
Is a missing value treated as the largest or the smallest value in Mplus? I am asking because I always generate new variables using DEFINE. For example, with the following input, if whzw2 is missing, is overweight 1, 0, or missing?
if whzw2 > 1.0365 then overweight = 1;
if whzw2 <= 1.0365 then overweight = 0;
The value that is counted as missing is the value declared as a missing value flag using the MISSING option of the VARIABLE command. If this option is not used, no value is considered missing.
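A sketch of what this looks like in an input file, assuming -999 is the missing value flag in the data (the flag value and variable names are placeholders): with the MISSING option declared, a flagged whzw2 is read as missing rather than as -999, and a DEFINE transformation of a missing value then yields a missing value for overweight.

```
VARIABLE: NAMES = id whzw2;
          USEVARIABLES = overweight;
          MISSING = ALL (-999);
DEFINE:   IF (whzw2 GT 1.0365) THEN overweight = 1;
          IF (whzw2 LE 1.0365) THEN overweight = 0;
```

Without the MISSING option, -999 would be treated as a real value, would satisfy the second condition, and overweight would incorrectly be set to 0.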
Rohini Sen posted on Tuesday, September 13, 2011 - 10:37 am
I was wondering how long an EFA (accounting for missing values) takes in Mplus with 48 items and n = 500? I set it to run about half an hour ago and it's still running. I specified TYPE = MISSING EFA as well...
It depends. Categorical variables with ML takes longer. With continuous variables, ML takes longer than ULS. More factors take longer. To inform you about specifications and timing, run just one factor as a first step.
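For example, as a first step you might run the one-factor solution alone to gauge timing before requesting more factors (a sketch; variable names and the missing value flag are placeholders):

```
VARIABLE: NAMES = y1-y48;
          MISSING = ALL (-999);
ANALYSIS: TYPE = EFA 1 1 MISSING;
```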
Rohini Sen posted on Tuesday, September 13, 2011 - 11:54 am
Thank you! It just finished running, but thank you for the prompt response. Much appreciated.
I am estimating an ordered-categorical CFA model with 8 factors each estimated by 6 observed variables. Each factor represents a repeated measure of the same variable across 8 waves (which I intend to develop into a multiple indicator linear growth model such as that in example 6.15 of the user guide). I have an unbalanced dataset as not all cases responded to every wave. I estimate the model using type=complex because it is survey data and so use a MLR estimator. In the resulting factor scores (obtained from save=fscores), a predicted score is given for all 8 waves for every case, creating a balanced dataset. I expected to see missing data where no observations were available to estimate the factor scores. My question is, am I specifying the model incorrectly if this is happening? And if this is meant to happen, is it ok to analyse these factor scores after removing those scores which are based on zero observations at a given wave?