Anonymous posted on Thursday, September 30, 2004 - 2:39 am
What should be done in an EFA or CFA in the ANALYSIS command: TYPE = MISSING or not? The loadings are not that different either way. Would it not be more valid to use only complete observations, so that the better choice would be not to use TYPE = MISSING?
It is probably best to use TYPE = MISSING because you have more power.
Anonymous posted on Monday, May 16, 2005 - 11:35 am
I am doing CFA and CFA with covariates (MIMIC) with continuous factor indicators. The data have missing values, so the statement "ANALYSIS: TYPE = MISSING" seems appropriate. However, if "H1" is not included in this statement, the chi-square test of model fit is not obtained. There is a sentence in the User's Guide that reads, "H1 allows the estimation of an unrestricted mean and covariance model with TYPE=MISSING." Would you mind explaining this in more detail, and telling me whether or not the "H1" option should be included? Thanks a lot.
The unrestricted mean and covariance model is needed to compute chi-square. We make its computation optional because it can take a long time to estimate. If you want chi-square, then you should include H1.
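For example, in the Mplus versions discussed in this thread, an input requesting the H1 model so that chi-square is computed might look like the following sketch (the file name, variable names, and missing value flag are placeholders):

```
TITLE:    CFA with missing data and chi-square;
DATA:     FILE = mydata.dat;
VARIABLE: NAMES = y1-y6;
          MISSING = ALL (-999);
ANALYSIS: TYPE = MISSING H1;
          ESTIMATOR = ML;
MODEL:    f1 BY y1-y3;
          f2 BY y4-y6;
```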
Anonymous posted on Monday, May 16, 2005 - 12:29 pm
Thank you very much for your quick reply; I really appreciate it. I have another question: What method does Mplus use to deal with missing values in CFA/MIMIC? Is it full information maximum likelihood? And if so, is it the same thing as the EM algorithm under a different name?
I am modeling 14 categorical indicators and wanted to account for the random missing values in my dataset. I also wanted to use WLSMV as my estimator because that is preferred for categorical data modeling.
The code runs and I get successful output, but statistically speaking, what does ML estimation under MAR mean for me? What is it estimating: the values of each of those missing data points?
Appreciate any help you can give me, Thanks
bmuthen posted on Wednesday, August 31, 2005 - 9:04 am
ML under MAR does not estimate the values of the missing data points, but it uses all available data to estimate the model parameters. In contrast, WLSMV (assuming a model with no covariates, only factor indicators) considers all available data for each pair of variables when estimating the sample statistics to which the model is fit. So for example, if a person has data on variable 3 but not on variables 1 and 2, this person's data will not be used when estimating the sample statistics for variables 1 and 2. This is a loss of information since variable 3 is correlated with variables 1 and 2. And, if the missingness on variable 3 is not random, it can cause a certain amount of bias in the estimation for variables 1 and 2 since the sample used for variables 1 and 2 is then selective. So this is the disadvantage of using an estimator based on limited information, pair-wise information. ML under MAR is better, but is heavy when there are many factors.
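To make the pairwise-present idea concrete, here is a small illustration in Python (an analogy using pandas, not Mplus internals): with more than one missing data pattern, each pair of variables ends up with its own sample size, and a case like the one described above still contributes to every pair it is observed on.

```python
import numpy as np
import pandas as pd

# Simulate three correlated indicators.
rng = np.random.default_rng(1)
cov = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
df = pd.DataFrame(rng.multivariate_normal([0, 0, 0], cov, size=200),
                  columns=["y1", "y2", "y3"])

# Two missing data patterns: 50 cases missing y1, 30 cases missing y2.
df.loc[:49, "y1"] = np.nan
df.loc[50:79, "y2"] = np.nan

# Listwise deletion keeps only the 120 complete cases; pairwise present
# keeps a different sample size for each pair of variables.
n_listwise = len(df.dropna())
n_12 = len(df[["y1", "y2"]].dropna())
n_13 = len(df[["y1", "y3"]].dropna())
n_23 = len(df[["y2", "y3"]].dropna())
print(n_listwise, n_12, n_13, n_23)
```

Note that pandas' `DataFrame.cov()` excludes missing values pairwise in exactly this sense, so its per-pair sample sizes differ; the sample statistics WLSMV fits to behave analogously.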
David Bard posted on Thursday, February 15, 2007 - 5:19 pm
1. Type=Missing in an EFA always leads to pairwise deletion? 2. ML EFA with type=missing does not model complete data vectors, per se, but rather uses the pairwise-constructed covariance matrix in estimation? 3. ML EFA with type=missing should not be used in situations of MAR missingness? 4. The answers to questions above are true for both continuous and categorical outcomes?
TYPE=MISSING EFA; with maximum likelihood for continuous outcomes provides maximum likelihood estimation under MCAR and MAR. It does not do pairwise deletion. ML is not available for EFA with categorical outcomes. TYPE=MISSING EFA; with weighted least squares estimation uses pairwise present.
My understanding is that pairwise deletion has many limitations and is not well regarded as an approach for dealing with missing data. Most notably, it is said to produce biased SEs and test statistics in conventional software because of problems computing accurate sample sizes. I also read that it is inappropriate for data that are missing at random.
Can you please explain why this is the approach used with WLSMV? Is Mplus able to address the limitations faced by conventional software? I am trying to justify the use of pairwise deletion in a categorical EFA that I have conducted in Mplus, and also to capture any limitations this approach may produce. Any references you could provide would be very much appreciated.
The pairwise present approach is not optimal under the MAR assumption, only under MCAR (see the Little & Rubin book). WLSMV will have problems with point estimates and everything else if the missingness is only MAR, not MCAR. If the missingness is MCAR (or mild deviations of that) there will be no problems. For a study of this wrt SEM, see
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431-462.
at my UCLA web site. If data are not even MAR, the ML-MAR approach does not always win over pairwise present or listwise present.
In Mplus there is not a problem of accurate sample size calculations, but this MAR-MCAR distinction is the key issue.
It does not seem possible to generalize the weighted least squares approach to cover MAR, precisely because its simplicity comes from using only bivariate information. If you try to cover MAR, you lose that simplicity.
Note that you can also do categorical EFA now in V5 with the ML estimator even with MAR missingness (but the method is computationally intensive for more than 3 factors) - see chapter 4 of the UG.
My data are very likely MNAR and listwise deletion results in a loss of 24% of the data. I have a large sample, 16,707 cases, and 302 missing data patterns. I am doing a categorical EFA with 17 binary variables as part of my Masters thesis. I do not plan to model the missingness simply because it is beyond the scope of a Masters level work.
Thus my question is: Given that the MAR assumption is likely violated, is there any benefit to using ML over WLSMV? Should I perhaps do a sensitivity analysis to compare the results from the two estimators?
Thank you. I really appreciate all of your assistance.
That is how weighted least squares becomes a simpler/faster estimator than ML. It uses "limited information" from pairs of variables and therefore uses all individuals with observations on each pair. If it considered other variables not included in the pair (which ML does), it would do better in terms of both information and missing data handling, but then it would no longer be a simple estimator.
Tracy Witte posted on Thursday, April 24, 2008 - 8:22 am
What if you are using ULS as an estimator (with categorical indicators)? Is the missing data handled in the same way as WLSMV?
Tracy Witte posted on Wednesday, November 12, 2008 - 11:21 am
I am doing a CFA with categorical indicators. There are some missing data, which I think are MAR rather than MCAR. From the above discussion, I gather that the ML estimator would be better than WLSMV because it allows data to be MAR. What about the MLM estimator? Is that better than ML for categorical data?
Another issue I've run into is that I don't get CFI, TLI, or RMSEA fit indices when I run my model using ML. Why is this? Is there a way to request these fit indices?
I am trying to understand how WLSMV estimates standard latent variable models with missing data, such as factor analytic models and structural equation models with only factor indicators. The Mplus manual states that such models are estimated under missing data theory using all available data. Bengt's August 31, 2005 response above states the same but calls this "limited pairwise information." My questions are: Are there any references for this approach, as I feel somewhat blind when using it and mentioning it in studies with missing data? And are there any references or studies that have compared WLSMV and ML in the context of missing data? I know the simulation literature recommends WLSMV for ordinal data, but is there a point of diminishing returns with WLSMV in the presence of missing data, or a point where ML should be used?
If you don't have covariates in the model, the answer is simple. WLSMV works if the missing data are MCAR, while ML, Bayes, and multiple imputation all support the more general MAR. Note that listwise deletion also supports MCAR, but it uses only observations with full records, i.e., it doesn't use all the data. The difference is that missing data patterns are independent of the observed data under MCAR, while they may depend on the observed data under MAR. If there are covariates in the model, the answer is more complicated.
If the amount of missing data is not large and the MCAR assumption is reasonable then WLSMV should work well.
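The MCAR-versus-MAR distinction above can be seen in a toy simulation (a Python sketch, not Mplus): when y is deleted completely at random, the complete-case mean of y stays near its true value of zero, but when deletion depends on an observed covariate x that is correlated with y, the complete-case mean is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)  # true mean of y is 0

# MCAR: drop 30% of y values completely at random.
mcar_keep = rng.random(n) > 0.3
mean_mcar = y[mcar_keep].mean()

# MAR: drop y whenever the observed x exceeds 1. Missingness depends
# only on observed data, yet complete-case estimates of y are biased
# because the retained sample is selective on x.
mar_keep = x < 1.0
mean_mar = y[mar_keep].mean()

print(round(mean_mcar, 3), round(mean_mar, 3))
```

ML under MAR recovers the right answer in the second scenario by using the observed x for cases whose y is missing; pairwise-present and listwise approaches do not.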
Thank you for the detailed response. Yes, I would appreciate it if you could send me that technical appendix.
Jon Elhai posted on Thursday, August 19, 2010 - 2:24 pm
Tihomir, I'm wondering if it's possible to briefly describe: 1) what is meant by "pairwise present" regarding how missing data are handled in WLSMV when there are no covariates,
2) in a nutshell how missing data are handled in this situation when there ARE covariates.
3) does the covariates situation in #2 above also apply if the covariates are not observed variables, but latent factors. In my case, I've got two measurement models, where a given CFA model's factors are allowed to correlate with the other CFA model's factors.
Pairwise present means that all available observations are used to estimate each correlation, that is, the sample size can vary for each correlation.
A technical appendix will be posted in about a week that describes missing data estimation with weighted least squares analysis.
ywang posted on Wednesday, February 09, 2011 - 12:37 pm
Is a missing value treated as the largest or the smallest value in Mplus? I am asking because I always generate new variables using DEFINE. For example, with the following input, if whzw2 is missing, is overweight 1, 0, or missing?
if whzw2 > 1.0365 then overweight = 1;
if whzw2 <= 1.0365 then overweight = 0;
The value that is counted as missing is the value declared as a missing value flag using the MISSING option of the VARIABLE command. If this option is not used, no value is considered missing.
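A sketch of what this looks like in an input file, assuming -999 is the missing value flag in the data (the flag value and variable names are placeholders): with the MISSING option declared, a flagged whzw2 is read as missing rather than as -999, and a DEFINE transformation of a missing value then yields a missing value for overweight.

```
VARIABLE: NAMES = id whzw2;
          USEVARIABLES = overweight;
          MISSING = ALL (-999);
DEFINE:   IF (whzw2 GT 1.0365) THEN overweight = 1;
          IF (whzw2 LE 1.0365) THEN overweight = 0;
```

Without the MISSING option, -999 would be treated as a real value, would satisfy the second condition, and overweight would incorrectly be set to 0.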
Rohini Sen posted on Tuesday, September 13, 2011 - 10:37 am
I was wondering how long an EFA (accounting for missing values) takes in Mplus with 48 items and n = 500? I set it to run about half an hour ago and it's still running. I specified TYPE = MISSING EFA as well...
It depends. Categorical variables with ML takes longer. With continuous variables, ML takes longer than ULS. More factors take longer. To inform you about specifications and timing, run just one factor as a first step.
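For example, as a first step you might run the one-factor solution alone to gauge timing before requesting more factors (a sketch; variable names and the missing value flag are placeholders):

```
VARIABLE: NAMES = y1-y48;
          MISSING = ALL (-999);
ANALYSIS: TYPE = EFA 1 1 MISSING;
```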
Rohini Sen posted on Tuesday, September 13, 2011 - 11:54 am
Thank you! It just finished running, but thank you for the prompt response. Much appreciated.
I am estimating an ordered-categorical CFA model with 8 factors each estimated by 6 observed variables. Each factor represents a repeated measure of the same variable across 8 waves (which I intend to develop into a multiple indicator linear growth model such as that in example 6.15 of the user guide). I have an unbalanced dataset as not all cases responded to every wave. I estimate the model using type=complex because it is survey data and so use a MLR estimator. In the resulting factor scores (obtained from save=fscores), a predicted score is given for all 8 waves for every case, creating a balanced dataset. I expected to see missing data where no observations were available to estimate the factor scores. My question is, am I specifying the model incorrectly if this is happening? And if this is meant to happen, is it ok to analyse these factor scores after removing those scores which are based on zero observations at a given wave?