Anh Hua posted on Friday, August 09, 2019 - 7:23 pm
I have been trying to fit a CFA model to 5-category Likert-scale data with over 50 items and 8 latent factors, using a sample of over 28,000. I used WLSMV as the estimator, as this is recommended by several SEM experts for CFAs with ordinal data.
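For reference, a minimal Mplus input for this kind of setup might look like the following sketch (variable names, factor definitions, and the missing-data code are all hypothetical placeholders):

```
TITLE: CFA for ordinal items with WLSMV;
DATA: FILE = mydata.dat;
VARIABLE:
  NAMES = u1-u50;          ! hypothetical item names
  CATEGORICAL = u1-u50;    ! declare 5-category Likert items as ordinal
  MISSING = ALL (-999);    ! hypothetical missing-data code
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY u1-u7;             ! hypothetical factor definitions
  f2 BY u8-u14;
  ! ... remaining factors through f8
OUTPUT: STDYX MODIFICATION;
```

Note that with WLSMV, missingness in the outcomes is handled on a pairwise-present basis, which is exactly the limitation raised below.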
In my preliminary round of CFAs, I used listwise deletion, which dropped my sample size by 20%. That is a lot of missing data, but it is workable because my sample is still quite large. However, listwise deletion is not ideal because my missing data are NOT MCAR.
Unfortunately, according to Timothy Brown, there are issues with pairwise deletion too (e.g., if data are MAR, parameter estimates as well as standard errors can be severely biased).
So my questions are: (1) Is WLSMV still the best estimator given the nature of the missing data in my sample? (2) If not, is there another estimator that handles ordinal data with 5 Likert categories and missing data well? (3) If no other estimator handles ordinal data like WLSMV does, do people recommend data imputation as the next step, and if so, which imputation method?
I hope not to have to impute any data, but I'll do whatever is necessary to get this right. Thank you so much for your input.
You can use ESTIMATOR = BAYES, which is just as good as ML at handling missing data. Unlike ML, which requires numerical integration with categorical outcomes (one dimension per factor), Bayes can also handle many factors.
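As a sketch, only the ANALYSIS and OUTPUT commands need to change from a standard CFA input (the iteration and chain settings below are illustrative, not recommendations):

```
ANALYSIS:
  ESTIMATOR = BAYES;
  BITERATIONS = (10000);   ! illustrative minimum number of MCMC iterations
  CHAINS = 2;
OUTPUT: TECH8;             ! prints PSR values for monitoring convergence
```

Model fit can be judged with the posterior predictive p-value (PPP) that Mplus reports for Bayes, and convergence with the potential scale reduction (PSR) shown in the TECH8 output.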
Anh Hua posted on Monday, August 12, 2019 - 7:10 am
Hi Dr. Muthen,
This is a great suggestion. Unfortunately, I know nothing about Bayesian statistics and would not know how to evaluate a CFA model based on this type of estimation. I will need to look into this in the near future.
In the meantime, would you recommend then that I do some type of multiple imputation? If so, do you think that I need to test for MCAR, MAR, and MNAR first? I read somewhere that testing for MCAR is fraught with problems, and it's not really possible to test for MAR or MNAR either (!), so perhaps it's okay to go ahead and do multiple imputation on my dataset and hope that it will still give me less biased results than my first approach, which was listwise deletion. Do you think this is a viable approach?
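For completeness, a hedged sketch of what multiple imputation might look like in Mplus, added to a CFA input like the one above (item names are hypothetical; the (c) flag tells Mplus to treat the imputed variables as categorical, and the number of data sets is illustrative):

```
DATA IMPUTATION:
  IMPUTE = u1-u50 (c);     ! impute the items, treated as categorical
  NDATASETS = 20;          ! illustrative number of imputed data sets
  SAVE = ordimp*.dat;      ! saves ordimp1.dat through ordimp20.dat
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY u1-u7;             ! hypothetical; same CFA model as before
  ! ...
```

When a MODEL command is present, Mplus analyzes the imputed data sets in the same run and pools the estimates; the saved data sets can also be analyzed later via DATA: TYPE = IMPUTATION.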