Johan Graus posted on Wednesday, October 15, 2014 - 6:12 am
In 'Confirmatory Factor Analysis for Applied Research', Timothy Brown (2006) advises against using ML estimators for ordinal indicator variables (such as Likert-type items). Instead, he recommends WLSMV. If I understand correctly, though, WLSMV handles missing data by deleting cases pairwise. The advantage of using ML (or MLR for non-normally distributed data) is that missing data are handled by a full information maximum likelihood (FIML) estimator, as a result of which the N does not decrease.
On this forum, however, I have read multiple times that ML (or MLR in case of non-normally distributed ordinal data) can be used for ordinal indicator variables -- with the advantage that missing data can be compensated for by FIML (i.e., by specifying 'TYPE=MISSING H1').
So my questions then are:
(1) Which estimator is recommended for handling ordinal indicators (assuming there are no missing data): ML (or MLR) or WLSMV?
(2) If you have missing data and you're concerned about sample size (i.e., you want to avoid pairwise deletion), which estimator is best for handling ordinal indicators: ML (or MLR) or WLSMV?
Perhaps Brown was referring to treating ordinal variables with an ML estimator that assumes they are continuous. Note that there is an ML fitting function for continuous variables and a different ML fitting function for categorical variables.
(1) I would use ML (or MLR) unless there are too many factors, making the computations slow due to too many dimensions of integration. Bayes is also an option.
(2) I would use ML (or MLR), or Bayes.
Johan Graus posted on Thursday, October 16, 2014 - 1:46 am
Brown (2006) is quite adamant about rejecting ML for use with ordinal indicators. Instead he advises using WLSMV. In case of missing (ordinal) data, his advice is to first use the multiple imputation capabilities of Mplus and then use the WLSMV estimator on these data sets ('TYPE = IMPUTATION').
(1) When you specify ESTIMATOR = ML and you also have categorical variables, which type of model is used? Is it ordered logistic regression?
(2) Would using multiple imputation and then doing a CFA with WLSMV estimator (in case of ordinal indicators and missing data) be a viable alternative to ML or MLR estimation? Which would you prefer and why?
(3) Does ML(R) or WLSMV with ordinal indicators provide goodness-of-fit indices? And do they provide standardized coefficients? And how should these standardized coefficients be interpreted in ML(R) or WLSMV with ordinal indicators?
(4) Is there a recent reference I can use for ML(R) estimation on ordinal indicators with missing data?
(5) Will specifying 'TYPE = MISSING H1' and using 'ESTIMATOR = ML(R)' (again, on ordinal indicators with missing data) result in direct ML handling the missing data?
Johan Graus posted on Thursday, October 16, 2014 - 3:37 am
And one additional question: I cannot find the option 'TYPE=MISSING H1' in the ANALYSIS chapter of the user's guide (chapter 16). However, in discussions on this forum, it is mentioned frequently (although I think mostly in older posts).
Is it still necessary to specify 'ANALYSIS: TYPE=MISSING H1' in a CFA model with ordinal indicators and missing data, using ML(R) as estimator and FIML as a way of handling the missing data? Or is it enough to specify that the estimator is ML, that there are CATEGORICAL indicators, and that there are missing data (MISSING ARE ALL (999))?
(1) The default is ordered logistic regression, but with link = probit you can also get ordered probit.
(2) Yes. I would do ML(R) if I can computationally (see my earlier answer), because it is a full-information estimator that handles missing data well. WLSMV is good with many factors and not too many variables. Bayes is good too because it is also a full-information estimator.
(3) ML(R) doesn't provide overall fit statistics, but WLSMV does. Both give standardized coefficients.
(4) It is in the IRT literature and probably also in the Skrondal & Rabe-Hesketh 2004 book on latent variable modeling.
(5) That's an outdated setting for a time when missing data analysis and H1 computations were slow. Both are done as the default now.
Please limit your posts to one window as we request.
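To make answer (1) concrete, here is a minimal Python sketch (not Mplus code) of the category probabilities that an ordered logit or ordered probit measurement model implies for a single ordinal indicator. The function names, the example values, and the parameterization P(y <= k | eta) = F(tau_k - lambda * eta) are illustrative assumptions, not Mplus output.

```python
import math


def logistic_cdf(x):
    """Standard logistic CDF (logit link)."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Standard normal CDF (probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(eta, loading, thresholds, link="logit"):
    """Probabilities of each ordered category given the factor value eta.

    Assumes P(y <= k | eta) = F(tau_k - loading * eta), with thresholds
    sorted in increasing order. K thresholds give K + 1 categories.
    """
    if link == "logit":
        cdf = logistic_cdf
    elif link == "probit":
        cdf = normal_cdf
    else:
        raise ValueError("link must be 'logit' or 'probit'")

    # Cumulative probabilities, closed with 1.0 for the top category.
    cumulative = [cdf(tau - loading * eta) for tau in thresholds] + [1.0]

    # Successive differences turn cumulative into category probabilities.
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical four-category item evaluated at the factor mean.
example = category_probs(eta=0.0, loading=1.0, thresholds=[-1.0, 0.0, 1.0])
```

Changing `link` swaps only the response function. The thresholds and the loading keep the same roles, which is why the two links usually lead to similar substantive conclusions on different scales.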
Johan Graus posted on Thursday, October 16, 2014 - 10:11 am
Thank you so much for your swift reply.
There's one last question that I have:
Following your advice would entail using ML(R) on the ordinal indicators. Since ML(R) is a full information estimator, missing data should be no problem.
The only thing I am wondering about is how to assess goodness-of-fit. I know that I can use loglikelihood, AIC, and BIC to compare models, but I can't seem to find goodness-of-fit indices in the output that give information about the model itself, such as chi-square, RMSEA, CFI/TLI etc.
Are these types of indices unavailable for ML(R) and ordinal indicators? And what can we use then to assess goodness of fit?
Two approaches can be used to assess the quality of the model with ML(R) for categorical indicators.
First, you can use the TECH10 option in the OUTPUT command which gives you univariate and bivariate fit information.
Second, you can compare the logL and BIC of your model with that of a neighboring model that is a bit less restrictive.
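The second approach can be sketched in Python as follows. This is a hedged illustration, not Mplus code: it assumes you have copied the loglikelihood, the number of free parameters, and (for MLR) the scaling correction factor of each model from the output. The function names are hypothetical. The scaled difference uses the usual loglikelihood-based formula, cd = (p0*c0 - p1*c1) / (p0 - p1) and TRd = -2*(L0 - L1) / cd, where model 0 is the more restrictive model.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def scaled_lr_difference(ll0, c0, p0, ll1, c1, p1):
    """Scaled likelihood-ratio difference test for nested models.

    Model 0 is the nested (more restrictive) model, model 1 the
    neighboring, less restrictive model.
    ll* = loglikelihood, c* = scaling correction factor, p* = number
    of free parameters. With plain ML, pass c0 = c1 = 1.0, which
    reduces this to the ordinary likelihood-ratio statistic.
    Returns (test statistic, degrees of freedom).
    """
    if p1 <= p0:
        raise ValueError("model 1 must have more free parameters than model 0")
    df = p1 - p0
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling correction
    trd = -2.0 * (ll0 - ll1) / cd
    return trd, df


# Hypothetical values, for illustration only.
statistic, df = scaled_lr_difference(
    ll0=-2606.0, c0=1.450, p0=39,
    ll1=-2583.0, c1=1.546, p1=47,
)
```

The statistic is referred to a chi-square distribution with `df` degrees of freedom. A non-significant result, or a lower BIC for the more restrictive model, suggests that the restrictions do not worsen fit appreciably.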
Johan Graus posted on Thursday, October 16, 2014 - 12:27 pm
One final question:
Can the factor loadings in ML(R) estimation (CFA; with polytomous ordinal variables, i.e. logistic regression with numerical integration) be interpreted in the same way as factor loadings in WLSMV estimation? That is, in WLSMV estimation the input matrix for ordinal variables is a polychoric correlation matrix, assuming underlying latent continuous response variables, y*, which reflect the amount of an underlying continuous and normally distributed characteristic. For instance, a completely standardized factor loading of .8 on Y1 (the first indicator in a set) in WLSMV estimation would indicate that if the latent factor increased by 1 standard score, the underlying response variable for Y1 (and not the observed indicator Y1 itself) would increase by .8 standard scores. So do factor loadings in ML(R) estimation use the same conceptual framework, or should they be interpreted differently?
As an aside, note that polychorics assume underlying normality so those correlations go with link=probit, not logit. But you can still think in terms of y*'s also for logit.
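A small Python sketch of the y* interpretation under both links, offered as an assumption-laden illustration rather than a description of what any particular program prints: it assumes the residual of y* has variance pi^2/3 under the logit link and 1 under the probit link, and standardizes the raw loading with respect to the factor and to y*. The function name is hypothetical.

```python
import math

# Assumed residual variance of the latent response y* under each link.
RESIDUAL_VARIANCE = {"logit": math.pi ** 2 / 3.0, "probit": 1.0}


def standardized_loading(loading, factor_var=1.0, link="logit"):
    """Loading standardized with respect to both the factor and y*.

    Var(y*) = loading^2 * factor_var + residual variance, so the
    standardized loading is loading * sd(factor) / sd(y*).
    """
    if link not in RESIDUAL_VARIANCE:
        raise ValueError("link must be 'logit' or 'probit'")
    ystar_var = loading ** 2 * factor_var + RESIDUAL_VARIANCE[link]
    return loading * math.sqrt(factor_var) / math.sqrt(ystar_var)


# A raw probit loading of 4/3 with unit factor variance corresponds to
# a standardized loading of .8, i.e. y* rises by .8 SD per factor SD.
probit_example = standardized_loading(4.0 / 3.0, factor_var=1.0, link="probit")
```

Under these assumptions the same standardized value of .8 requires a larger raw loading with the logit link, because the logistic residual has a larger variance. The standardized loading itself is read the same way in both cases.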
Po-Yi Chen posted on Sunday, April 10, 2016 - 2:51 pm
Dear Dr. Muthen,
I am trying to do nested CFA model comparisons with ordinal missing data (all of the indicators are on a four-point Likert scale) in Mplus. I have the following three questions and would appreciate your help and guidance before formally conducting the tests.
(1) I understand that in Mplus I can use WLSMV or ML(R) with the categorical assumption to handle the ordinal nature of the indicators. In one of the FAQ documents you wrote in 2015, I noticed that you mentioned that one advantage of ML with the categorical assumption over WLSMV is that it handles missing data better. Is this because ML(R) uses FIML to handle missingness, whereas WLSMV has to use pairwise deletion when calculating the polychoric correlations?
(2) Given that the missing data rate in my data is quite high (25%), would the chi-square difference test based on ML(R) for categorical data be a better choice than the DIFFTEST option in WLSMV?
(3) Last, to reduce the risk of MNAR, I would like to incorporate some auxiliary variables into my analysis. In Mplus, using ML(R) with the categorical assumption, is there any way to incorporate auxiliary variables into the model, as Graham (2003) did for ML with the continuous assumption?