I am trying to describe and illustrate the current similarities and differences between binary CFA and IRT for my thesis. The default estimation method in Mplus for categorical CFA is WLSMV. To run an IRT model, the example in your manual suggests using MLR as the estimation method. When I use MLR, is the data input still the tetrachoric correlation matrix, or is the original response data matrix used?
I don't think there is a difference between CFA of categorical variables and IRT. A difference is sometimes claimed, but I don't agree. Which estimator is typically used may differ, but that's not essential. MLR uses the raw data, not a sample tetrachoric correlation matrix.
Thank you for your quick response. I agree: the differences seem to be a matter of tradition rather than anything fundamental, and they are disappearing, which doesn't make them any easier to describe.
Some people say the difference is where the marginalization occurs. I'm in the process of understanding this. Since ML(R) uses the raw response data, is the marginalization done on the observed categories (instead of the latent response variables)? And is the integration over the number of factors (instead of the number of items)?
Can I ask for a reference to ML and MLR estimation of categorical CFA in Mplus?
The ML(R) approach is the same as the "marginal ML (MML)" approach described in e.g. Bock's work. That is, it uses the raw data and integrates over the factors using numerical integration. MML is contrasted with "conditional ML", which is used e.g. with Rasch approaches.
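To make "integrating over the factors using numerical integration" concrete, here is a minimal Python sketch of a marginal log-likelihood for binary items and one factor. It is an illustration only, not Mplus's implementation: the logit item model `logit = loading*eta - threshold`, the equally spaced grid, and all function names are assumptions chosen for this example.

```python
import math

def quadrature_grid(n_points=61, bound=6.0):
    """Equally spaced nodes on [-bound, bound] with weights proportional
    to the standard normal density, normalized to sum to 1."""
    step = 2.0 * bound / (n_points - 1)
    nodes = [-bound + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def pattern_probability(pattern, loadings, thresholds, nodes, weights):
    """Marginal probability of one 0/1 response pattern: the product of the
    conditional item probabilities, averaged over the factor distribution."""
    prob = 0.0
    for eta, w in zip(nodes, weights):
        cond = 1.0  # conditional independence: multiply item probabilities
        for u, lam, tau in zip(pattern, loadings, thresholds):
            p = 1.0 / (1.0 + math.exp(-(lam * eta - tau)))
            cond *= p if u == 1 else 1.0 - p
        prob += w * cond
    return prob

def marginal_loglik(data, loadings, thresholds):
    """Sum of log marginal probabilities over the raw response patterns."""
    nodes, weights = quadrature_grid()
    return sum(
        math.log(pattern_probability(row, loadings, thresholds, nodes, weights))
        for row in data
    )

# Hypothetical raw data: three respondents, two items.
data = [(1, 0), (1, 1), (0, 0)]
print(marginal_loglik(data, loadings=[1.2, 0.8], thresholds=[0.3, -0.5]))
```

Note that the input is the raw response patterns, and the integration is one-dimensional because there is one factor, regardless of the number of items.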
Assuming normal factors, probit (normal ogive) item-factor relations, and conditional independence, the assumptions are the same for ML and for WLSMV, where the latter uses tetrachorics. This is because those assumptions correspond to assuming multivariate normal underlying continuous latent response variables behind the categorical outcomes. So WLSMV only uses 1st- and 2nd-order information, whereas ML goes all the way up to the highest order. The loss of info appears small, however. ML doesn't fit the model to these sample tetrachorics, so perhaps one can say that WLSMV marginalizes in a different way. It's a matter of estimator differences rather than model differences.
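The correspondence between the probit item-factor model and a normal latent response variable can be checked numerically. The sketch below assumes a standard normal factor and a latent response u* = loading*eta + residual with total variance 1, so the residual variance is 1 - loading^2. The parameter values are made up for illustration. Integrating the probit item curve over the factor should reproduce P(u* > threshold) = Phi(-threshold), which is the first-order information that WLSMV works from.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def marginal_probability(loading, threshold, n_points=121, bound=6.0):
    """P(u = 1) obtained by integrating the probit item curve over a
    N(0, 1) factor on an equally spaced grid."""
    step = 2.0 * bound / (n_points - 1)
    residual_sd = math.sqrt(1.0 - loading ** 2)  # so that Var(u*) = 1
    num = 0.0
    den = 0.0
    for i in range(n_points):
        eta = -bound + i * step
        w = math.exp(-0.5 * eta * eta)  # unnormalized normal density
        num += w * normal_cdf((loading * eta - threshold) / residual_sd)
        den += w
    return num / den

loading, threshold = 0.6, 0.3  # illustrative values only
from_integration = marginal_probability(loading, threshold)
from_latent_response = normal_cdf(-threshold)  # P(u* > threshold), u* ~ N(0, 1)
print(from_integration, from_latent_response)
```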
Thank you, that is very helpful just like many other subjects on the discussion board. I think the pieces are falling into place now.
One final question. I have run a simple simulation study: I generated 10 data sets with 10 items and one factor and analyzed them with ACER's ConQuest software for Rasch models (which also applies MML/EM). I am able to transform the Rasch parameters into what I call typical IRT parameters (by rescaling the variance of the latent factor to 1 and using the logit/probit approximation). I can then transform these typical difficulties and discriminations into thresholds and a loading estimate (the loadings are constrained to be equal).
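The chain of transformations described above can be sketched as follows. The exact formulas used in the post are not given, so this is one common set of conversions under stated assumptions: the Rasch model is written as logit = theta - delta with theta ~ N(0, sd^2), the logit/probit approximation uses the constant 1.7, and the CFA parameters are standardized so that the latent response variable has variance 1. Function names and input values are hypothetical.

```python
import math

D = 1.7  # scaling constant of the logit/probit approximation (assumed)

def rasch_to_irt(item_location, factor_sd):
    """Rasch logit(theta - delta) with theta ~ N(0, sd^2) rewritten as a
    normal-ogive model a * (z - b) with z ~ N(0, 1)."""
    a = factor_sd / D              # discrimination on the probit scale
    b = item_location / factor_sd  # difficulty on the standardized factor
    return a, b

def irt_to_cfa(a, b):
    """Normal-ogive a, b converted to a standardized loading and threshold
    (latent response variable with unit variance)."""
    loading = a / math.sqrt(1.0 + a * a)
    threshold = loading * b
    return loading, threshold

# Hypothetical Rasch estimates: item location 0.6375, factor SD 1.275.
a, b = rasch_to_irt(item_location=0.6375, factor_sd=1.275)
loading, threshold = irt_to_cfa(a, b)
print(a, b, loading, threshold)
```

Because the Rasch model has a single factor variance and unit slopes, every item gets the same discrimination and therefore the same loading, which matches the equality constraint mentioned above.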
The loading, thresholds, and typical IRT discrimination and difficulties are very close to the Mplus results using WLSMV. The IRT parameters are almost identical when I use ML(R). However, the loading and thresholds are about 2.25 times larger using ML(R) than using WLSMV.
The variance of the factor is 1 in both instances, but the loading in the ML(R) case is larger than 1. (I'm using Mplus 5.)
I am confused by your second-to-last paragraph. First you say that the IRT parameters are close to WLSMV. Then you say the IRT parameters are almost the same as ML. And then you seem to contradict the first two sentences by saying that WLSMV is different from ML. Please clarify.
Sorry for the confusion. Here is a more structured description:
From ConQuest estimates, I compute (1) typical IRT parameter estimates (discrimination and difficulties) and (2) CFA parameter estimates (loading and thresholds).
When I use WLSMV in Mplus, both (1) and (2) from the Mplus output are very close to my transformed ConQuest estimates.
When I use MLR in Mplus, (1) is identical to my transformed ConQuest estimates, but (2) is about 2.25 times larger. The loading is bigger than 1 (about 1.4), while it is about 0.6 when I use WLSMV and ConQuest. The thresholds differ by the same factor.
The variance of the factor is fixed to 1 and the loadings constrained to be equal in all instances above.
I see now that lambda(MLR)=1.7*alpha(MLR), so it has something to do with the logit-probit approximation. I think I can work it out and I'll check results with Mplus 6.12 first.
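A rough sketch of how the two sets of loadings could be related follows. It rests on assumptions that the thread does not confirm: that ML(R) used a logit link with lambda = 1.7*alpha as noted above, and that the WLSMV loading is a probit loading with the latent response variance standardized to 1, so that alpha = lambda/sqrt(1 - lambda^2). Under those assumptions the ratio of the two loadings is 1.7/sqrt(1 - lambda^2), which is a little above 2 for WLSMV loadings around 0.6. The function name is hypothetical.

```python
import math

D = 1.7  # constant linking the logit loading to the discrimination (assumed)

def probit_loading_to_logit_loading(loading_probit):
    """Standardized probit loading -> discrimination -> logit-scale loading."""
    alpha = loading_probit / math.sqrt(1.0 - loading_probit ** 2)
    return D * alpha

# Illustrative WLSMV-type loadings near the value reported in the thread.
for lam in (0.60, 0.65):
    lam_logit = probit_loading_to_logit_loading(lam)
    print(lam, round(lam_logit, 3), round(lam_logit / lam, 3))
```

If these assumptions hold, the thresholds would be rescaled by the same factor as the loadings, which agrees with the pattern described above.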