I am comparing several randomized groups which were asked the same questions with different measurement techniques. I am trying to establish where the different techniques cause measurement bias using ordinal CFA models. There is a lot of debate about forward vs. backward testing procedures. My main question is whether it is necessary to constrain thresholds to be equal across all groups (techniques) when assessing scale invariance.
In a forward selection procedure I would first assess the loadings while the thresholds are free, and then constrain the loadings. Say I find that the thresholds are not equivalent; is it then still admissible to free thresholds as necessary and go on testing equality hypotheses on error variances and factor variances (in short: item reliability)?
In a backward selection procedure I would constrain both loadings and thresholds. Say the thresholds are actually biased. Is it then admissible to first assess scale invariance regarding loadings and error/factor variances (i.e. item reliabilities), even though the thresholds are wrongly constrained to be equal (and hence the constrained model lacks fit)?
My general worry is that if thresholds are free, group differences in constrained error or factor variances may be 'absorbed' by the free thresholds. I observe this in some of my analyses. On the other hand, if thresholds are wrongly constrained, I wonder whether the tests of equality might be biased. -- Thanks
It's an interesting question and the answer depends on how general the IRT (categorical factor analysis) model is that you want to use.
Standard IRT does not have the generality of Mplus, which allows residual variances to differ across groups (they are fixed at 1 in a reference group). Allowing the residual variances to be group-varying seems like a realistic need judging by multiple-group experience with continuous outcomes.
If you allow this flexibility using either Delta or Theta parameterization, the model with invariant loadings but non-invariant thresholds is not identified. This is different from the case of continuous outcomes. With categorical outcomes, a change in the residual variances (or the scale factors) can be absorbed by the group-specific factor variances and the group-specific thresholds.
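To make the absorption concrete, here is a minimal numerical sketch in Python (not Mplus syntax). It assumes a one-factor model with normally distributed latent responses and a factor mean fixed at 0 in the group shown; all parameter values and the helper name implied_summary are made up for illustration. Multiplying the residual variances and the factor variance by a constant c, and the thresholds by sqrt(c), leaves the standardized thresholds and the latent response correlations unchanged, and those are all the categorical data can tell us about that group.

```python
import math

def implied_summary(loadings, thresholds, factor_var, resid_vars):
    """Standardized thresholds and latent-response correlations for one group.

    Assumes y*_i = loading_i * eta + e_i with the factor mean fixed at 0.
    Under normality of y*, these two summaries determine every category
    probability of the observed ordinal items.
    """
    n = len(loadings)
    # Standard deviation of each latent response variable y*_i
    sd = [math.sqrt(loadings[i] ** 2 * factor_var + resid_vars[i]) for i in range(n)]
    std_thresholds = [[t / sd[i] for t in thresholds[i]] for i in range(n)]
    corr = [[1.0 if i == j
             else loadings[i] * loadings[j] * factor_var / (sd[i] * sd[j])
             for j in range(n)] for i in range(n)]
    return std_thresholds, corr

# Hypothetical parameter values for a non-reference group (illustration only)
loadings = [0.8, 1.0, 1.2]                      # invariant, so left untouched
thresholds = [[-0.5, 0.7], [-1.0, 0.2], [0.1, 1.3]]
factor_var = 1.5
resid_vars = [1.0, 1.0, 1.0]

c = 2.5  # arbitrary rescaling constant
rescaled_thresholds = [[math.sqrt(c) * t for t in item] for item in thresholds]

a = implied_summary(loadings, thresholds, factor_var, resid_vars)
b = implied_summary(loadings, rescaled_thresholds,
                    c * factor_var, [c * v for v in resid_vars])

# Compare every standardized threshold and every correlation
flat_a = [x for item in a[0] for x in item] + [x for row in a[1] for x in row]
flat_b = [x for item in b[0] for x in item] + [x for row in b[1] for x in row]
same = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(flat_a, flat_b))
print("Implied summaries agree:", same)
```

Two different sets of residual variances, factor variance and thresholds give the same implied summaries, so the data cannot choose between them. Fixing the residual variances removes the free constant c.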
The model with invariant loadings but non-invariant thresholds is identified when one specifies invariant residual variances (fixed at 1 using the Theta parameterization). So in that case one can study group differences in the factor (co-)variance. But I wonder if one can realistically assume from a substantive point of view that one is measuring the same factor in groups that have different thresholds.
This is why I tend to want to test invariance with respect to thresholds and loadings jointly when I work with categorical items. If you have reason to believe that thresholds are likely not invariant, I would try a modified version of threshold invariance such as what I mentioned earlier with a non-invariant item intercept and invariant thresholds.
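As a rough illustration of the intercept idea, the Python sketch below assumes a probit link with the residual variance fixed at 1; the function names and all numbers are hypothetical. It shows that a group-specific item intercept combined with invariant thresholds gives the same response probabilities as shifting every threshold of that item by one common amount. That is a much more restricted form of threshold non-invariance than freeing each threshold separately.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(eta, loading, thresholds, intercept=0.0, resid_var=1.0):
    """P(y = k | eta) for an ordinal item with y* = intercept + loading * eta + e."""
    sd = math.sqrt(resid_var)
    cum = [normal_cdf((t - intercept - loading * eta) / sd) for t in thresholds]
    bounds = [0.0] + cum + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]

thresholds = [-1.0, 0.0, 1.2]   # hypothetical thresholds, held equal across groups
shift = 0.4                     # hypothetical intercept for this item in one group
eta, loading = 0.5, 0.9         # hypothetical factor score and invariant loading

# Invariant thresholds plus a group-specific intercept ...
p_intercept = category_probs(eta, loading, thresholds, intercept=shift)
# ... versus no intercept, with every threshold moved by the same amount
p_shifted = category_probs(eta, loading, [t - shift for t in thresholds])

equivalent = all(math.isclose(p, q, abs_tol=1e-12)
                 for p, q in zip(p_intercept, p_shifted))
print("Same category probabilities:", equivalent)
```

The non-invariant intercept costs one parameter per item and group, whereas fully free thresholds cost one parameter per threshold.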
In Version 7 there is also a Bayesian approach that builds on the "BSEM" paper on our web site. It allows for small, ignorable non-invariance, and significant non-invariance can be detected for all parameters of all items in a single run.