I am following the advice on threshold identification in Millsap's 2011 book about measurement invariance testing. In Mplus I am using ordinal CFA models with WLSMV.
Millsap suggested that one should first free all thresholds except those necessary for identification and then test for loading equivalence. For identification, it is recommended to constrain one reference threshold of each indicator to be equal across groups (plus a second threshold on one of the indicators; this applies to the cluster loading structure with more than two categories).
However, I noticed that depending on the choice of the reference threshold, the model can change substantially: for example estimates of loadings are affected by choice of the reference threshold, alongside the result of the invariance test on loadings. That is, under one threshold invariance constraint I find invariant loadings, but under a second configuration invariance is rejected. Moreover, estimates of the free thresholds (e.g. their order) are also affected.
Now I wonder how best to choose the thresholds to set equal for identification, or whether there are alternative identification parameterizations I could use. It seems very disturbing to me that models are so easily influenced by these parameter constraints.
I should add that my main interest is understanding the differential use of categories across groups, whilst assuring equivalence of loadings. There is substantial reason to believe that there is a lot of invariance, so a backward elimination procedure is not the ideal choice, I believe.
As an alternative parameterization I considered constraining the residual variances to 1 in all groups (Theta parameterization). Additionally the factor means are constrained to zero. Then all thresholds can be freed and are identified.
The interpretation of a plot of all thresholds is meaningful, after assuring loading invariance. It also avoids constraining a threshold equal on which there is bias. But I am wondering if this is a useful approach or if constraining error variances to 1 can have effects that I am not aware of. My understanding is that group differences in the residuals would then be absorbed by the loadings, so could be tested in the 'first' invariance testing step.
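A small numerical sketch of the absorption described above, with made-up parameter values (the loading, thresholds, and residual variance are hypothetical, not estimates from any model). For an ordinal probit item, a residual variance of theta with loading lambda and thresholds tau gives the same category probabilities as a residual variance of 1 with lambda/sqrt(theta) and tau/sqrt(theta). So under this sketch the group difference in residual variance would rescale the thresholds as well as the loading, not the loading alone.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def category_probs(eta, loading, thresholds, resid_var):
    """P(y = c | eta) for one ordinal probit item with ordered thresholds."""
    sd = sqrt(resid_var)
    # Cumulative probabilities P(y* <= tau_k | eta), with y* = loading*eta + e
    cum = [phi((t - loading * eta) / sd) for t in thresholds]
    bounds = [0.0] + cum + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Hypothetical values: five categories, so four thresholds
loading, thresholds, resid_var = 0.8, [-1.2, -0.3, 0.5, 1.4], 2.25
s = sqrt(resid_var)

original = category_probs(0.7, loading, thresholds, resid_var)
# Same item with residual variance fixed at 1: loading and thresholds divided by s
rescaled = category_probs(0.7, loading / s, [t / s for t in thresholds], 1.0)

max_diff = max(abs(a - b) for a, b in zip(original, rescaled))
print(max_diff)
```

If this reasoning applies to the real model, a loading that is invariant in the unrestricted metric would look non-invariant after fixing the residual variances to 1 whenever the true residual variances differ across groups.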
Just to let you know that I am thinking about an answer to your item invariance questions.
Let me ask you how many items and how many groups you have.
Also, you said earlier that "there is a lot of invariance, so a backward elimination procedure is not the ideal choice". I thought the term "backward" meant that you start with the fully invariant model, which seems like the starting point closest to the model you believe in given your phrase "a lot of invariance".
Thanks -- I mixed up the words invariance and bias/DIF here. So the sentence should read "there is a lot of item bias, so a backward elimination procedure is not the ideal choice".
Then starting with a backward procedure (fully invariant model) will mean having to modify a lot of parameters, if the MIs indicate the correct spots of misfit at all. In my tests with the backward procedure, the options for modification quickly ran out or were implausible.
I think forward procedures might point more directly to the sources of measurement bias. Perhaps the correct procedure is a matter of ongoing debate. But here I run into the "choice of minimal identification constraints" problem.
In my analysis I am interested in various scales (one with 2 factors and six items, two with one factor and four items; always five categories). I always have four groups.
Because you expect so much non-invariance in the thresholds, I wonder if it would be of any help to use a model with the "nu" intercepts in addition to the thresholds. That's an item-specific intercept. For instance, you could start with a model with invariant thresholds, but non-invariant intercepts. This means a rigid shift of all thresholds across groups for an item. That's different from the shift you get by different factor means across groups because it is an item-specific shift. Perhaps MIs are easier to work with in such a more flexible model as a starting point. Identification has to be handled carefully, where a reference group has both the factor mean and all intercepts fixed at zero. Haven't tried this, so take the suggestion with a grain of salt. Because the nu intercepts don't exist in Mplus with categorical, the way you add them is to create perfectly measured factors behind the items.
In your post above you say that one can create intercepts with categorical by creating "perfectly measured factors behind the items". Not sure how to do this... Can you check my example or give one of your own, please?
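To make the "rigid shift" idea concrete, here is a minimal numerical sketch. It is not Mplus syntax, and all values (thresholds, loadings, nu, alpha) are hypothetical. It assumes a probit item with residual variance 1 and checks that an item intercept nu gives the same response probabilities as subtracting nu from every threshold of that item. It also shows that a factor mean shift alpha moves item j by loading_j * alpha, so that shift is tied across items rather than item-specific.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_above(eta, loading, threshold, intercept=0.0):
    """P(y* > threshold | eta), y* = intercept + loading*eta + e, e ~ N(0, 1)."""
    return phi(intercept + loading * eta - threshold)


# Hypothetical item: invariant thresholds, intercept nu in a non-reference group
thresholds = [-1.0, 0.0, 0.8, 1.5]
loading, eta, nu = 0.9, 0.25, 0.4

with_intercept = [prob_above(eta, loading, t, nu) for t in thresholds]
# Equivalent description: no intercept, every threshold lowered by nu
shifted = [prob_above(eta, loading, t - nu) for t in thresholds]
intercept_gap = max(abs(a - b) for a, b in zip(with_intercept, shifted))

# A factor mean difference alpha shifts each item by loading_j * alpha,
# so the item shifts are proportional to the loadings, not free per item.
alpha = 0.5
loadings = [0.9, 0.6, 1.2]
item_shifts = [lam * alpha for lam in loadings]

print(intercept_gap, item_shifts)
```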
Say x1-x3 are categorical items:
f1 by item1-item3; item1 by x1; x1@0; item2 by x2; x2@0; item3 by x3; x3@0;
Additionally one should set all thresholds equal in some way, I suppose.
Tait Medina posted on Wednesday, November 20, 2013 - 10:14 am
Hi Thomas. I am wondering if you were able to successfully introduce a perfectly measured factor behind each y*? If yes, would you mind sharing your syntax here? Also, did you find such an approach useful?
Thank you, Tait
Tait Medina posted on Friday, November 22, 2013 - 10:36 am
I am wondering if you can point me in the right direction. I am having difficulty understanding why residual variances can be freely estimated (in non-reference groups) only when thresholds are constrained to be equivalent across groups (in addition to setting the metric by either fixing the variance to one in all groups or fixing the first loading to one in all groups). I've read through Web Note 4, but am still having a hard time getting my head around this. Do you have another reading that might help me understand this?
I have written about this as follows, also giving references to a paper and a 2011 book by Millsap:
"It is of interest to understand if factor loading invariance can be tested separately from threshold invariance. In the case of continuous outcomes, invariance of factor loadings make it possible to identify and estimate factor variance-covariances in the different groups, while intercept invariance is necessary only for identifying and estimating factor means in the different groups. This holds even if there is residual variance noninvariance because the residual variances do not influence the conditional expectation function of the outcome given the factors. In contrast, with binary outcomes, the residual variances do influence the conditional expectation function, that is, the item characteristic curve. In the binary case, a model with non-invariant thresholds is not identified when allowing group-varying residual variances. To see the indeterminacies, consider again multiplying all scale factors by the same constant in a certain group. This change can be absorbed in the factor variance as before and in the thresholds as seen in (1.49) and (1.50). This implies that threshold invariance and factor loading invariance cannot be separately tested in the binary case without further restrictions, one case being residual variance invariance (see also Millsap & Tien, 2004 and Millsap, 2011). Muthen and Asparouhov (2002) discuss further identification and testing matters for multiple-group analysis and show the equivalent issues for invariance across time in growth models.
In the polytomous case, each item has more than one threshold and the identification status is different from the binary case. Millsap and Tien (2004) and Millsap (2011) give identification rules for invariance restrictions on model parameters. As these authors show, it is possible to identify non-invariant factor loadings in conjunction with a minimal set of restrictions on the thresholds, while at the same time allowing group-varying factor means, factor variances, and residual variances."
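The binary-case indeterminacy in the quoted passage can be illustrated numerically. This is only a sketch with hypothetical parameter values, and it assumes a one-factor model in which binary data in a group carry information only about the standardized thresholds and the correlations among the latent responses. Holding the loadings fixed (invariant), the code rescales one group's factor mean, factor variance, residual variances, and thresholds by a constant c and checks that both implied quantities are unchanged, so free thresholds and free residual variances cannot be told apart.

```python
from math import sqrt


def implied_moments(loadings, thresholds, resid_vars, factor_mean, factor_var):
    """Standardized thresholds and latent-response correlations of a 1-factor model."""
    # Standard deviation of each latent response y*_j
    sds = [sqrt(l * l * factor_var + r) for l, r in zip(loadings, resid_vars)]
    std_thresholds = [
        (t - l * factor_mean) / s for t, l, s in zip(thresholds, loadings, sds)
    ]
    corrs = {}
    n = len(loadings)
    for i in range(n):
        for j in range(i + 1, n):
            corrs[(i, j)] = loadings[i] * loadings[j] * factor_var / (sds[i] * sds[j])
    return std_thresholds, corrs


# Hypothetical group parameters for three binary items
loadings = [1.0, 0.7, 1.3]
thresholds = [0.2, -0.5, 0.9]
resid_vars = [1.0, 1.5, 0.8]
factor_mean, factor_var = 0.3, 1.2

c = 1.7  # arbitrary positive rescaling constant for this group
base = implied_moments(loadings, thresholds, resid_vars, factor_mean, factor_var)
alt = implied_moments(
    loadings,                       # loadings unchanged (held invariant)
    [c * t for t in thresholds],    # thresholds absorb c
    [c * c * r for r in resid_vars],
    c * factor_mean,
    c * c * factor_var,
)

thr_gap = max(abs(a - b) for a, b in zip(base[0], alt[0]))
corr_gap = max(abs(base[1][k] - alt[1][k]) for k in base[1])
print(thr_gap, corr_gap)
```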
Tait Medina posted on Monday, December 02, 2013 - 7:45 am
Dr. Muthen, this is very helpful and thank you for taking the time to address my question.
Thanks again for your quick response, Dr. Muthen. I have an additional follow-up question.
I attempted to remove invariance constraints on the thresholds by removing the labels on those parameters for both groups (black and white), but that model was not identified. I then removed the labels for just one group (black) and the thresholds were estimated, as I had hoped.
Even though it is the result I am seeking, I'm unsure why removing them from only one group leads to both groups' thresholds being estimated. Can you explain why the labeling works that way?