For binary data, are thresholds equivalent to intercepts for continuous data? If so, in a multi-group CFA, given that factor loadings and thresholds must be held equal in tandem, is testing for invariance of the factor loadings across groups equivalent to testing for both metric invariance (factor loadings) and scalar invariance (thresholds/intercepts) at the same time?
I am assuming that I use the MI titled "Means/Intercepts/Thresholds". The output gives an MI for [item1] and an MI for [item1$1]. I am assuming the first is the MI for the factor and the other is the MI for the threshold, for the same item. The value for both the factor and the threshold is the same. How would you determine whether it is the factor or the threshold that is non-invariant across groups from this information (see the April 16 comment above)?
Also, would the list of MIs include all factors/thresholds that are contributing to the non-invariance? For example, for one analysis the MI output listed two items under "Means/Intercepts/Thresholds", but when I tested for non-invariance for each individual item (DIF), I found five items that were non-invariant across groups. Why the discrepancy (i.e., two versus five)? I was wondering if I needed to do some type of correction to account for the number of invariance tests I conducted, i.e., I tested three factors and 35 items. That might reduce the number of significant findings.
One additional question ... I have one item in the MI output that has 15 or 16 suggested correlations ("ON" and "WITH") with other items and factors that would reduce chi-square, and I was wondering if so many suggested correlations and cross-loadings indicated that there was an issue with that particular item? What does that say about the item?
[item1] refers to the intercept (nu) which is fixed at zero for categorical items.
[item1$1] refers to the threshold. Thresholds are the same as intercepts except with opposite signs.
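The sign relationship can be checked numerically. The following is a minimal sketch, assuming a probit link for a binary item with unit residual variance; the function names and the loading and threshold values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF (probit link)

def p_endorse_threshold(eta, loading, threshold):
    """P(y = 1 | eta) written with a threshold: Phi(loading*eta - threshold)."""
    return PHI(loading * eta - threshold)

def p_endorse_intercept(eta, loading, intercept):
    """P(y = 1 | eta) written with an intercept: Phi(intercept + loading*eta)."""
    return PHI(intercept + loading * eta)

# Hypothetical parameter values, for illustration only.
loading, threshold = 0.8, 0.5

for eta in (-1.0, 0.0, 1.0):
    via_threshold = p_endorse_threshold(eta, loading, threshold)
    via_intercept = p_endorse_intercept(eta, loading, -threshold)  # intercept = -threshold
    print(eta, round(via_threshold, 4), round(via_intercept, 4))
```

Both columns agree at every factor value, which is all that "same as intercepts except with opposite signs" amounts to in this parameterization.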
Different models have different MIs. Also, note that only MIs > 3.84 are printed by default.
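For readers wondering where 3.84 comes from: it matches the chi-square critical value for 1 degree of freedom at the .05 level. A minimal sketch, using only the Python standard library (the helper name chi2_crit_df1 is hypothetical), relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable:

```python
from statistics import NormalDist

def chi2_crit_df1(alpha=0.05):
    """Critical value of a chi-square test with 1 df at level alpha.

    A chi-square variable with 1 df is a squared standard normal, so the
    critical value is the square of the two-sided normal critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

print(round(chi2_crit_df1(0.05), 2))  # 3.84
```

A stricter alpha passed to the same helper gives a larger cutoff, which is one way to think about the multiple-testing concern raised in the question.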
Yes, an item with many MI correlations certainly flags a problem with that item. Perhaps the item content contains all those other items (a "global" item)?
Jason Bond posted on Friday, April 13, 2012 - 11:34 pm
In doing an IRT analysis of 7 variables (3 categories each) with the default settings (which I believe are a normal ogive model with WLS), when I look at the 2 estimated thresholds per variable in the output, the first is always smaller in magnitude than the second. I assume this is an artifact of the probit model for ordinal polytomous regression. However, when I look at the ICC curves I see that, for a majority of the variables, the value of the factor at which the probability curves for categories 1 and 2 cross (I assume this 'difficulty' value is the severity at which a jump between adjacent categories is equally likely, computed as the corresponding threshold/loading) is larger than the severity value at which the probability curves for categories 2 and 3 cross. As higher response categories are intended to indicate higher severity for all variables for which this occurs, this is puzzling. Is there an easy explanation for this? It is true that, for many of these, the prevalence of the 2nd category is less than that of the third. Thanks.
I don't think you should focus on the crossings, but rather the factor value at the peak of each category's probability curve. Those peaks should be ordered according to the category ordering.
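To make the idea of category curves and their peaks concrete, here is a minimal sketch assuming a cumulative (graded) probit model with unit residual variance, where P(y >= k | eta) = Phi(loading*eta - threshold_k). The function names, the grid, and the parameter values are hypothetical and not taken from the analysis discussed above.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def category_probs(eta, loading, thresholds):
    """Probability of each response category at factor value eta."""
    # Cumulative probabilities P(y >= k), padded with 1 (lowest) and 0 (beyond highest).
    cum = [1.0] + [PHI(loading * eta - t) for t in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def peak_location(category, loading, thresholds, grid):
    """Grid value of eta at which the given category's curve is highest."""
    return max(grid, key=lambda eta: category_probs(eta, loading, thresholds)[category])

# Hypothetical 5-category item, for illustration only.
loading = 1.0
thresholds = [-1.0, -0.2, 0.3, 1.2]
grid = [i / 100 for i in range(-400, 401)]

# Only the intermediate categories (indices 1..3) have interior peaks.
for category in (1, 2, 3):
    print(category, peak_location(category, loading, thresholds, grid))
```

With ordered thresholds the printed peak locations come out in category order, whereas nothing in the model forces the crossing points of adjacent curves to be ordered.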
Jason Bond posted on Monday, April 16, 2012 - 7:22 pm
The peaks of the ICCs for the categories are indeed ordered by the threshold values. Is there then information in these curves regarding when and how combining categories is appropriate? For example, for any given item, the ICC curves for the first and last categories intersect at a given factor value, but the peaks of the probability curves for all the other categories in between never rise to the level of intersection of the probability of the first and last. Say an item had 5 categories and that the peaks for the 3 intermediate categories occurred at factor values that were on different sides of the factor value where the probabilities of the first and last categories intersected. Could one then use this as a rule to decide which categories to combine? Thanks again,
If the three categories in the middle have low and similar peaks, this suggests they can be collapsed.
Jason Bond posted on Monday, May 07, 2012 - 5:28 pm
A follow-up question regarding your response on April 15th to my initial posting. I'm trying to reconcile the threshold estimates with physical characteristics of the ICC curves. In my initial post I thought they corresponded to where the probability curves for adjacent categories crossed, but your response and the fact that the crossings aren't ordered ruled that out. You mentioned that the peaks should be ordered by the threshold values, and indeed they are, but given that there are k-2 peaks (as the first and last categories don't have them) but k-1 thresholds, I've clearly got something wrong. Is there an actual probability curve characteristic that corresponds to the values of the estimated threshold parameters? Thanks much again in advance,
There may be, but I don't focus on that in my work so I don't recall it. Why don't you take a look at, for instance, the IRT book by Baker and Kim or the one by Reckase.
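One curve characteristic that can be checked directly is on the cumulative curves rather than the category curves. The sketch below assumes a cumulative probit model with unit residual variance, P(y >= k | eta) = Phi(loading*eta - threshold_k); under that assumption, threshold_k/loading is the factor value at which P(y >= k) equals 0.5, and each intermediate category's curve peaks midway between its two neighbouring locations. The parameter values and names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def cumulative_prob(eta, loading, threshold):
    """P(y >= k | eta) under the assumed cumulative probit model."""
    return PHI(loading * eta - threshold)

# Hypothetical 5-category item: k = 5 categories, k - 1 = 4 thresholds.
loading = 1.2
thresholds = [-0.6, 0.3, 0.9, 1.8]

# Each threshold/loading is where the matching cumulative curve crosses 0.5.
locations = [t / loading for t in thresholds]
for t, loc in zip(thresholds, locations):
    assert abs(cumulative_prob(loc, loading, t) - 0.5) < 1e-9

# The k - 2 intermediate category curves peak midway between adjacent locations,
# because Phi(x - t_k) - Phi(x - t_{k+1}) is maximal at x = (t_k + t_{k+1}) / 2.
midpoints = [(a + b) / 2 for a, b in zip(locations, locations[1:])]

print("locations:", [round(x, 3) for x in locations])
print("peaks of intermediate categories:", [round(x, 3) for x in midpoints])
```

This accounts for the counts in the question: k-1 thresholds give k-1 locations on the cumulative curves, and the k-2 peaks sit between consecutive pairs of them.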
Jason Bond posted on Wednesday, May 09, 2012 - 5:31 pm
Thanks much for the reference. One more reference request, if you know of one. I have a number of groups (data from different countries), each of which has the seven AUDIT 5-category alcohol problems. Do you know of anyone who has used the multilevel functionality of Mplus in the context of a polytomous IRT model for a DIF application paper? I've looked but not found anything satisfactory. Thanks again,
I have started to notice an increased interest in multilevel, multi-group analysis with categorical items using Mplus, but I can't recall seeing anything published yet on that.
Jason Bond posted on Wednesday, May 09, 2012 - 9:05 pm
I have 17 countries, but it might make more sense to look at DIF only for meaningful subgroups of countries. Might you know of any references where anyone has looked at DIF when a moderate number of groups is used (I only recall having found analyses with G=2)? Thanks,
Offhand I can't think of any such references, except that my 1989 Psychometrika article discussed using several covariates in a MIMIC model to study DIF across many groups with binary items.
Does anybody know of other such references for categorical items?
Ping Kuo posted on Friday, September 04, 2015 - 1:12 pm
Hello, I wonder whether my interpretation of changes in thresholds across two time points is correct. The thresholds of all items at Time 2 are higher than those at Time 1. Take item 1, for example: its three thresholds are .017, .92, 1.428 at Time 1, but they are .460, 1.52, 2.129 at Time 2. Could I interpret this as meaning that the observed scores on this measure are underestimated at Time 2 (because all items become more difficult at Time 2 and participants need more of the latent trait to endorse the same category of an item at Time 2)? Thanks a lot.
No, you need a well-fitting model to be able to make those statements. You can't talk about "more latent traits" accounting for the change, nor can you talk about underestimation. All you can say is that the percentages of the observed variable have changed in a certain direction.
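To see what "the percentages have changed" looks like for item 1, the thresholds quoted in the question can be converted to implied category proportions. This is only a sketch, and it rests on an assumption not stated in the thread: that the latent response variable underlying the item is standard normal at each time point, so that the proportion below a threshold is Phi(threshold). The function name is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def category_proportions(thresholds):
    """Implied proportion in each category, ASSUMING a standard normal latent response."""
    cum = [0.0] + [PHI(t) for t in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

# Item 1 thresholds as quoted in the question above.
time1 = category_proportions([0.017, 0.92, 1.428])
time2 = category_proportions([0.460, 1.52, 2.129])

print("Time 1:", [round(p, 3) for p in time1])
print("Time 2:", [round(p, 3) for p in time2])
```

Under that assumption the lowest category holds a larger share at Time 2 than at Time 1, which describes the observed distribution only and says nothing about the latent trait or about underestimation.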