IRT concurrent calibration

jan mod posted on Thursday, June 18, 2015 - 3:57 pm
Dear Muthen & Muthen,

I'm fitting an IRT model for two waves of dichotomous mathematics items (single-group design). I want to do a concurrent calibration with 10 common items plus unique items for each wave, so that both waves end up on a common scale.
1) Is this syntax correct?
2) How can I obtain an IRT-calibrated scale score for each wave (two scores on a common scale)?

DATA:
FILE = data.csv;

VARIABLE:
NAMES = math1-math10 math11-math20 math21-math45 ID;
USEVARIABLES = math1-math10 math11-math20 math21-math45;
CATEGORICAL = math1-math10 math11-math20 math21-math45;
MISSING = ALL(9);
IDVARIABLE = ID;

ANALYSIS:
ESTIMATOR = ML;

MODEL:
THETA by math1-math10* math11-math20 math21-math45;
THETA@1;
[THETA@0];

PLOT:
TYPE = PLOT1 PLOT2 PLOT3;

SAVEDATA:
SAVE = FSCORES;
FILE = Equated.dat;

OUTPUT:
TECH1 TECH5 TECH8 TECH10;

Bengt O. Muthen posted on Thursday, June 18, 2015 - 6:21 pm
I would think you use 2 theta variables, one for each time point. And with its own indicators. So something like

theta1 by math1-math10*
math111-math120* (a1-a10);
theta2 by math211-math220* (a1-a10)
math21-math45*;
theta1-theta2@1;

where you have equalities a1-a10 for the loadings of the 10 items that are in common and administered at both time points (hence the prefixes 1 and 2 on their names). The two theta variables will be correlated as the default. You ask for factor scores to get the individual theta values.
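The equality constraints on the common items are what link the two waves: because each common item keeps the same discrimination and difficulty at both time points, a given theta implies the same response probabilities at either wave. A minimal Python sketch, outside Mplus and with made-up item parameters, of how EAP factor scores under a N(0,1) prior then land on one scale:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_theta(responses, items):
    """EAP factor score under a N(0,1) prior, by simple grid quadrature."""
    num = den = 0.0
    for k in range(81):
        t = -4.0 + 0.1 * k
        like = math.exp(-0.5 * t * t)  # unnormalized normal prior
        for u, (a, b) in zip(responses, items):
            p = p_2pl(t, a, b)
            like *= p if u == 1 else 1.0 - p
        num += t * like
        den += like
    return num / den

# Hypothetical common-item parameters; the equality labels in the Mplus
# syntax force the wave-1 and wave-2 copies of each common item to share
# (a, b) in exactly this way.
common = [(1.2, -0.5), (0.8, 0.3), (1.5, 0.9)]
wave1_common = list(common)  # same parameters at time 1 ...
wave2_common = list(common)  # ... and at time 2

pattern = [1, 1, 0]
# Identical item parameters -> identical theta metric at both waves.
assert eap_theta(pattern, wave1_common) == eap_theta(pattern, wave2_common)
```

In the Mplus setup the same role is played by the loading labels and threshold equalities, and the factor scores requested via SAVE = FSCORES are the analogue of eap_theta here.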

You also want to impose threshold invariance for the common items:

[math111$1-math120$1] (b1-b10);
[math211$1-math220$1] (b1-b10);

jan mod posted on Friday, June 19, 2015 - 4:12 am
Thank you very much!

So this will do it for me: (including the imposition of threshold invariances):

DATA:
FILE = data.csv;

VARIABLE:
NAMES = math1-math10 math11-math20 math21-math45 ID;
USEVARIABLES = math1-math10 math11-math20 math21-math45;
CATEGORICAL = math1-math10 math11-math20 math21-math45;
MISSING = ALL(9);
IDVARIABLE = ID;

ANALYSIS:
ESTIMATOR = ML;

MODEL:
theta1 by math1-math10 !unique items wave 1
math111-math120* (a1-a10); !common items
theta2 by math211-math220* (a1-a10); !common items
math21-math45; !unique items wave 2
theta1-theta2@1;

[math111$1-math120$1] (b2-b12);
[math211$1-math220$1] (b12-b22);

jan mod posted on Friday, June 19, 2015 - 4:35 am
EDIT: Is this correct?

Bengt O. Muthen posted on Friday, June 19, 2015 - 5:10 pm
Almost. Change to (check details):

MODEL:
theta1 by math1-math10* !unique items wave 1
math111-math120* (a1-a10); !common items
theta2 by math211-math220* (a1-a10) !common items
math21-math45*; !unique items wave 2
theta1-theta2@1;

[math111$1-math120$1] (b1-b10);
[math211$1-math220$1] (b1-b10);

Antonio A. Morgan-Lopez posted on Tuesday, January 22, 2019 - 8:18 am
I'm also working on a longitudinal IRT equating/calibration project, also with a single-group design. Forms A and B have the exact same items, but A is self-report and B is from a clinical interview. I am estimating item parameters for scoring a combined measure of A+B (my "gold standard"), calibrated A only (CA), and calibrated B only (CB). I am looking at 1) testing for first- and second-order equity between CA and CB, and 2) whether the treatment effect size for A+B is contained in the sample ES CI for CA and CB (coverage, treating A+B as the true value).

The only way I could meet FOE/SOE and achieve d CI "coverage" was to impose N(0,1) combined across *all* timepoints, which is a subtle but critical difference in parameterization from the standard longitudinal IRT approach you appear to have suggested here; if you have a reference for this, I'd love for you to share it. There isn't anything formal in the equating literature on this (all of that literature is cross-sectional), and in other literatures with an interest in longitudinal calibration (e.g., IDA), the standard mean-and-variance structure is always used; no one seems interested in equity in IDA because the interest is only in the combined measure.

Bengt O. Muthen posted on Tuesday, January 22, 2019 - 2:35 pm
If I understand what you are describing, I would not assume N(0,1) for the IRT factor but let its mean and variance vary across time.

Antonio A. Morgan-Lopez posted on Thursday, January 24, 2019 - 5:13 am
Hopefully I can clarify what I've done: I have u1-u34 measured at Time 1 and Time 2. u1-u17 are from Form A and u18-u34 are from Form B. I examine item-specific loading and threshold DIF over time and across reporters for each item (e.g., u1 = u18, but for different reporters). I fit a final calibration model (with DIF where necessary) but also score across u1-u34 using the standard structure (N(0,1) at T1, free mean and variance at T2). These are my "gold standard" (GS) scores. I then have two other scoring models: a) one where I use the item parameter estimates from GS as fixed parameters to score based only on u1-u17, and b) another where I score based only on u18-u34. The IRT linking literature suggests that the score distributions for the non-GS scores should be equivalent within sampling error under tests for first- and second-order equity (e.g., Hanson et al., 2001; Kim & DeCarlo, 2016). With a free mean and variance at T2, my scores fail these tests of equity. I then came across this thread from 2015, saw you had a constraint of N(0,1) across all timepoints, and that actually worked for me.

Antonio A. Morgan-Lopez posted on Thursday, January 24, 2019 - 5:17 am
Refs are here:

Hanson, B. A., Harris, D. J., Pommerich, M., Sconing, J. A., & Yi, Q. (2001). Suggestions for the Evaluation and Use of Concordance Results. ACT Research Report Series. https://files.eric.ed.gov/fulltext/ED451243.pdf

Kim, Y., & DeCarlo, L. T. (2016). Evaluating Equity at the Local Level Using Bootstrap Tests. Research Report 2016-4. College Board. https://files.eric.ed.gov/fulltext/ED566890.pdf

Bengt O. Muthen posted on Friday, January 25, 2019 - 10:45 am
These questions are better suited for SEMNET.
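On the first- and second-order equity checks discussed above: descriptively, they ask whether alternate-form scores reproduce the first two moments of the gold-standard score distribution. A toy Python sketch of that descriptive core only; the cited reports formalize the sampling-error side with bootstrap tests, and the formal FOE/SOE criteria condition on theta, which this simplification ignores:

```python
import math

def mean_sd(xs):
    """Sample mean and (n-1)-denominator standard deviation."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var)

def equity_gap(gold, alt):
    """Gap in mean (first moment) and SD (second moment) between
    alternate-form scores and gold-standard scores.  Gaps near zero,
    up to sampling error, are what the equity tests look for."""
    m_g, s_g = mean_sd(gold)
    m_a, s_a = mean_sd(alt)
    return m_a - m_g, s_a - s_g

# Toy example: identical score sets show zero gap on both moments.
print(equity_gap([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # (0.0, 0.0)
```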