Message/Author 

jan mod posted on Thursday, June 18, 2015  3:57 pm



Dear Muthen & Muthen, I'm doing an IRT model with two waves (mathematics, dichotomous variables; single-group design). I want to do a concurrent calibration with 10 common items and unique items for each wave, putting both waves on a common scale. 1) Is this syntax correct? 2) How can I make an IRT-calibrated scale for each math wave (two scores on a common scale)?

DATA: FILE = data.csv;
VARIABLE: NAMES = math1-math10 math11-math20 math21-math45 ID;
  USEVARIABLES = math1-math10 math11-math20 math21-math45;
  MISSING = ALL(9);
  IDVARIABLE = ID;
ANALYSIS: ESTIMATOR = ML;
MODEL: THETA by math1-math10* math11-math20 math21-math45;
  THETA@1; [THETA@0];
PLOT: TYPE = PLOT1 PLOT2 PLOT3;
SAVEDATA: SAVE = FSCORES; FILE = Equated.dat;
OUTPUT: TECH1 TECH5 TECH8 TECH10;


I would think you use 2 theta variables, one for each time point, each with its own indicators. So something like

theta1 by math1-math10
  math111-math120* (a1-a20);
theta2 by math211-math220* (a1-a20);
  math21-math45;
theta1-theta2@1;

where you have equalities a1-a20 for those items that are in common and administered at both time points (hence the prefix 1 and 2). The 2 theta variables will be correlated as the default. You ask for factor scores to get the individual theta values. You also want to impose threshold invariance:

[math11$1-math20$1] (b2-b20);
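Stepping outside Mplus for intuition: under a 2PL IRT model, an item's response curve is determined entirely by its discrimination (a) and difficulty (b), so constraining the common items' parameters to be equal across waves forces both thetas onto one metric. A minimal sketch in Python (the parameter values are hypothetical, purely for illustration; this is not Mplus output):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(u=1 | theta) for
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one common item; the equality
# labels (e.g., a1-a20) in the Mplus syntax force the SAME
# (a, b) to apply at both waves.
a, b = 1.2, 0.5

# Because the item response curve is identical at both waves,
# equal theta values imply equal success probabilities --
# i.e., the two waves share one metric.
for theta in (-1.0, 0.0, 1.0):
    p_wave1 = p_correct(theta, a, b)
    p_wave2 = p_correct(theta, a, b)
    assert p_wave1 == p_wave2
    print(f"theta={theta:+.1f}  P(correct)={p_wave1:.3f}")
```

If the common items' parameters were allowed to differ across waves, the same theta value would imply different response probabilities at the two time points, and the scales would no longer be comparable.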

jan mod posted on Friday, June 19, 2015  4:12 am



Thank you very much! So this will do it for me (including the imposition of threshold invariances)?

DATA: FILE = data.csv;
VARIABLE: NAMES = math1-math10 math11-math20 math21-math45 ID;
  USEVARIABLES = math1-math10 math11-math20 math21-math45;
  MISSING = ALL(9);
  IDVARIABLE = ID;
ANALYSIS: ESTIMATOR = ML;
MODEL:
theta1 by math1-math10               !unique items wave 1
  math111-math120* (a1-a10);         !common items
theta2 by math211-math220* (a1-a10); !common items
  math21-math45;                     !unique items wave 2
theta1-theta2@1;
[math111$1-math120$1] (b1-b10);
[math211$1-math220$1] (b1-b10);

jan mod posted on Friday, June 19, 2015  4:35 am



EDIT: Is this correct? 


Almost. Change to (check details):

MODEL:
theta1 by math1-math10*              !unique items wave 1
  math111-math120* (a1-a10);         !common items
theta2 by math211-math220* (a1-a10)  !common items
  math21-math45*;                    !unique items wave 2
theta1-theta2@1;
[math111$1-math120$1] (b1-b10);
[math211$1-math220$1] (b1-b10);

Note that the common items must carry the same threshold labels at both waves (b1-b10 in both bracket statements); otherwise no threshold invariance is imposed.


I'm also working on a longitudinal IRT equating/calibration project, also with a single-group design. Forms A and B have the exact same items, but A is self-report and B is from a clinical interview. I am estimating item parameters for scoring a combined measure of A+B (my "gold standard"), calibrated A only (CA), and calibrated B only (CB). I am looking at 1) testing for first- and second-order equity between CA and CB, and 2) whether the treatment effect size for A+B is contained in the sample ES CI for CA and CB (coverage, if A+B were the true value). The only way I could meet FOE/SOE and achieve d CI "coverage" was to impose N(0,1) combined across *all* timepoints, which is a subtle-but-critical difference in parameterization from standard longitudinal IRT that it appears you've suggested here; if you have a reference for this, I'd love for you to share it. There isn't anything formal in the equating literature on this (all of that literature is cross-sectional), and in other literatures where there is an interest in longitudinal calibration (e.g., IDA), the standard M&V structure is always used; no one seems interested in equity in IDA because the interest is only in the combined measure.
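For intuition only, the equity idea can be sketched in a few lines of Python. This is a toy marginal check on two score sets; the cited literature compares means and spreads conditional on theta, with bootstrap standard errors, so the function name, tolerances, and data here are all made up for illustration:

```python
import statistics

def equity_check(gold, alt, tol_mean=0.1, tol_sd=0.1):
    """Crude first-/second-order equity screen: compare the mean
    (FOE analogue) and standard deviation (SOE analogue) of two
    score sets. Toy version of the idea only -- real tests
    condition on theta and use bootstrap standard errors."""
    d_mean = statistics.mean(alt) - statistics.mean(gold)
    d_sd = statistics.stdev(alt) - statistics.stdev(gold)
    return abs(d_mean) <= tol_mean, abs(d_sd) <= tol_sd

# Hypothetical paired scores on a common theta metric
gold = [-1.2, -0.4, 0.0, 0.3, 1.1]
alt  = [-1.1, -0.5, 0.1, 0.2, 1.2]
foe_ok, soe_ok = equity_check(gold, alt)
print("FOE holds:", foe_ok, " SOE holds:", soe_ok)
```

The point of the poster's experiment is that whether such checks pass can hinge on the latent-variable parameterization (N(0,1) at every timepoint versus free mean and variance after T1), not only on the item parameters.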


If I understand what you are describing correctly, I would not assume N(0,1) for the IRT factor; I would instead let its mean and variance vary across time.


Hopefully I can clarify what I've done: I have u1-u34 measured at Time 1 and Time 2. u1-u17 are from Form A and u18-u34 are from Form B. I examine item-specific loading and threshold DIF over time and across reporters for each item (e.g., u1 = u18 but for different reporters…). I fit a final calibration model (with DIF where necessary) but also score across u1-u34 using the standard structure (N(0,1) at T1, free M&V at T2). These are my "gold standard" (GS) scores. I then have two other scoring models: a) one where I use the item parameter estimates from GS as fixed parameters to score based only on u1-u17, and b) another where I score based only on u18-u34. The IRT linking literature suggests that the score distributions for the non-GS scores should be equivalent within sampling error under tests for first- and second-order equity (e.g., Hanson et al., 2001; Kim & DeCarlo, 2016). With a free M&V at T2, my scores fail these tests of equity. I then came across this thread above from 2015 and saw you had a constraint of N(0,1) across all timepoints, and that actually worked for me.
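The fixed-parameter scoring step in a) and b) can be sketched generically. Below is a minimal EAP scorer for a 2PL with item parameters held fixed at hypothetical "calibration" values (grid quadrature, pure Python). It illustrates the idea of scoring an item subset with frozen parameters and a chosen prior; it is not the estimator Mplus uses for FSCORES, and every numeric value is invented:

```python
import math

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, params, prior_mean=0.0, prior_sd=1.0):
    """EAP theta estimate under a 2PL with FIXED item parameters
    (as when scoring only u1-u17 with calibration estimates held
    fixed). params is [(a, b), ...]; the prior arguments let you
    contrast N(0,1) with a freed mean/variance structure."""
    grid = [-4.0 + 0.1 * k for k in range(81)]   # theta grid
    num = den = 0.0
    for theta in grid:
        z = (theta - prior_mean) / prior_sd
        w = math.exp(-0.5 * z * z)               # prior weight
        like = 1.0
        for u, (a, b) in zip(responses, params):
            p = p2pl(theta, a, b)
            like *= p if u == 1 else (1.0 - p)
        num += theta * like * w
        den += like * w
    return num / den

# Hypothetical fixed parameters from a calibration run
params = [(1.0, -0.5), (1.3, 0.0), (0.8, 0.7)]
print(round(eap_score([1, 1, 0], params), 3))
```

Because the prior enters every score, changing it (N(0,1) everywhere versus free M&V at T2) shifts the whole score distribution, which is consistent with the poster's observation that the equity tests are sensitive to that choice.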


Refs are here:

Hanson, B. A., Harris, D. J., Pommerich, M., Sconing, J. A., & Yi, Q. (2001). Suggestions for the evaluation and use of concordance results. ACT Research Report Series. https://files.eric.ed.gov/fulltext/ED451243.pdf

Kim, Y., & DeCarlo, L. T. (2016). Evaluating equity at the local level using bootstrap tests. Research Report 2016-4. College Board. https://files.eric.ed.gov/fulltext/ED566890.pdf


These questions are better suited for SEMNET. 
