jan mod posted on Thursday, June 18, 2015 - 3:57 pm
Dear Muthen & Muthen,

I'm doing an IRT model with two waves (mathematics, dichotomous variables; single-group design). I want to do a concurrent calibration with 10 common items and unique items for each wave, putting both waves on a common scale.

1) Is this syntax correct?
2) How can I make an IRT-calibrated scale for each wave of math (two scores on a common scale)?

  DATA:     FILE = data.csv;
  VARIABLE: NAMES = math1-math10 math11-math20 math21-math45 ID;
            USEVARIABLES = math1-math10 math11-math20 math21-math45;
            MISSING = ALL(9);
            IDVARIABLE = ID;
  ANALYSIS: ESTIMATOR = ML;
  MODEL:    THETA BY math1* math11-math20 math21-math45;
            THETA@1;
            [THETA@0];
  PLOT:     TYPE = PLOT1 PLOT2 PLOT3;
  SAVEDATA: SAVE = FSCORES;
            FILE = Equated.dat;
  OUTPUT:   TECH1 TECH5 TECH8 TECH10;

I would think you would use two theta variables, one for each time point, each with its own indicators. So something like:

  theta1 BY math1-math10
            math111-math120* (a1-a10);
  theta2 BY math211-math220* (a1-a10)
            math21-math45;
  theta1-theta2@1;

where the equality labels a1-a10 hold the loadings equal for the 10 items that are in common and administered at both time points (hence the prefixes 1 and 2 on the item names). The two theta variables will be correlated as the default. You ask for factor scores to get the individual theta values. You also want to impose threshold invariance on the common items:

  [math111$1-math120$1] (b1-b10);
  [math211$1-math220$1] (b1-b10);

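As an aside on what the shared labels buy you: under a 2PL parameterization, holding the common items' discriminations and difficulties equal across waves means the same theta implies the same response probabilities at both time points, which is what puts the two waves on one scale. A minimal Python sketch of that idea (the function name and parameter values are hypothetical, not estimates from this model):

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct
    response given ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one common item; the same (a, b) is used
# at both waves, as the equality labels (a1-a10) / (b1-b10) impose.
a_common, b_common = 1.2, 0.5

theta = 0.8  # an examinee's ability on the common scale
p_wave1 = p_2pl(theta, a_common, b_common)
p_wave2 = p_2pl(theta, a_common, b_common)
assert p_wave1 == p_wave2  # an identical item behaves identically at both waves
assert abs(p_2pl(b_common, a_common, b_common) - 0.5) < 1e-12  # P = .5 at theta = b
```

The unique items at each wave are then calibrated against whichever theta they load on, so their parameters land on the same scale that the common items anchor.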
jan mod posted on Friday, June 19, 2015 - 4:12 am
Thank you very much! So this will do it for me (including the imposition of threshold invariance):

  DATA:     FILE = data.csv;
  VARIABLE: NAMES = math1-math10 math11-math20 math21-math45 ID;
            USEVARIABLES = math1-math10 math11-math20 math21-math45;
            MISSING = ALL(9);
            IDVARIABLE = ID;
  ANALYSIS: ESTIMATOR = ML;
  MODEL:    theta1 BY math1-math10             !unique items wave 1
                      math111-math120* (a1-a10);  !common items
            theta2 BY math211-math220* (a1-a10); !common items
                      math21-math45;            !unique items wave 2
            theta1-theta2@1;
            [math111$1-math120$1] (b1-b10);
            [math211$1-math220$1] (b1-b10);

jan mod posted on Friday, June 19, 2015 - 4:35 am
EDIT: Is this correct?

Almost. Change to (check details):

  MODEL:    theta1 BY math1-math10*            !unique items wave 1
                      math111-math120* (a1-a10);  !common items
            theta2 BY math211-math220* (a1-a10)  !common items
                      math21-math45*;           !unique items wave 2
            theta1-theta2@1;
            [math111$1-math120$1] (b1-b10);
            [math211$1-math220$1] (b1-b10);

I'm also working on a longitudinal IRT equating/calibration project, but with a single-group design. Forms A and B have exactly the same items, but A is self-report and B comes from a clinical interview. I'm estimating item parameters for scoring a combined measure of A+B (my "gold standard"), calibrated A only (CA), and calibrated B only (CB). I'm looking at 1) testing for first- and second-order equity between CA and CB, and 2) whether the treatment effect size for A+B is contained in the sample effect-size CI for CA and CB (coverage if A+B were the true value). The only way I could meet FOE/SOE and achieve d CI "coverage" was to impose N(0,1) combined across *all* timepoints, which is a subtle but critical difference in parameterization from standard longitudinal IRT, and it appears to be what you've suggested here; if you have a reference for this, I'd love for you to share it. There isn't anything formal in the equating literature on this (that literature is all cross-sectional), and in other literatures where there is an interest in longitudinal calibration (e.g., IDA), the standard M&V structure is always used; no one seems interested in equity in IDA because the interest is only in the combined measure.

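The "coverage" check in 2) can be sketched outside any particular software. This is only an illustration under a normal-approximation CI for Cohen's d (the poster's actual CI method is not stated), and all function names here are hypothetical:

```python
import math

def cohens_d(x, y):
    """Cohen's d for two groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / sp

def d_ci(d, nx, ny, z=1.96):
    """Approximate 95% CI for d via one common large-sample SE formula."""
    se = math.sqrt((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny)))
    return d - z * se, d + z * se

def covers(d_true, d_est, nx, ny):
    """Does the CI around the estimated d contain the 'true' d?"""
    lo, hi = d_ci(d_est, nx, ny)
    return lo <= d_true <= hi
```

"Coverage" in the post's sense would then mean: treat the A+B (gold standard) effect size as d_true and check that it falls inside the CI computed from the CA-only (or CB-only) scores.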
If I understand what you are describing, I would not assume N(0,1) for the IRT factor but would let its mean and variance vary across time.

Hopefully I can clarify what I've done. I have u1-u34 measured at Time 1 and Time 2; u1-u17 are from Form A and u18-u34 are from Form B. I examine item-specific loading and threshold DIF over time and across reporters for each item (e.g., u1 = u18 but for different reporters). I fit a final calibration model (with DIF where necessary) and score across u1-u34 using the standard structure (N(0,1) at T1, free mean and variance at T2). These are my "gold standard" (GS) scores. I then have two other scoring models: a) one where I use the item parameter estimates from GS as fixed parameters to score based only on u1-u17, and b) another where I score based only on u18-u34. The IRT linking literature suggests that the score distributions for the non-GS scores should be equivalent within sampling error under tests of first- and second-order equity (e.g., Hanson et al., 2001; Kim & DeCarlo, 2016). With a free mean and variance at T2, my scores fail these tests of equity. I then came across this thread from 2015, saw you had a constraint of N(0,1) across all timepoints, and that actually worked for me.

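The formal equity tests in Hanson et al. (2001) and Kim & DeCarlo (2016) are bootstrap-based; this is not their procedure, only a crude sketch of the underlying idea, with all names and tolerances made up for illustration: within strata of ability, the two scoring models should yield approximately equal conditional means (first-order equity) and conditional SDs (second-order equity).

```python
from statistics import mean, stdev

def equity_check(scores_a, scores_b, strata, tol_mean=0.1, tol_sd=0.1):
    """Crude empirical FOE/SOE check. scores_a and scores_b are
    per-person scores from the two scoring models; strata are
    per-person ability-stratum labels (e.g., binned GS theta)."""
    for s in set(strata):
        a = [x for x, g in zip(scores_a, strata) if g == s]
        b = [x for x, g in zip(scores_b, strata) if g == s]
        if abs(mean(a) - mean(b)) > tol_mean:       # first-order equity fails
            return False
        if len(a) > 1 and len(b) > 1 and abs(stdev(a) - stdev(b)) > tol_sd:
            return False                             # second-order equity fails
    return True
```

If the two score sets systematically diverge in location or spread within ability strata, a check like this fails, which mirrors what the equity tests detect under a free T2 mean and variance.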
Refs are here:

Hanson, B. A., Harris, D. J., Pommerich, M., Sconing, J. A., & Yi, Q. (2001). Suggestions for the evaluation and use of concordance results. ACT Research Report Series. https://files.eric.ed.gov/fulltext/ED451243.pdf

Kim, Y., & DeCarlo, L. T. (2016). Evaluating equity at the local level using bootstrap tests. Research Report 2016-4. College Board. https://files.eric.ed.gov/fulltext/ED566890.pdf

These questions are better suited for SEMNET.