I have 107,025 children's responses to standardized mathematics instruments administered in three waves (wave 1 - 37 items; wave 2 - 36 items; wave 3 - 31 items). I am conducting the analysis in the sequence EFA, LCA, FMA, LTA. My main concern is model stability across these steps, because it seems to me that the solution space inevitably grows with each step.
I am concerned that the total possible response pattern space - the
set of all possible response patterns - is far larger than my
sample size (2^37 vs. 2^16.71, i.e. approx. 107,025, for wave 1). In this
respect, my model may be mis-specified. I see three possible directions
to mitigate my unease.
1. Take a number of random samples (say 5,000) of the original linked
data-set and run parallel EFA, LCA, FMA, and LTA analyses. Bump up the BLRT
and LMRLRT random draws. Compare.
2. Reduce the number of items per wave to <= 16 (i.e., 2^16
= 65,536, which is < 107,025) and run the entire sample.
3. Reduce the number of items per wave to <= 16, take random samples (say 5,000) and compute estimates. Compare.
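For reference, the arithmetic behind these options can be checked with a short sketch. It assumes binary (0/1) items, as the 2^k counts above imply, and the helper names (pattern_space, max_items_covered, draw_subsamples) are hypothetical, introduced only for illustration. It draws row indices only and does not run any of the EFA/LCA/FMA/LTA models.

```python
import math
import random

N = 107_025                                        # sample size from the post
ITEMS = {"wave 1": 37, "wave 2": 36, "wave 3": 31}  # items per wave


def pattern_space(n_items):
    """Number of distinct response patterns for n_items binary items."""
    return 2 ** n_items


def max_items_covered(n):
    """Largest k such that 2**k does not exceed n."""
    return math.floor(math.log2(n))


def draw_subsamples(n_total, size, n_draws, seed=0):
    """Draw n_draws random subsamples of row indices, each without replacement.

    Assumption: 'say 5,000' is read here as the size of each subsample.
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_total), size) for _ in range(n_draws)]


if __name__ == "__main__":
    for wave, k in ITEMS.items():
        ratio = pattern_space(k) / N
        print(f"{wave}: 2^{k} patterns, about {ratio:,.0f} per respondent")
    print(f"log2(N) = {math.log2(N):.3f}")
    print(f"max items with 2^k <= N: {max_items_covered(N)}")
    subsamples = draw_subsamples(N, size=5_000, n_draws=10)
    print(f"drew {len(subsamples)} subsamples of {len(subsamples[0])} rows")
```

Each list of indices could then be used to subset the linked data-set before fitting the same models to every subsample.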
Could you please advise me as to which of these (or, perhaps a new one I