JPower posted on Tuesday, November 24, 2015 - 12:29 pm
I am using WLSMV estimation to conduct a confirmatory factor analysis on a measure (3 factors, 18 ordinal items) and am interested in the stability of factor loadings and thresholds across two time points. In an initial model with all of the loadings and thresholds constrained, model fit indices suggest quite good fit (CFI=0.965, TLI=0.965, RMSEA=0.045 with p=1.000, WRMR=1.651). Modification indices suggest freeing specific loadings/thresholds. I then freed these one at a time (based on modification indices) and conducted chi-square difference tests at each step, which indicated significant improvements in fit.
My question is whether I have gone too far in freeing the constraints - have I over-fit the models by freeing the loadings/thresholds? Even though this improved fit, model fit was good with them all constrained. I guess I'm concerned that I'm relying too much on chi-square tests, which I believe may find small differences significant with larger sample sizes. My sample size is approximately 800. I am aware that there have been recommendations in the literature of using changes in RMSEA and CFI (Chen 2007), but these guidelines were not developed for categorical models/WLSMV estimation. In reading through the discussion board, I also understand that BIC is not an option with WLSMV. Thoughts?
Overfitting may occur due to chi-square sensitivity when n is large. You can see how much key parameters, such as the factor mean difference over time, are affected by freeing more measurement parameters. If chi-square drops significantly without such a key parameter changing in substantively meaningful ways, you have demonstrated over-sensitivity.
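The comparison described above can be sketched in a few lines of Python. This is only an illustration, not an Mplus procedure: the class and function names are hypothetical, the estimates and standard errors are made-up placeholders to be replaced with values copied from your own two outputs, and judging the change against the constrained model's standard error and 95% interval is an assumed rule of thumb rather than an established guideline.

```python
from dataclasses import dataclass


@dataclass
class KeyEstimate:
    """A key parameter (e.g. a factor mean difference over time) from one model."""
    model: str
    estimate: float
    se: float


def change_summary(constrained: KeyEstimate, freed: KeyEstimate) -> dict:
    """Describe how much the key parameter moved after freeing constraints."""
    raw_change = freed.estimate - constrained.estimate
    # Express the shift in units of the constrained model's standard error.
    change_in_se_units = raw_change / constrained.se
    # Assumed rule of thumb: does the new estimate stay inside the
    # constrained model's approximate 95% interval?
    inside_95_ci = abs(raw_change) <= 1.96 * constrained.se
    return {
        "raw_change": raw_change,
        "change_in_se_units": change_in_se_units,
        "inside_95_ci": inside_95_ci,
    }


# Placeholder numbers for illustration only; copy the real values
# from the fully constrained and the partially freed model outputs.
constrained = KeyEstimate("all loadings/thresholds equal", estimate=0.30, se=0.05)
freed = KeyEstimate("selected parameters freed", estimate=0.32, se=0.05)

summary = change_summary(constrained, freed)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A small shift relative to the standard error, alongside a significant chi-square difference, would point toward over-sensitivity of the test rather than meaningful non-invariance. Whether a given shift is substantively meaningful remains a judgment about your measure, not something the code can decide.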
JPower posted on Thursday, November 26, 2015 - 8:30 am
In terms of measurement invariance testing specifically, does this apply to DIFFTEST results as well? I am very concerned about concluding that the loadings and thresholds can't be constrained across time based on DIFFTEST results, when the model fit was already quite good to begin with. Are there any guidelines as to how much loadings or thresholds would need to change for it to be meaningful? Thanks!