JPower posted on Tuesday, November 24, 2015 - 12:29 pm
I am using WLSMV estimation to conduct a confirmatory factor analysis on a measure (3 factors, 18 ordinal items) and am interested in the stability of factor loadings and thresholds across two time points. In an initial model with all of the loadings and thresholds constrained, model fit indices suggest quite good fit (CFI=0.965, TLI=0.965, RMSEA=0.045 with p=1.000, WRMR=1.651). Modification indices suggest freeing specific loadings/thresholds. I then freed these one at a time (based on modification indices) and conducted chi-square difference tests at each step, which indicated significant improvements in fit.
My question is whether I have gone too far in freeing the constraints - have I over-fit the models by freeing the loadings/thresholds? Even though this improved fit, model fit was good with them all constrained. I guess I'm concerned that I'm relying too much on chi-square tests, which I believe may flag small differences as significant with larger sample sizes. My sample size is approximately 800. I am aware that there have been recommendations in the literature to use changes in RMSEA and CFI (Chen, 2007), but these guidelines were not developed for categorical models/WLSMV estimation. In reading through the discussion board, I also understand that BIC is not an option with WLSMV. Thoughts?
Overfitting may occur due to chi-square sensitivity when n is large. You can see how much key parameters, such as the factor mean difference over time, are affected by freeing more measurement parameters. If chi-square drops significantly without such a key parameter changing in a substantively meaningful way, you have demonstrated over-sensitivity.
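To see how sample size alone can drive chi-square significance, here is a minimal sketch (hypothetical numbers, not output from any model in this thread; the per-observation misfit `f_min` and the ML-style approximation T ≈ (n - 1) * F_min are assumptions for illustration):

```python
# Illustration of chi-square sensitivity to n: the SAME tiny amount of
# misfit (f_min) is non-significant at n=100 but "significant" at n=800.
from scipy.stats import chi2

f_min = 0.005   # assumed per-observation misfit from one freed constraint
df = 1          # one parameter freed

for n in (100, 800, 13000):
    t = (n - 1) * f_min          # approximate test statistic
    p = chi2.sf(t, df)           # upper-tail p-value
    print(f"n={n:>6}: chi2({df}) = {t:7.2f}, p = {p:.4f}")
```

The misfit per observation never changes; only n does. This is the logic behind checking whether a "significant" chi-square drop is accompanied by a substantively meaningful change in a key parameter.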
JPower posted on Thursday, November 26, 2015 - 8:30 am
In terms of measurement invariance testing specifically, does this apply to DIFFTEST results as well? I am very concerned about concluding that the loadings and thresholds can't be constrained across time based on DIFFTEST results, when the model fit was already quite good to begin with. Are there any guidelines as to how much loadings or thresholds would need to change for it to be meaningful? Thanks!
Q2. No, but I recommend the sensitivity approach I mentioned. And note that good fit by e.g. CFI doesn't always protect you from important misfit that chi-square can detect.
Louise Black posted on Thursday, December 20, 2018 - 7:49 am
We are conducting invariance testing in a very large sample (> 13,000 longitudinal, > 30,000 group). We are using WLSMV since we have ordinal items, and so plan to rely on DIFFTEST rather than CFI, given that the chi-square is not comparable in the same way as under ML.
1. In relation to mean differences and overfitting (discussed above), would this be just for the second time point/group (since the mean would be fixed@0 in the first time point/group for identification)? 2. Would a difference in mean level of .011 upon freeing an item's loadings and threshold parameters be indicative of overfitting/is there a threshold for this? 3. Given our sample size, we are considering using a random subsample to overcome oversensitivity - do you think this is a reasonable approach? Many thanks in advance!
2. It is the substantively important difference you want to focus on. Since you might not have a feel for the factor's metric, you can translate your 0.011 into the mean of a key factor indicator (y-mean = intercept + loading*factor mean).
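To make that translation concrete, a minimal sketch using the formula above (the intercept and loading values are hypothetical, not taken from any model in this thread):

```python
# Translate a factor-mean shift of 0.011 into the metric of a key
# indicator via y-mean = intercept + loading * factor mean.
intercept = 0.0                 # assumed indicator intercept
loading = 0.80                  # assumed loading for a key indicator

factor_mean_before = 0.000      # factor mean before freeing parameters
factor_mean_after = 0.011      # factor mean after freeing parameters

y_before = intercept + loading * factor_mean_before
y_after = intercept + loading * factor_mean_after
print(f"indicator-mean change: {y_after - y_before:.4f}")  # 0.0088
```

A shift of about 0.009 on the indicator's own scale is usually easier to judge for substantive importance than the same shift expressed in factor units.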
3. I think there are different opinions about this - you may want to try asking on SEMNET.