We have a hierarchical CFA model with 67 categorical indicators, 6 group factors, and a hierarchical general factor. This model fits the Time 1 data very well. Unfortunately, because of the size of the combined model (134 observed categorical variables), our sample size is too small to estimate invariance in the usual manner.
We have an alternative strategy and were wondering whether it seems valid. Instead of testing the two structures together in the same model, we would validate the Time 1 structure in the Time 2 data. First, we would examine configural invariance: how well does the Time 1 derived structure fit the Time 2 data when thresholds and factor loadings are freely estimated? Next, we would examine metric invariance by fitting the model to the Time 2 data with factor loadings and thresholds fixed to the values obtained in the Time 1 data, examining overall model fit as well as a chi-square difference test of the configural invariant model versus the metric invariant model to determine whether there is a significant difference in fit. If there were areas of strain, we could release some of these constraints and test for partial metric invariance.
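For estimators with a conventional likelihood-ratio chi-square (e.g., ML), the configural-versus-metric comparison described above is a simple nested-model difference test. Below is a minimal Python sketch; the numeric values are hypothetical, the p-value uses the Wilson-Hilferty normal approximation, and note that under WLSMV this naive subtraction of chi-squares is not valid (Mplus's DIFFTEST procedure must be used instead):

```python
from math import erfc, sqrt

def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x)
    using the Wilson-Hilferty cube-root normal approximation
    (adequate for df greater than about 3)."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
    return 0.5 * erfc(z / sqrt(2))

def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Naive nested-model chi-square difference test. For ML
    chi-squares, the restricted (metric) model should have a larger
    chi-square and more df than the free (configural) model."""
    delta = chi2_restricted - chi2_free
    ddf = df_restricted - df_free
    return delta, ddf, chi2_upper_tail(delta, ddf)

# Hypothetical ML chi-square values for illustration only:
delta, ddf, p = chi2_difference_test(250.0, 120, 200.0, 100)
# p < .05 would indicate the equality constraints significantly worsen fit
```

The test is one-sided in the upper tail: a restricted model can never fit better than the model it is nested in, so Δχ² should be non-negative when both chi-squares come from a standard ML fit.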
Does this sound like a reasonable strategy for testing temporal invariance of our factor structure? Thank you for your thoughts.
Thank you for your reply. I conducted the analysis I mentioned and was surprised to find that the metric invariant model (all factor loadings and thresholds fixed) actually had a LOWER chi-square value than the configural invariant model (factor loadings and thresholds free to vary). Chi-square dropped from 324 (CFI = 0.95, RMSEA = 0.049) to 235 (CFI = 0.96, RMSEA = 0.057). It is difficult to understand how fixing these loadings and thresholds results in a LOWER chi-square value.
Also, the df dropped from 169 (configural invariant model) to 107 (metric invariant model), even though I fixed 149 factor loadings (as well as many thresholds). Is this a function of how df are calculated under WLSMV?
Sorry - I should have mentioned that I did use DIFFTEST, and the result was significant (value = 108, df = 43, p = 0.000). Since the chi-square value decreased from the less restrictive model (loadings and thresholds free to vary) to the more restrictive model (loadings and thresholds fixed), I assumed the DIFFTEST result indicated a significant IMPROVEMENT in fit when the pathways were fixed. However, this runs counter to what I expected and to what is usually seen: fixing (or equating) pathways typically results in a decrement in model fit.
So, are you indicating that despite the apparent DROP in chi-square (of about 90), the DIFFTEST value indicates an INCREASE in chi-square (of 108), and that the metric invariant model represents a significant decrement in model fit, as we would expect? Or is DIFFTEST indicating that the metric invariant model is a significant improvement (since visual inspection shows chi-square dropping from the configural to the metric invariant model, by an amount similar in magnitude to the DIFFTEST value)? Your help in interpreting these unexpected results is much appreciated. Thank you!
With WLSMV you should not make any interpretation based on the chi-square values and degrees of freedom, including those given in the DIFFTEST results. It is only the p-values you should look at. If the p-value for the DIFFTEST result is less than .05, the more restrictive model significantly worsens model fit. If the p-value is greater than .05, the more restrictive model does not significantly worsen model fit.
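As a sanity check on the DIFFTEST result reported above (value = 108, df = 43, p = 0.000), the upper-tail probability of a chi-square distribution can be approximated in plain Python. This is only a sketch using the Wilson-Hilferty normal approximation; Mplus reports the exact p-value for you, and that p-value is the only quantity to interpret:

```python
from math import erfc, sqrt

def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x)
    via the Wilson-Hilferty cube-root normal approximation."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
    return 0.5 * erfc(z / sqrt(2))

# DIFFTEST statistic reported in the thread: value = 108, df = 43
p = chi2_upper_tail(108, 43)
# p is far below .05, consistent with the reported p = 0.000:
# the more restrictive (metric) model significantly worsens fit.
```

A chi-square statistic of 108 on 43 df is roughly seven standard deviations above the distribution's mean (43), which is why the reported p-value rounds to 0.000.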
Great - thank you again for your speedy reply. I had a question with regard to reporting DIFFTEST results. In a manuscript, would DIFFTEST results more accurately be expressed as a delta chi-square or simply as a chi-square?
I have a different wrinkle on testing temporal invariance that I was hoping for advice on.
We have two waves of panel data, where the same participants were surveyed at each wave but different indicators are available at Time 1 and Time 2.
So, I think it might be technically possible to assess whether temporal invariance holds, using only those indicators that are repeated at Time 1 and Time 2.
However, I'm not sure whether this type of temporal invariance analysis tells you anything: does temporal invariance established on a subset of indicators provide some evidence of stability over time, just weaker evidence than testing the same full set of indicators from the same participants at Time 1 and Time 2?
Or, is it simply not appropriate to assess temporal invariance for only a subset of indicators?
I would leave the data in the wide format. See the Topic 4 short course handout on the website. Multiple indicator growth is described starting on Slide 77. The first part of the example illustrates testing measurement invariance across time.