

We have a hierarchical CFA model with 67 categorical indicators, 6 group factors, and a general factor at the second level. This model fits very well in our Time 1 data. Unfortunately, because the combined two-wave model is so large (134 observed categorical variables), our sample size is too small for us to test invariance in the usual manner. We have an alternate strategy and were wondering if it seems valid. Instead of testing the two structures together in the same model, we would validate the Time 1 structure in the Time 2 data. First, we would examine configural invariance: how well does the structure derived at Time 1 fit the Time 2 data when thresholds and factor loadings are freely estimated? Next, we would examine metric invariance using the Time 2 data, but fixing the factor loadings and thresholds to the values obtained at Time 1 and examining model fit (as well as a chi-square difference test of the configural model versus the metric model to determine whether there is a significant difference in fit). If there were areas of strain, we could release some of these constraints and test for partial metric invariance. Does this sound like a reasonable strategy for testing temporal invariance of our factor structure? Thank you for your thoughts. 
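(For readers following this thread, the fixed-parameter step described above might look something like the fragment below. This is only a hypothetical sketch, not the poster's actual input: the factor name, indicator names, and every numeric value after @ are invented placeholders standing in for estimates taken from a Time 1 run.)

```
! Hypothetical sketch of the metric-invariance step: fit the
! Time 1 structure to the Time 2 data with loadings and
! thresholds fixed at the Time 1 estimates.
! All names and values below are placeholders.
VARIABLE:
  NAMES = u1-u3;
  CATEGORICAL = u1-u3;
ANALYSIS:
  ESTIMATOR = WLSMV;
MODEL:
  f1 BY u1@0.82 u2@0.74 u3@0.69;     ! loadings fixed at Time 1 values
  [u1$1@-0.35 u2$1@0.12 u3$1@0.48];  ! first thresholds fixed likewise
```

For the configural step, the same input would instead free these parameters (e.g., `f1 BY u1-u3;` with thresholds left unconstrained).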


I don't think you can do better than what you suggest. You might also do it in the opposite direction. 


Linda, thank you for your reply. I conducted the analysis I mentioned and was surprised to find that the metric invariance model (where I fixed all factor loadings and thresholds) actually had a LOWER chi-square value than the configural model (where factor loadings and thresholds were free to vary). Chi-square dropped from 324 (with a CFI of 0.95 and RMSEA of 0.049) to 235 (CFI = 0.96, RMSEA = 0.057). It is difficult to understand how fixing these loadings/thresholds could result in a LOWER chi-square value. Also, the df only dropped from 169 (configural) to 107 (metric), even though I fixed 149 factor loadings (as well as many thresholds). Is this a function of how df are calculated in WLSMV? Thank you for your help, Jason 


You cannot compare the chi-square values and degrees of freedom across models with WLSMV. Only the p-value should be interpreted. If you want to do difference testing, you need to use the DIFFTEST option. 
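(As background for other readers: DIFFTEST involves two runs. The less restrictive model saves its derivatives to a file, and the more restrictive model reads that file back to compute the difference test. A hedged sketch of the relevant commands follows; the file name is a placeholder.)

```
! Run 1: less restrictive (configural) model
ANALYSIS:  ESTIMATOR = WLSMV;
SAVEDATA:  DIFFTEST = deriv.dat;   ! save derivatives for the test

! Run 2: more restrictive (metric) model
ANALYSIS:  ESTIMATOR = WLSMV;
           DIFFTEST = deriv.dat;   ! read run 1 derivatives and
                                   ! compute the difference test
```

The output of run 2 then includes a chi-square test for difference testing; per the advice in this thread, only its p-value should be interpreted.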


Sorry, I should have mentioned that I did use DIFFTEST, and the result was significant (value = 108, df = 43, p = 0.000). Since the chi-square value was reduced from the less restrictive model (loadings and thresholds free to vary) to the more restrictive model (fixed loadings and thresholds), I assumed the DIFFTEST result indicated a significant IMPROVEMENT in fit when the pathways were fixed. However, this is counter to what I expected, and to what is usually seen: fixing (or equating) pathways typically results in a decrement in model fit. So, are you indicating that despite the apparent DROP in chi-square (of about 90), the DIFFTEST value is indicating an INCREASE in chi-square (of 108), and the metric invariance model represents a significant decrement in model fit, as we would expect? Or is the DIFFTEST indicating that the metric invariance model is a significant improvement (since visual inspection shows chi-square dropping from the configural to the metric model, and the size of that drop is similar in magnitude to the DIFFTEST value)? Your help in interpreting these unexpected results is much appreciated! Thank you! 


With WLSMV you should not be making any interpretation based on the chi-square values and degrees of freedom, including those given in the DIFFTEST results. It is only the p-values you should be looking at. If the p-value for the DIFFTEST result is less than .05, it means that the more restrictive model significantly worsens model fit. If the p-value is greater than .05, it means that the more restrictive model does not significantly worsen model fit. 


Great, thank you again for your speedy reply. I had a question about reporting results from the DIFFTEST. In a manuscript, would DIFFTEST results more accurately be expressed as delta chi-square or simply as chi-square? Thank you again for all of your help, Jason 


I would report only the p-value, for the reasons mentioned above. 


I have a different wrinkle on testing temporal invariance that I was hoping to get advice on. We have two waves of panel data in which the same participants were surveyed at each wave, but different indicators are available at Time 1 and Time 2. I think it might be technically possible to assess whether temporal invariance holds using only those indicators that are repeated at Time 1 and Time 2. However, I'm not sure whether this type of temporal invariance analysis tells you anything: does temporal invariance based on a subset of indicators provide some evidence of temporal stability, just weaker evidence than having the same full set of indicators from the same participants at Time 1 and Time 2? Or is it simply not appropriate to assess temporal invariance for only a subset of indicators? Thanks! Matt 


This type of invariance testing is used a lot by testing companies like ETS, where the majority of items are repeated over time. I think the believability is related to how many items are repeated. 


Thanks, Linda. I will look at what ETS and others have done for examples. One point I'm not sure about: will we need to restructure the dataset for analyses of temporal invariance? We already have the data in the more common wide format, but we are unsure whether they need to be restructured to long format (as in LGM). Thanks! Matt 


I would leave the data in the wide format. See the Topic 4 short course handout on the website. Multiple indicator growth is described starting on Slide 77. The first part of the example illustrates testing measurement invariance across time. 
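(For context: in the wide-format setup described in that handout, each measurement occasion gets its own factor defined by that occasion's indicators, and invariance across time is imposed by giving corresponding loadings the same equality label. A minimal hypothetical sketch, with invented variable names, two occasions, and three repeated indicators:)

```
! Wide format: same three indicators measured at two occasions.
! u11-u31 are Time 1 responses, u12-u32 the Time 2 responses
! from the same participants (names are placeholders).
MODEL:
  ft1 BY u11
         u21 (L2)
         u31 (L3);
  ft2 BY u12
         u22 (L2)    ! same labels = loadings held equal across time
         u32 (L3);
```

Freeing the labels (or removing them) gives the configural version of the model, so the two specifications can be compared with DIFFTEST as discussed earlier in the thread.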
