If I have a design in which I administered the same measure to men and women at two points in time to test for sex differences in changes in the construct tapped by the measure, it seems important to demonstrate that the measure is invariant both across groups and over time before one can meaningfully compare group differences over time. Do you think it is appropriate to test for both forms simultaneously in the same model? Or, if, say, invariance across groups holds but invariance across time does not, might the misfit due to noninvariance across time be partially offset by the fact that the portion of the model having to do with cross-group invariance fits well? In other words, would a stronger approach be to test for invariance across groups at each time point separately and then test for invariance over time within each group separately? Thanks for any insight you can share!
Thanks for the speedy reply, Linda! A follow-up question: if invariance does hold in the combined model, is it reasonable to conclude that both forms of invariance hold? Or might the overall model fit still be acceptable if one form didn't hold but the other did?
Also, from reading other postings on the discussion board, I now understand that the Mplus approach to factor analysis with categorical items is equivalent to a 2-parameter IRT model. That is very exciting, I think, as it opens up lots of analytic possibilities that I did not think were available (e.g., multidimensional IRT), at least not with any software that I had. I do have a couple of questions about this though. It seems to me that one of the major advantages of IRT is that it enables equating of scores across different tests by taking threshold and slope information into account from both measures to locate participants on the underlying latent trait. It stands to reason that if we can equate scores from subjects from the same population who have taken two alternative forms of some test, the same methods should allow us to equate scores from subjects from different populations administered the same test for whom the thresholds and/or factor loadings differ, as long as we are confident that the test is still measuring the same construct in the two populations, no? If so, then I would think the same would be true for equating scores from repeated administrations of the same test to the same sample even if the thresholds and/or factor loadings differ over time? If these intuitions are correct, perhaps then it is not critical that the measure demonstrate invariance as long as the data are analyzed appropriately (i.e., by incorporating both thresholds and factor loadings in our measurement model)?
My second question about factor analysis of categorical items in Mplus is that I don't understand the concept of a single threshold for a measure with a hierarchical structure in which each item loads on a general factor and a group factor. As these factors are orthogonal, each item is not measuring a single ability but rather two abilities. Thus, I am having a hard time at a conceptual level with the notion of a single threshold for such an item. At a purely mathematical level I can understand that such a threshold corresponds to something like a particular vector length of the vectors corresponding to participants' locations in the plane defined by the two abilities measured by a given item. Conceptually, however, I am having a hard time understanding the meaning of such a threshold, as a subject could exceed it in many different ways (e.g., by being high on the one ability but low on the other or vice versa, or by having a moderate standing on both abilities). Any insight you could provide that might help one to develop some intuition for the meaning of such thresholds would be greatly appreciated (or being pointed to a reference that might help along these lines). Thanks very much!
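The correspondence between the categorical factor model and a 2-parameter normal-ogive IRT model mentioned above can be sketched numerically. This is a minimal illustration, not Mplus output: it assumes a binary item, a factor with mean 0 and variance 1, and a latent response variable with unit variance (so the residual variance is 1 - loading^2). The function names and the example values (loading 0.6, threshold 0.3) are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def irt_from_factor(loading, threshold):
    """Convert a standardized loading and threshold into normal-ogive
    discrimination (a) and difficulty (b).

    Assumption: factor mean 0, factor variance 1, and unit variance
    for the latent response variable underlying the binary item.
    """
    residual_sd = math.sqrt(1.0 - loading ** 2)
    a = loading / residual_sd
    b = threshold / loading
    return a, b

def prob_endorse(f, loading, threshold):
    """P(u = 1 | f) in the factor-analytic (probit) form."""
    residual_sd = math.sqrt(1.0 - loading ** 2)
    return normal_cdf((loading * f - threshold) / residual_sd)

# Hypothetical item: loading 0.6, threshold 0.3
a, b = irt_from_factor(0.6, 0.3)

# The two forms give the same probability at any factor value f
f = 1.2
p_factor_form = prob_endorse(f, 0.6, 0.3)
p_irt_form = normal_cdf(a * (f - b))
```

Under these assumptions the difficulty b is the factor value at which the probability of endorsing the item is one half.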
I don't think that it is necessarily true that if invariance holds in the combined model that it holds for groups and time. I would test both.
Regarding IRT, there is a new section on IRT in Mplus. You can find the link on the homepage.
You need measurement invariance across time for it to make sense to study the development of the construct across time. The structural parameters, the means, variances and covariances of the constructs, may vary across time but the measurement parameters should not. It may be that you can have partial invariance.
Regarding a single threshold when there is a general and a group factor influencing the same factor indicator, you may think of this as a threshold on a specific ability variable needed to solve the item correctly. The specific ability variable is the sum of the general and group factors. A person may exceed the threshold of this specific ability variable by different combinations of general and group factor values added together.
Thanks very much Linda - your responses are very helpful! Re: the issue of the single threshold, at a conceptual level what you say makes perfect sense if I am thinking of the structure in higher-order terms, with each item having a loading on one and only one factor - its first-order factor (which then loads on a second-order factor and so on until the highest level in the structure is reached). If I am thinking of the structure in terms of a hierarchical model such as the bi-factor model, it seems to me that there are two ability variables needed to solve the item correctly - the general factor plus the group factor, which is orthogonal to the general factor. Given the orthogonality, it just seems conceptually messy to me to talk of a specific ability variable which is the sum of the general and group factors - it seems more accurate in this case to talk of the abilities (plural) needed to solve the item correctly. Now I realize that mathematically many hierarchical models and higher-order models are just linear transformations of each other (using the Schmid-Leiman transformation and its inverse), but at a conceptual level, if one thinks the hierarchical model provides the representation closer to reality in a given domain, it seems strange to me to talk of a single threshold on the several independent abilities needed to solve the item.
If the model is of the linear form
y = lambda_g*g + lambda_s*s, (1)
where y is either the logit or probit for the binary item, g is the general factor and s is the group factor, then it says that y needs to be large enough to solve the item, implying that a relatively low g (or s) value for a person can be compensated by his higher s (or g) value to give the same y. This says that one threshold for y is sufficient. If on the other hand solving the item requires that g exceeds a threshold and that s exceeds another threshold, then a different model than that of (1) is called for. I think the former model is that of "bi-factor" modeling that has been written about recently in Psychometrika by Gibbons and others. I don't have a reference for the latter model.
That helps - thanks Bengt! I agree a compensatory model suggests that one threshold is sufficient, and that seems a more accurate, albeit more unwieldy, description of the conceptual meaning of the threshold in the case of a bi-factor (or other hierarchical) model with this compensatory relationship.
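The compensatory reading discussed above can be illustrated with a small numerical sketch. It assumes a probit link, equal loadings of 0.8 on the general factor g and the group factor s, and a single threshold of 1.0; all of these values are made up. The conjunctive function is only one hypothetical way to write a model in which each ability must clear its own threshold, not a model taken from the thread or from the literature.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_compensatory(g, s, load_g, load_s, threshold):
    """One threshold on the weighted sum y = load_g*g + load_s*s."""
    y = load_g * g + load_s * s
    return normal_cdf(y - threshold)

def prob_conjunctive(g, s, thr_g, thr_s):
    """Hypothetical non-compensatory form: g and s each need to
    exceed their own threshold, so the probabilities multiply."""
    return normal_cdf(g - thr_g) * normal_cdf(s - thr_s)

# Three persons: high g / low s, low g / high s, moderate on both
profiles = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]

compensatory = [prob_compensatory(g, s, 0.8, 0.8, 1.0) for g, s in profiles]
conjunctive = [prob_conjunctive(g, s, 0.5, 0.5) for g, s in profiles]

for (g, s), pc, pj in zip(profiles, compensatory, conjunctive):
    print(f"g={g:.1f} s={s:.1f} compensatory={pc:.3f} conjunctive={pj:.3f}")
```

In the compensatory form all three persons have the same y and therefore the same probability, which is why a single threshold is enough. In the conjunctive sketch the person with a moderate standing on both abilities does better than the two uneven profiles.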
Jon Elhai posted on Tuesday, October 27, 2009 - 6:56 pm
I'm using the MLM estimator, and I conducted a difference test between two nested models (as per your "chisquare difference test" formula section of your website)...
So now I have a corrected chisquare difference value and difference in degrees of freedom to look up in a chi-square table.
Does it make sense to take that chi-square value difference, degrees of freedom difference, and sample size to manually calculate RMSEA? Would such a resulting RMSEA represent a difference in fit between the two models?
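The corrected difference described above can be sketched as follows, using the scaled chi-square difference formula given on the Mplus website for MLM and MLR. The function name and the input numbers are hypothetical; t0, c0 and d0 stand for the chi-square value, scaling correction factor and degrees of freedom of the nested (more restrictive) model, and t1, c1 and d1 for those of the comparison model.

```python
def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference for MLM/MLR.

    t0, c0, d0: scaled chi-square, scaling correction factor and df
                of the nested (more restrictive) model.
    t1, c1, d1: the same quantities for the comparison model.
    Returns the difference statistic and its degrees of freedom.
    """
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # Undo the scaling of each chi-square, then rescale the difference
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1

# Made-up values for two nested models
trd, df_diff = scaled_chi2_difference(120.0, 1.20, 50, 100.0, 1.25, 45)
print(f"scaled difference = {trd:.3f} on {df_diff} df")
```

The resulting statistic is referred to a chi-square table with the difference in degrees of freedom. The correction term cd can come out negative in some samples, in which case this formula does not give a usable test.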
I am investigating measurement invariance in a study with two time points (i.e., baseline and 1 year). SF-36 is used as the measurement model. Is it possible to observe both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) in the same scale (at 1 year)? Thank you for your time!
I ran the analysis and found that both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) were identified for the same scale at one year. So I would like to confirm: is it possible for both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) to be identified in the same scale (at 1 year)?
I'm trying to test a factor for measurement invariance across groups using multigroup analysis. Factor loadings are equivalent, but when I try to constrain the intercepts, the fit is significantly worse. The test for intercept invariance is a way to assess whether the value of the indicator, when the latent construct is 0, is the same in each group. Wouldn't it be possible to recenter the intercepts in each group according to the value of the intercept in the first group by subtracting the difference from each observation? Would that make any sense?
You need intercept invariance - at least partial invariance (for some items) - in order to study factor mean differences across groups. With invariant loadings and intercepts, the indicator means change over the groups as a function of the factor means only. You don't want to transform the data. Instead you should find the indicators that are not invariant - e.g. by looking at the MIs.
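The point that indicator means should differ across groups only through the factor means can be sketched with the usual mean structure, indicator mean = intercept + loading * factor mean. All numbers below are made up, and the function names are hypothetical.

```python
def indicator_means(intercepts, loadings, factor_mean):
    """Model-implied indicator means: nu + lambda * alpha."""
    return [nu + lam * factor_mean for nu, lam in zip(intercepts, loadings)]

def implied_factor_mean_difference(means_a, means_b, loadings):
    """Per-item factor mean difference implied by observed mean gaps,
    assuming the loadings and intercepts are invariant."""
    return [(mb - ma) / lam for ma, mb, lam in zip(means_a, means_b, loadings)]

loadings = [1.0, 0.8, 1.2]
intercepts = [2.0, 3.0, 1.0]

# Group 1 has factor mean 0, group 2 has factor mean 0.5
group1 = indicator_means(intercepts, loadings, 0.0)
group2 = indicator_means(intercepts, loadings, 0.5)
invariant_case = implied_factor_mean_difference(group1, group2, loadings)

# Same setup, but item 2 has a different intercept in group 2
shifted_intercepts = [2.0, 3.3, 1.0]
group2_shifted = indicator_means(shifted_intercepts, loadings, 0.5)
noninvariant_case = implied_factor_mean_difference(group1, group2_shifted, loadings)

print(invariant_case)
print(noninvariant_case)
```

With invariant intercepts every item points to the same factor mean difference. With a shifted intercept, item 2 points to a larger difference than the other items, which is the kind of item to look for and free rather than correcting the data by hand.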
finnigan posted on Wednesday, April 04, 2012 - 9:10 am
I am looking at options to manage nonnormality. I am not in favour of transformations, e.g. log transformations, but need to rule them out. I would like to ask if transformations impact tests of measurement invariance. Thanks.
Hi, If I were to test measurement invariance over time in a group of 6 ordinal items (each with 3 response options) loading on 2 factors as described below, would the model statements for the necessary CFA models be as follows?
Items measured at time 0, loading on factor 1: u10 u20 u30
Items measured at time 0, loading on factor 2: u40 u50 u60
Items measured at time 1, loading on factor 1: u11 u21 u31
Items measured at time 1, loading on factor 2: u41 u51 u61
No parameter invariance
Model:
f10 BY u10-u30;
f20 BY u40-u60;
f11 BY u11-u31;
f21 BY u41-u61;

Invariant factor loadings
Model:
f10 BY u10 u20-u30 (1-2);
f20 BY u40 u50-u60 (3-4);
f11 BY u11 u21-u31 (1-2);
f21 BY u41 u51-u61 (3-4);

Invariant factor loadings and thresholds
Model:
f10 BY u10 u20-u30 (1-2);
f20 BY u40 u50-u60 (3-4);
f11 BY u11 u21-u31 (1-2);
f21 BY u41 u51-u61 (3-4);
Then it would be done in two rather than three steps. The second step is not identified with WLSMV and binary indicators. For the Delta parametrization you need to include scale factors. For the Theta parametrization, you need to include residual variances. These models are described in the Version 7.1 Mplus Language Addendum on the website with the user's guide. They are described for multiple group analysis but the same models apply across time. Having good model fit is important along with having measurement invariance.
Hi, I am using the WLSMV estimator with the Delta parameterization to fit the CFA models to test for measurement invariance over time in a group of ordered categorical variables loading on 2 factors. In reviewing a series of worked examples on testing for measurement invariance, I have seen conflicting information regarding how to fix factor means and scale factors with the WLSMV estimator:
In the model with no parameter invariance, should the factor means be fixed at 0? In the model with factor loading and threshold invariance, should the means of the factors at the first time point be fixed at 0 while those at the other time points are estimated freely?
In the model with no parameter invariance, should the scale factors be fixed at 1 at all time points? In the model with factor loading and threshold invariance, should the scale factors at the first time point be fixed at 1 while those at the other time points are estimated freely?
Finally, in running some of these models, I got the following error message: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE.
After checking the TECH4 output, I found that there is a correlation of 1.004 between Factor 1 measured at time 0 and time 1. Do you have any recommendations as to how I can remedy this problem?
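Why a correlation of 1.004 triggers the PSI warning can be seen from the eigenvalues of a 2 x 2 correlation matrix, which are 1 + r and 1 - r. The sketch below only illustrates that arithmetic for two factors; it is not a remedy and it does not reproduce any Mplus computation. The function names are hypothetical.

```python
def correlation_matrix_eigenvalues(r):
    """Eigenvalues of the 2 x 2 matrix [[1, r], [r, 1]]."""
    return 1.0 + r, 1.0 - r

def is_positive_definite_2x2(r):
    """A correlation matrix is positive definite only if all of its
    eigenvalues are strictly positive."""
    return all(ev > 0 for ev in correlation_matrix_eigenvalues(r))

for r in (0.95, 1.004):
    print(r, correlation_matrix_eigenvalues(r), is_positive_definite_2x2(r))
```

Any estimated correlation at or above 1 in absolute value gives a zero or negative eigenvalue, so the estimated factor covariance matrix is not admissible.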
Hi, A couple of final questions about measurement invariance: I am working with a group of 7 items meant to measure pharmacy staff attitudes. The same pharmacy staff members were asked these items at 3 time points. I began creating a measure from these items by conducting EFA on the data collected at each time point separately to get an idea of the number of factors underlying the items.
After using EFA to decide how many factors to extract, is it necessary to run a CFA model at each time point separately to "confirm" model fit before moving on to testing for measurement invariance over time?
Finally, when I combined the data from all time points and ran CFA models to test for measurement invariance over time, the model estimation terminated normally, but I got the following warning message: WARNING: THE BIVARIATE TABLE OF HIVTRA12 AND ESAPSU_6 HAS AN EMPTY CELL. (HIVTRA12 and ESAPSU_6 are items from different time points.) According to the Mplus discussion board, this message indicates that I should not use both HIVTRA12 and ESAPSU_6 in the CFA models. However, I would like to include both of these items in the measure I am creating, and I am unsure as to how I can test for measurement invariance without including them both in the models. How should I handle this warning message?
I noticed that the DIFFTEST option is not available when using the ML estimator. To test for measurement invariance with the ML estimator, should I use the chi-square difference test for MLM and MLR described on the Mplus website?
Is it possible to get the RMSEA, CFI, TLI, and WRMR fit statistics from the ML estimator?
In a separate analysis I am working on, I compared configural, metric, and scalar models to test for measurement invariance using the WLSMV estimator, and I found that some of the fit statistics (TLI and RMSEA) improved very slightly as equality constraints were added. Can that happen? I would expect the fit statistics to get worse instead of better.
With ML, you simply take the difference between the chi-square values and the degrees of freedom.
You will get these fit statistics with ML and continuous outcomes. With categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation. These fit statistics are not available.
With WLSMV, no absolute comparisons of fit statistics should be done. Only chi-square should be compared and it should be compared only using DIFFTEST.
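As a purely arithmetic illustration of how an index such as RMSEA can improve when constraints are added, the sketch below uses a common textbook form of RMSEA with N - 1 in the denominator. That form is an assumption, software may use a slightly different one, and the chi-square values are made up. It says nothing about whether WLSMV chi-square values may be compared across models, which the reply above advises against.

```python
import math

def rmsea(chi2, df, n):
    """Textbook RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

n = 501

# Less constrained model: chi-square 150 on 100 df
rmsea_less_constrained = rmsea(150.0, 100, n)

# More constrained model: chi-square rises by 15 for 15 extra df
rmsea_more_constrained = rmsea(165.0, 115, n)

print(f"{rmsea_less_constrained:.4f} -> {rmsea_more_constrained:.4f}")
```

Because the index divides the excess chi-square by the degrees of freedom, a model that gains degrees of freedom at little cost in chi-square can show a lower RMSEA even though its chi-square is higher.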
I used the ML estimator to test for longitudinal measurement invariance in a group of binary items using the parameter specifications given in the Mplus Language Addendum. I noticed that the factor means are fixed at zero in the configural and metric models, whereas in the scalar model, factor means are fixed at zero in one group and free in the other groups. If this is the case, is the scalar model really nested within the configural and metric models?
When testing for measurement invariance with the ML estimator, you mentioned that I should perform the chi-square difference test by taking the difference between the chi-square values and the degrees of freedom. Should I use the Likelihood Ratio chi-square or the Pearson chi-square for this purpose?
When testing for measurement invariance, is it customary to compare the scalar model to the configural model or to the metric model? I have seen it done both ways, and I wanted to get your thoughts.
The two chi-square values you are looking at should not be used for difference testing. These are tests of the observed versus the model-estimated multi-way frequency tables for the categorical variables in the model. With ML and categorical variables, difference testing should be done using the loglikelihood values. See page 487 of the current user's guide.
One can do it either way. Each way answers a different question.
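The loglikelihood-based difference test mentioned above can be sketched as a standard likelihood ratio test: minus two times the difference in loglikelihoods, with degrees of freedom equal to the difference in the number of free parameters. The sketch assumes plain ML (with MLR a scaling correction would also be needed), and the loglikelihood values and parameter counts are made up.

```python
# Chi-square critical values at the 0.05 level for 1 to 5 df
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def loglik_difference_test(ll_restricted, k_restricted, ll_general, k_general):
    """Likelihood ratio test for two nested models estimated with ML.

    ll_*: loglikelihood values; k_*: number of free parameters.
    Returns the test statistic and its degrees of freedom.
    """
    stat = -2.0 * (ll_restricted - ll_general)
    df = k_general - k_restricted
    return stat, df

# Hypothetical scalar (restricted) versus metric (general) model
stat, df = loglik_difference_test(-2510.4, 20, -2505.1, 24)
worse_fit = stat > CRITICAL_05[df]
print(f"LR statistic = {stat:.2f} on {df} df, significant at 0.05: {worse_fit}")
```

A significant statistic indicates that the added equality constraints worsen fit. The same function applies whether the scalar model is compared with the metric model or with the configural model, since only the pair of loglikelihoods and parameter counts changes.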
Margarita posted on Thursday, September 08, 2016 - 8:26 am
Dear Dr. Muthen,
After getting a good fit for a 3x3 cross-lag model (1 observed, 2 latent variables) I wanted to check for measurement and structural invariance across gender. (Note: the two latents represent two domains from a bigger scale)
So, first I conducted 2-factor CFAs for the 3 time points, and I got good fit in all three time points.
When I combined the 2 latent factors from all three points (2x3) in a CFA I also had a very good fit. Then, however, when checking for measurement invariance within gender I got this warning:
GROUP 2: WARNING: THE SAMPLE CORRELATION OF ITEM22_T2 AND ITEM24_T3 IS -0.986 DUE TO ONE OR MORE ZERO CELLS IN THEIR BIVARIATE TABLE. INFORMATION FROM THESE VARIABLES CAN BE USED TO CREATE ONE NEW VARIABLE (Note: they are from time points 2 and 3.)
1) Previous discussions indicate that such variables should not be in the same model. What about the fact that they are from two different time points?
2) I tried using ML with probit, like you suggested above, but got this: "THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE.."
3) Can the sample correlation warning be ignored? Given that the items are from different time points, this does not imply that there is something structurally wrong with the 2 domains, right? In that case, could I proceed with the analyses?
It sounds like you have ordinal outcomes and use WLSMV. If it is just one correlation that is of questionable quality, perhaps you can ignore this problem.
Margarita posted on Friday, September 09, 2016 - 5:09 am
Thank you for your prompt reply.
It is actually 2 relationships, and MI suggest freeing them. However, when I do free them, I get WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ITEM22_T2
Would fixing the residual correlation to a value below its current one be a good idea? When doing so, the fit for that group seems to improve, but the warning remains.
I am just trying to understand the different warnings a bit better. Thank you very much.
Q1. No. Because the solution is inadmissible you should either modify the model or delete the item.
Kathy Xiao posted on Friday, December 23, 2016 - 6:18 am
Dear Dr. Muthen,
I have a structural model consisting of four latent variables (L1 L2 L3 L4). I tested it on my total sample and it resulted in good fit. Now I want to test whether there are group differences by race (white, black, others).
So I started by testing whether the latent variables themselves have measurement invariance across the 3 racial groups.
But shall I test the latent variables separately (i.e., doing step-by-step MI for L1, L2, L3, L4 separately)?
Or can I test them simultaneously (i.e., testing all of them in the same Mplus run)?
You can do either, but the ultimate test is to do it simultaneously which may also be a more powerful test.
Kathy Xiao posted on Friday, December 23, 2016 - 8:05 am
I tried it both ways, but the sample size differed because the missing values on the variables for each latent variable are different. Also, the factor loadings differ between these two approaches.
I want to test measurement invariance for a personality measure (i.e., conscientiousness) over time and across two groups (with/without alcohol-related problems).
I have 2 latent factors, time 1 and time 2, with 5 indicators each (ordinal items). The latent factors correlate over time, as do the residuals of each item (i.e., item 1 at time 1 with item 1 at time 2).
Typically, work in the personality literature tests invariance over time within groups and then adds equality constraints across groups (e.g., age groups). For example, they report fit indices for...
1. Configural over time
2. Metric over time
3. Scalar over time
4. Between age-groups metric
5. Between age-groups scalar
However, I wonder what they mean by "within groups". Do they test invariance over time separately for each group, for example using SUBPOPULATION? However, they do not report fit indices for the groups separately.
Then, I will use "MODEL GR1" and "MODEL GR2" to assign different labels to free all parameters (e.g., factor loadings, intercepts, residuals) within each group. To test weak, strong and strict invariance, I will then impose equality constraints (i.e., the same label) over time and across groups.
I ask to be sure of what I am doing, thanks again.
Kathy Xiao posted on Monday, April 09, 2018 - 2:27 pm
Thanks very much for your reply, Dr. Muthen! I have another question: I am doing MI testing for three racial groups. When I tested the model fit for the baseline models, I found good fit for one group but not for the other two groups, which need 2 correlated errors to meet the criteria. Is it okay if I proceed to the measurement invariance testing with different baseline models, e.g. no correlated errors for one group (df=54) and 2 correlated errors for the other two groups (df=52)? Thanks very much!
Kathy Xiao posted on Monday, April 09, 2018 - 6:01 pm
Thank you!! My additional question is: since the other two groups need correlated errors, does this suggest that a 2-factor model may be better than the current 1-factor model? Shall I switch to a 2-factor model for all three groups? In a previous EFA, only the first eigenvalue was larger than 1, but the chi-square for the 1-factor model is significant, and the chi-square of the 2-factor model is lower than that of the 1-factor model. I'm not sure if it's okay to keep the 1-factor model and adjust for the 2 correlated errors. May I know your suggestions? Thanks again!