

If I have a design in which I administered the same measure to men and women at two points in time to test for sex differences in changes in the construct tapped by the measure, it seems important to demonstrate that the measure is both invariant across groups and invariant over time before one can meaningfully compare group differences over time. Do you think it appropriate to test for both forms simultaneously in the same model? Or, if say invariance across groups holds but invariance across time does not, might the misfit due to variance across time be partially offset by the fact that the portion of the model having to do with the cross-groups invariance fits well? In other words, would a stronger approach be to test for invariance across groups at each time point separately, and then test for invariance over time within each group separately? Thanks for any insight you can share! 


I would do them separately because if you do them together it would be difficult to know where the problem is if measurement invariance does not hold. 


Thanks for the speedy reply, Linda! A follow-up question: if invariance does hold in the combined model, is it reasonable to conclude that both forms of invariance hold? Or might the overall model fit still be acceptable if one form didn't hold but the other did?

Also, from reading other postings on the discussion board, I now understand that the Mplus approach to factor analysis with categorical items is equivalent to a two-parameter IRT model. That is very exciting, I think, as it opens up lots of analytic possibilities that I did not think were available (i.e., multidimensional IRT), at least not with any software that I had. I do have a couple of questions about this, though. It seems to me that one of the major advantages of IRT is that it enables equating of scores across different tests by taking threshold and slope information from both measures into account to locate participants on the underlying latent trait. It stands to reason that if we can equate scores from subjects from the same population who have taken two alternative forms of some test, the same methods should allow us to equate scores from subjects from different populations administered the same test for whom the thresholds and/or factor loadings differ, as long as we are confident that the test is still measuring the same construct in the two populations, no? If so, then I would think the same would be true for equating scores from repeated administrations of the same test to the same sample, even if the thresholds and/or factor loadings differ over time. If these intuitions are correct, perhaps then it is not critical that the measure demonstrate invariance, as long as the data are analyzed appropriately (i.e., by incorporating both thresholds and factor loadings in our measurement model)? 
My second question about factor analysis of categorical items in Mplus is that I don't understand the concept of a single threshold for a measure with a hierarchical structure in which each item loads on a general factor and a group factor. As these factors are orthogonal, each item is not measuring a single ability but rather two abilities. Thus, I am having a hard time at a conceptual level with the notion of a single threshold for such an item. At a purely mathematical level, I can understand that such a threshold corresponds to something like a particular vector length for the vectors corresponding to participants' locations in the plane defined by the two abilities measured by a given item. Conceptually, however, I am having a hard time understanding the meaning of such a threshold, as a subject could exceed it in many different ways (e.g., by being high on one ability but low on the other, or vice versa, or by having a moderate standing on both abilities). Any insight you could provide that might help one develop some intuition for the meaning of such thresholds would be greatly appreciated (or being pointed to a reference that might help along these lines). Thanks very much! 


I don't think that it is necessarily true that if invariance holds in the combined model it holds for groups and time. I would test both.

Regarding IRT, there is a new section on IRT in Mplus. You can find the link on the homepage. You need measurement invariance across time for it to make sense to study the development of the construct across time. The structural parameters, that is, the means, variances, and covariances of the constructs, may vary across time, but the measurement parameters should not. It may be that you can have partial invariance.

Regarding a single threshold when there is a general and a group factor influencing the same factor indicator, you may think of this as a threshold on a specific ability variable needed to solve the item correctly. The specific ability variable is the sum of the general and group factors. A person may exceed the threshold of this specific ability variable by different combinations of general and group factor values added together. 
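As an aside on the factor-analysis/IRT equivalence mentioned above: for a binary item under a probit link with a standardized latent response variable and a factor with mean 0 and variance 1, the loading/threshold parameters and the normal-ogive two-parameter discrimination/difficulty parameters are related as sketched below. This is a sketch of the standard conversion, not Mplus output; all numeric values are hypothetical.

```python
from math import sqrt, erf

def normal_cdf(x):
    """Standard normal CDF (probit link)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def irt_from_factor(lam, tau):
    """Convert a standardized factor loading (lam) and threshold (tau)
    to normal-ogive 2-parameter IRT discrimination (a) and difficulty (b)."""
    a = lam / sqrt(1 - lam ** 2)
    b = tau / lam
    return a, b

# The item response probability is identical in both parameterizations:
lam, tau, theta = 0.7, 0.5, 1.0  # hypothetical loading, threshold, ability
a, b = irt_from_factor(lam, tau)
p_irt = normal_cdf(a * (theta - b))                       # IRT form
p_fa = normal_cdf((lam * theta - tau) / sqrt(1 - lam ** 2))  # factor form
assert abs(p_irt - p_fa) < 1e-12
```

The algebra is direct: a * (theta - b) = (lam * theta - tau) / sqrt(1 - lam^2), so the two forms are the same curve with relabeled parameters.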


Thanks very much, Linda, your responses are very helpful! Re: the issue of the single threshold, at a conceptual level what you say makes perfect sense if I am thinking of the structure in higher-order terms, with each item having a loading on one and only one factor, its first-order factor (which then loads on a second-order factor, and so on until the highest level in the structure is reached). If I am thinking of the structure in terms of a hierarchical model such as the bifactor model, it seems to me that there are two ability variables needed to solve the item correctly: the general factor plus the group factor, which is orthogonal to the general factor. Given the orthogonality, it just seems conceptually messy to me to talk of a specific ability variable which is the sum of the general and group factors; it seems more accurate in this case to talk of the abilities (plural) needed to solve the item correctly. Now, I realize that mathematically many hierarchical models and higher-order models are just linear transformations of each other (using the Schmid-Leiman transformation and its inverse), but at a conceptual level, if one thinks the hierarchical model provides the representation closer to reality in a given domain, it seems strange to me to talk of a single threshold on the several independent abilities needed to solve the item. 


If the model is (1) y = g + s where y is either the logit or probit for the binary item, then it says that y needs to be large enough to solve the item, implying that a relatively low g (or s) value for a person can be compensated by his higher s (or g) value to give the same y. This says that one threshold for y is sufficient. If on the other hand solving the item requires that g exceeds a threshold and that s exceeds another threshold, then a different model than that of (1) is called for. I think the former model is that of "bifactor" modeling that has been written about recently in Psychometrika by Gibbons and others. I don't have a reference for the latter model. 
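The compensatory property of model (1) can be made concrete with a small numeric sketch (the threshold value and ability values below are hypothetical):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF (probit link)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

tau = 1.0  # single threshold on y = g + s (hypothetical value)

# Compensatory model (1): only the sum g + s matters, so one threshold
# on y suffices. Different (g, s) mixes with the same sum give the
# same solution probability.
p_low_g = phi((0.2 + 0.8) - tau)   # low g compensated by high s
p_high_g = phi((0.8 + 0.2) - tau)  # high g, low s
assert p_low_g == p_high_g

# The noncompensatory alternative mentioned above would instead require
# g and s to each clear their own threshold, e.g.
# P = phi(g - tau_g) * phi(s - tau_s), under which the two (g, s)
# mixes above would generally yield different probabilities.
```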


That helps; thanks, Bengt! I agree a compensatory model suggests that one threshold is sufficient, and that seems a more accurate, albeit more unwieldy, description of the conceptual meaning of the threshold in the case of a bifactor (or other hierarchical) model with this compensatory relationship. 

Jon Elhai posted on Tuesday, October 27, 2009 - 6:56 pm



I'm using the MLM estimator, and I conducted a difference test between two nested models (as per the "chi-square difference test" formula section of your website)... So now I have a corrected chi-square difference value and difference in degrees of freedom to look up in a chi-square table. Does it make sense to take that chi-square difference value, degrees of freedom difference, and sample size to manually calculate RMSEA? Would such a resulting RMSEA represent the difference in fit between the two models? 
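For reference, the "corrected chi-square difference value" referred to above comes from the scaled difference formula given on the Mplus website for MLM/MLR chi-squares; it can be sketched as follows (all numbers in the example are hypothetical):

```python
def scaled_chi2_diff(T0, df0, c0, T1, df1, c1):
    """Scaled chi-square difference test for robust (MLM/MLR) chi-squares.

    T0, df0, c0: chi-square value, degrees of freedom, and scaling
    correction factor of the nested (more restrictive) model;
    T1, df1, c1: the same for the comparison (less restrictive) model.
    Returns the scaled difference and its degrees of freedom.
    """
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)  # difference-test scaling factor
    TRd = (T0 * c0 - T1 * c1) / cd            # refer to chi-square(df0 - df1)
    return TRd, df0 - df1

# With equal scaling factors of 1, it reduces to the ordinary difference:
TRd, df = scaled_chi2_diff(100.0, 50, 1.0, 80.0, 45, 1.0)
# TRd == 20.0, df == 5
```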


I have not seen RMSEA used for difference testing. 


I am investigating measurement invariance in a study with two time points (i.e., baseline and 1 year). The SF-36 is used as the measurement model. Is it possible to observe both strong noninvariance (indicator intercepts) and strict noninvariance (error variances) in the same scale (at 1 year)? Thank you for your time! 


I'm not sure that I understand your question. You can test for intercept, factor loading, and residual variance invariance across the two time points. 


I ran the analysis and found that both strong noninvariance (indicator intercepts) and strict noninvariance (error variances) were identified for the same scale at one year. Thus, I meant to confirm: is it possible that both strong noninvariance (indicator intercepts) and strict noninvariance (error variances) can be identified in the same scale (at 1 year)? Thank you! 


For continuous items, which I assume you have, given that you refer to intercepts, yes. 


Dear Dr. Muthen, I'm trying to test a factor for measurement invariance across groups using multigroup analysis. Factor loadings are equivalent, but when I try to constrain the intercepts, the fit is significantly worse. The test for intercept invariance is a way to assess whether the value of the indicator, when the latent construct is 0, is the same in each group. Wouldn't it be possible to recenter the intercepts in each group according to the value of the intercept in the first group by subtracting the difference in each observation? Would it make any sense? 


You need intercept invariance, at least partial invariance (for some items), in order to study factor mean differences across groups. With invariant loadings and intercepts, the indicator means change over the groups as a function of the factor means only. You don't want to transform the data. Instead, you should find the indicators that are not invariant, e.g., by looking at the modification indices (MIs). 
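To see why invariant loadings and intercepts leave the factor means to carry the group differences: under the linear measurement model E[y] = nu + lambda * alpha, the cross-group difference in an indicator mean is the loading times the factor mean difference. A minimal numeric sketch (all values hypothetical):

```python
# Hypothetical invariant measurement parameters (same in both groups)
nu, lam = 2.0, 0.8           # indicator intercept and factor loading
alpha1, alpha2 = 0.0, 0.5    # factor means in groups 1 and 2

mean_y1 = nu + lam * alpha1  # model-implied indicator mean, group 1
mean_y2 = nu + lam * alpha2  # model-implied indicator mean, group 2

# The indicator-mean difference equals loading * factor-mean difference,
# so with invariant nu and lam, mean differences reflect factor means only.
assert abs((mean_y2 - mean_y1) - lam * (alpha2 - alpha1)) < 1e-12
```

If nu or lam differed across groups, the indicator-mean difference would confound measurement differences with true factor-mean differences, which is why recentering the data by hand is not a substitute for modeling the noninvariant intercepts.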

finnigan posted on Wednesday, April 04, 2012 - 9:10 am



Linda/Bengt, I am looking at options to manage nonnormality. I am not in favour of transformations, e.g., log transformations, but need to rule them out. I would like to ask if transformations impact tests of measurement invariance. Thanks. 


Transformations that change the relationships among the variables can affect measurement invariance. 


If your nonnormality does not consist of floor or ceiling effects, the MLR estimator may help. 


Hello, I have a second-order model with 9 groups:

F1 BY x1 x2 x3;
F2 BY x4 x5 x6;
F3 BY x6 x7 x8;
F4 BY F1 F2 F3;

I would like to test for configural invariance (fully nonconstrained model). I don't know how to input this in Mplus for all my groups. Could you please show me or provide me with an input example for configural invariance with 3 or more groups? Would this be the way to do it with four groups (G1 G2 G3 G4)? (I am really not sure.)

MODEL:
F1 BY x1 x2 x3;
F2 BY x4 x5 x6;
F3 BY x6 x7 x8;
F4 BY F1 F2 F3;
[F1@0 F2@0 F3@0 F4@0];
MODEL G2:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;
[x1-x3];
[F1-F3];
MODEL G3:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;
[x1-x3];
[F1-F3];
MODEL G4:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;
[x1-x3];
[F1-F3]; 


You should free the intercepts of all x's in all groups and not free the factor means in any group. Otherwise it looks ok. 


Hi, if I were to test measurement invariance over time in a group of 6 ordinal items (each with 3 response options) loading on 2 factors as described below, would the model statements for the necessary CFA models be as follows?

Items measured at time 0, loading on factor 1: u10 u20 u30
Items measured at time 0, loading on factor 2: u40 u50 u60
Items measured at time 1, loading on factor 1: u11 u21 u31
Items measured at time 1, loading on factor 2: u41 u51 u61

No parameter invariance:
MODEL:
f10 BY u10-u30;
f20 BY u40-u60;
f11 BY u11-u31;
f21 BY u41-u61;

Invariant factor loadings:
MODEL:
f10 BY u10 u20-u30 (1-2);
f20 BY u40 u50-u60 (3-4);
f11 BY u11 u21-u31 (1-2);
f21 BY u41 u51-u61 (3-4);

Invariant factor loadings and thresholds:
MODEL:
f10 BY u10 u20-u30 (1-2);
f20 BY u40 u50-u60 (3-4);
f11 BY u11 u21-u31 (1-2);
f21 BY u41 u51-u61 (3-4);
[u10$1 u11$1] (6);
[u20$1 u21$1] (7);
[u30$1 u31$1] (8);
[u40$1 u41$1] (9);
[u50$1 u51$1] (10);
[u60$1 u61$1] (11);
[f10@0 f20 f11 f21]; 


This is correct if you are using maximum likelihood estimation. 


Ok, thanks so much for your help! What would I need to change if I use the WLSMV estimator? Also, do the fit statistics for these models need to be within the accepted cutoffs for good model fit? 


Then it would be done in two rather than three steps. The second step is not identified with WLSMV and binary indicators. For the Delta parametrization you need to include scale factors. For the Theta parametrization, you need to include residual variances. These models are described in the Version 7.1 Mplus Language Addendum on the website with the user's guide. They are described for multiple group analysis but the same models apply across time. Having good model fit is important along with having measurement invariance. 


Hi, I am using the WLSMV estimator with the Delta parameterization to fit the CFA models to test for measurement invariance over time in a group of ordered categorical variables loading on 2 factors. In reviewing a series of worked examples on testing for measurement invariance, I have seen conflicting information regarding how to fix factor means and scale factors with the WLSMV estimator:

In the model with no parameter invariance, should the factor means be fixed at 0? In the model with factor loading and threshold invariance, should the means of the factors at the first time point be fixed at 0 while those at the other time points are estimated freely?

In the model with no parameter invariance, should the scale factors be fixed at 1 at all time points? In the model with factor loading and threshold invariance, should the scale factors at the first time point be fixed at 1 while those at the other time points are estimated freely?

Finally, in running some of these models, I got the following error message: WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE. After checking the TECH4 output, I found that there is a correlation of 1.004 between Factor 1 measured at time 0 and time 1. Do you have any recommendations as to how I can remedy this problem? 


Please see page 9 of the Version 7.1 Mplus Language Addendum. This describes the models to use for WLSMV, Delta, and ordered categorical variables. A correlation greater than one means the model is inadmissible. You must change the model. The two factors are not statistically distinguishable so cannot be used in the same analysis. 


Hi, a couple of final questions about measurement invariance. I am working with a group of 7 items meant to measure pharmacy staff attitudes. The same pharmacy staff members were asked these items at 3 time points. I began creating a measure from these items by conducting EFA on the data collected at each time point separately to get an idea of the number of factors underlying the items. After using EFA to decide how many factors to extract, is it necessary to run a CFA model at each time point separately to "confirm" model fit before moving on to testing for measurement invariance over time?

Finally, when I combined the data from all time points and ran CFA models to test for measurement invariance over time, the model estimation terminated normally, but I got the following warning message: WARNING: THE BIVARIATE TABLE OF HIVTRA12 AND ESAPSU_6 HAS AN EMPTY CELL. (HIVTRA12 and ESAPSU_6 are items from different time points.) According to the Mplus discussion board, this message indicates that I should not use both HIVTRA12 and ESAPSU_6 in the CFA models. However, I would like to include both of these items in the measure I am creating, and I am unsure as to how I can test for measurement invariance without including them both in the models. How should I handle this warning message? 


You should check the CFA at each time point to be sure the model fits well at each time point. Going from an EFA to a CFA may have a different impact at each time point. Try ML. ML does not use correlations as sample statistics for model estimation. The default for ML is logistic regression. If you want probit as in WLSMV, ask for LINK=PROBIT in the ANALYSIS command. 


I noticed that the DIFFTEST option is not available when using the ML estimator. To test for measurement invariance with the ML estimator, should I use the chi-square difference test for MLM and MLR described on the Mplus website? Is it possible to get the RMSEA, CFI, TLI, and WRMR fit statistics from the ML estimator? In a separate analysis I am working on, I compared configural, metric, and scalar models to test for measurement invariance using the WLSMV estimator, and I found that some of the fit statistics (TLI and RMSEA) improved very slightly as equality constraints were added. Can that happen? I would expect the fit statistics to get worse instead of better. 


With ML, you simply take the difference between the chi-square values and the degrees of freedom. You will get these fit statistics with ML and continuous outcomes. With categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation, so these fit statistics are not available. With WLSMV, no absolute comparisons of fit statistics should be done. Only chi-square should be compared, and it should be compared only using DIFFTEST. 


I used the ML estimator to test for longitudinal measurement invariance in a group of binary items using the parameter specifications given in the Mplus Language Addendum. I noticed that the factor means are fixed at zero in the configural and metric models, whereas in the scalar model, factor means are fixed at zero in one group and free in the other groups. If this is the case, is the scalar model really nested within the configural and metric models? When testing for measurement invariance with the ML estimator, you mentioned that I should perform the chi-square difference test by taking the difference between the chi-square values and the degrees of freedom. Should I use the Likelihood Ratio chi-square or the Pearson chi-square for this purpose? When testing for measurement invariance, is it customary to compare the scalar model to the configural model or to the metric model? I have seen it done both ways, and I wanted to get your thoughts. Thanks! 


Yes, these models are nested. The two chi-square values you are looking at should not be used for difference testing. These are tests of the observed versus the model-estimated multiway frequency tables for the categorical variables in the model. With ML and categorical variables, difference testing should be done using the loglikelihood values. See page 487 of the current user's guide. One can do it either way. Each way answers a different question. 
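The loglikelihood-based difference test mentioned above (described on the Mplus website) can be sketched as follows; with scaling correction factors of 1 (plain ML) it reduces to the ordinary likelihood ratio test, -2 times the loglikelihood difference. All numbers in the example are hypothetical.

```python
def loglik_diff_test(L0, p0, c0, L1, p1, c1):
    """Chi-square difference test computed from loglikelihoods.

    L0, p0, c0: loglikelihood, number of free parameters, and scaling
    correction factor of the nested (more restrictive) model;
    L1, p1, c1: the same for the comparison (less restrictive) model.
    Returns the test statistic and its degrees of freedom.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)  # difference-test scaling factor
    TRd = -2.0 * (L0 - L1) / cd           # refer to chi-square(p1 - p0)
    return TRd, p1 - p0

# Plain ML (scaling factors of 1): -2 * (L0 - L1), df = p1 - p0
TRd, df = loglik_diff_test(-1050.0, 20, 1.0, -1040.0, 25, 1.0)
# TRd == 20.0, df == 5
```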
