Measurement invariance across groups ...
 Richard E. Zinbarg posted on Sunday, July 30, 2006 - 8:04 pm
If I have a design in which I administered the same measure to men and women at two points in time to test for sex differences in changes in the construct tapped by the measure, it seems important to demonstrate that the measure is invariant both across groups and over time before one can meaningfully compare group differences over time. Do you think it is appropriate to test for both forms simultaneously in the same model? Or, if, say, invariance across groups holds but invariance across time does not, might the misfit due to noninvariance across time be partially offset by the fact that the portion of the model having to do with the cross-group invariance fits well? In other words, would a stronger approach be to test for invariance across groups at each time point separately and then test for invariance over time within each group separately? Thanks for any insight you can share!
 Linda K. Muthen posted on Monday, July 31, 2006 - 7:49 am
I would do them separately because if you do them together it would be difficult to know where the problem is if measurement invariance does not hold.
 Richard E. Zinbarg posted on Monday, July 31, 2006 - 10:19 am
Thanks for the speedy reply, Linda! A follow-up question: if invariance does hold in the combined model, is it reasonable to conclude that both forms of invariance hold? Or might the overall model fit still be acceptable if one form didn't hold but the other did?

My second question, about factor analysis of categorical items in Mplus, is that I don't understand the concept of a single threshold for a measure with a hierarchical structure in which each item loads on a general factor and a group factor. As these factors are orthogonal, each item is not measuring a single ability but rather two abilities. Thus, I am having a hard time at a conceptual level with the notion of a single threshold for such an item. At a purely mathematical level I can understand that such a threshold corresponds to something like a particular vector length of the vectors corresponding to participants' locations in the plane defined by the two abilities measured by a given item. Conceptually, however, I am having a hard time understanding the meaning of such a threshold, as a subject could exceed it in many different ways (e.g., by being high on one ability but low on the other, or vice versa, or by having a moderate standing on both abilities). Any insight you could provide that might help one develop some intuition for the meaning of such thresholds would be greatly appreciated (or being pointed to a reference that might help along these lines). Thanks very much!
 Linda K. Muthen posted on Monday, July 31, 2006 - 6:58 pm
I don't think it is necessarily true that if invariance holds in the combined model, it holds for both groups and time. I would test both.

Regarding IRT, there is a new section on IRT in Mplus. You can find the link on the homepage.

You need measurement invariance across time for it to make sense to study the development of the construct across time. The structural parameters, the means, variances and covariances of the constructs, may vary across time but the measurement parameters should not. It may be that you can have partial invariance.

Regarding a single threshold when there is a general and a group factor influencing the same factor indicator, you may think of this as a threshold on a specific ability variable needed to solve the item correctly. The specific ability variable is the sum of the general and group factors. A person may exceed the threshold of this specific ability variable by different combinations of general and group factor values added together.
 Richard E. Zinbarg posted on Tuesday, August 01, 2006 - 8:03 am
Thanks very much, Linda - your responses are very helpful! Re: the issue of the single threshold, at a conceptual level what you say makes perfect sense if I am thinking of the structure in higher-order terms, with each item having a loading on one and only one factor - its first-order factor (which then loads on a second-order factor and so on until the highest level in the structure is reached). If I am thinking of the structure in terms of a hierarchical model such as the bi-factor model, it seems to me that there are two ability variables needed to solve the item correctly - the general factor plus the group factor, which is orthogonal to the general factor. Given the orthogonality, it just seems conceptually messy to me to talk of a specific ability variable which is the sum of the general and group factors - it seems more accurate in this case to talk of the abilities (plural) needed to solve the item correctly. Now I realize that mathematically many hierarchical models and higher-order models are just linear transformations of each other (using the Schmid-Leiman transformation and its inverse), but at a conceptual level, if one thinks the hierarchical model provides the representation closer to reality in a given domain, it seems strange to me to talk of a single threshold on the several independent abilities needed to solve the item.
 Bengt O. Muthen posted on Tuesday, August 01, 2006 - 12:19 pm
If the model is

(1) y = g + s

where y is either the logit or probit for the binary item, then it says that y needs to be large enough to solve the item, implying that a relatively low g (or s) value for a person can be compensated by his higher s (or g) value to give the same y. This says that one threshold for y is sufficient. If on the other hand solving the item requires that g exceeds a threshold and that s exceeds another threshold, then a different model than that of (1) is called for. I think the former model is that of "bi-factor" modeling that has been written about recently in Psychometrika by Gibbons and others. I don't have a reference for the latter model.
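A minimal numeric sketch of the compensatory reading of model (1), assuming a probit link, unit loadings, and a hypothetical threshold of 0.5 (the threshold value, the person profiles, and the function name are illustrative assumptions, not taken from the thread):

```python
from statistics import NormalDist


def prob_correct(g: float, s: float, threshold: float) -> float:
    """Probability of solving the item under a compensatory probit model.

    Assumes y = g + s as in model (1) and P(u = 1) = Phi(y - threshold).
    """
    y = g + s  # only the sum matters, not how it is split between g and s
    return NormalDist().cdf(y - threshold)


tau = 0.5  # hypothetical threshold, chosen only for illustration

# Three hypothetical persons with the same g + s but different profiles.
profiles = {
    "high g, low s": (1.5, -0.5),
    "low g, high s": (-0.5, 1.5),
    "moderate on both": (0.5, 0.5),
}

probs = {name: prob_correct(g, s, tau) for name, (g, s) in profiles.items()}
for name, p in probs.items():
    print(f"{name}: P(correct) = {p:.4f}")
```

All three profiles give the same probability because a low value on one factor is offset by a high value on the other. A model that required g and s to each exceed its own threshold would not have this property.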
 Richard E. Zinbarg posted on Tuesday, August 01, 2006 - 12:25 pm
That helps - thanks, Bengt! I agree that a compensatory model suggests one threshold is sufficient, and that seems a more accurate, albeit more unwieldy, description of the conceptual meaning of the threshold in the case of a bi-factor (or other hierarchical) model with this compensatory relationship.
 Jon Elhai posted on Tuesday, October 27, 2009 - 6:56 pm
I'm using the MLM estimator, and I conducted a difference test between two nested models (as per the "chi-square difference test" formula section of your website)...

So now I have a corrected chi-square difference value and a difference in degrees of freedom to look up in a chi-square table.

Does it make sense to take that chi-square value difference, degrees of freedom difference, and sample size to manually calculate RMSEA? Would such a resulting RMSEA represent a difference in fit between the two models?
 Linda K. Muthen posted on Wednesday, October 28, 2009 - 6:24 am
I have not seen RMSEA used for difference testing.
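For reference, the corrected difference test mentioned in Jon Elhai's post can be sketched as follows. This assumes the scaled difference formula given on the Mplus website for MLM/MLR, where model 0 is the nested (more restrictive) model. The input numbers are made up for illustration only.

```python
def scaled_chisq_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference for two nested models.

    t0, t1: scaled chi-square values as printed by the estimator (e.g. MLM)
    c0, c1: scaling correction factors
    d0, d1: degrees of freedom
    Model 0 is the nested model, so d0 > d1.
    """
    if d0 <= d1:
        raise ValueError("model 0 must be the nested model (d0 > d1)")
    # Scaling correction for the difference test.
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    # Scaled difference: un-scale each chi-square, subtract, then re-scale.
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical output values, not taken from any real analysis.
trd, df_diff = scaled_chisq_difference(
    t0=120.0, c0=1.20, d0=50,
    t1=100.0, c1=1.25, d1=45,
)
print(f"scaled chi-square difference = {trd:.3f} on {df_diff} df")
```

The resulting value is referred to a chi-square distribution with the returned degrees of freedom.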
 Pranav Gandhi posted on Friday, June 25, 2010 - 10:25 am
I am investigating measurement invariance in a study with two time points (i.e., baseline and 1 year). SF-36 is used as the measurement model. Is it possible to observe both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) in the same scale (at 1 year)?
 Linda K. Muthen posted on Friday, June 25, 2010 - 11:25 am
I'm not sure that I understand your question. You can test for intercept, factor loading, and residual variance invariance across the two time points.
 Pranav Gandhi posted on Saturday, June 26, 2010 - 12:29 pm
I ran the analysis and found both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) for the same scale at one year. I wanted to confirm: is it possible for both strong non-invariance (indicator intercepts) and strict non-invariance (error variances) to be identified in the same scale (at 1 year)?

Thank you!
 Linda K. Muthen posted on Saturday, June 26, 2010 - 2:13 pm
For continuous items, which I assume you have given that you refer to intercepts, yes.
 Nicolas Müller posted on Monday, March 14, 2011 - 6:39 am
Dear Dr. Muthen,

I'm trying to test a factor for measurement invariance across groups using multigroup analysis.
Factor loadings are equivalent, but when I try to constrain the intercepts, the fit is significantly worse.
The test for intercept invariance is a way to assess if the value of the indicator, when the latent construct is 0, is the same in each group.
Wouldn't it be possible to recenter the intercepts in each group according to the value of the intercept in the first group by subtracting the difference in each observation? Would it make any sense?
 Bengt O. Muthen posted on Monday, March 14, 2011 - 9:49 am
You need intercept invariance - at least partial invariance (for some items) - in order to study factor mean differences across groups. With invariant loadings and intercepts, the indicator means change over the groups as a function of the factor means only. You don't want to transform the data. Instead you should find the indicators that are not invariant - e.g. by looking at the MIs.
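A small numeric sketch of this point, assuming the usual linear measurement model in which an indicator mean equals intercept + loading * factor mean. All parameter values are hypothetical.

```python
def indicator_means(intercepts, loadings, factor_mean):
    """Model-implied indicator means: intercept + loading * factor mean."""
    return [nu + lam * factor_mean for nu, lam in zip(intercepts, loadings)]


def implied_factor_mean_diffs(means_a, means_b, loadings):
    """Factor mean difference each indicator would suggest on its own."""
    return [(mb - ma) / lam for ma, mb, lam in zip(means_a, means_b, loadings)]


loadings = [1.0, 0.8, 1.2]    # invariant across groups (hypothetical)
intercepts = [2.0, 1.5, 3.0]  # group 1 intercepts (hypothetical)

group1 = indicator_means(intercepts, loadings, factor_mean=0.0)

# Case 1: intercepts invariant, so indicator means differ only through
# the factor mean (0.5 in group 2).
group2_invariant = indicator_means(intercepts, loadings, factor_mean=0.5)
invariant_diffs = implied_factor_mean_diffs(group1, group2_invariant, loadings)

# Case 2: the third intercept is shifted in group 2 (non-invariant item).
shifted_intercepts = [2.0, 1.5, 3.3]
group2_biased = indicator_means(shifted_intercepts, loadings, factor_mean=0.5)
biased_diffs = implied_factor_mean_diffs(group1, group2_biased, loadings)

print("invariant intercepts:", [round(d, 3) for d in invariant_diffs])
print("one shifted intercept:", [round(d, 3) for d in biased_diffs])
```

With invariant intercepts every indicator points to the same factor mean difference. With one shifted intercept the third indicator disagrees with the others, which is the kind of item the modification indices are meant to flag.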
 finnigan posted on Wednesday, April 04, 2012 - 9:10 am
Linda/Bengt

I am looking at options to manage nonnormality. I am not in favour of transformations, e.g. log transformations, but need to rule them out. I would like to ask whether transformations affect tests of measurement invariance.
Thanks.
 Linda K. Muthen posted on Wednesday, April 04, 2012 - 1:09 pm
Transformations that change the relationships among the variables can affect measurement invariance.
 Linda K. Muthen posted on Wednesday, April 04, 2012 - 1:31 pm
If your non-normality does not consist of floor or ceiling effects, the MLR estimator may help.
 Jean-Samuel Cloutier posted on Monday, September 10, 2012 - 7:33 pm
Hello,
I have a second order model with 9 groups.
F1 BY x1 x2 x3;
F2 BY x4 x5 x6;
F3 BY x6 x7 x8;
F4 BY F1 F2 F3;

I would like to test for configural invariance (fully non-constrained model)

I don't know how to input this in Mplus for all my groups.

Could you please show me or provide me with an input example for configural invariance with 3 or more groups.

Would this be the way to do it with four groups (G1 G2 G3 G4) (I am really not sure):

MODEL:
F1 BY x1 x2 x3;
F2 BY x4 x5 x6;
F3 BY x6 x7 x8;
F4 BY F1 F2 F3;
[F1@0 F2@0 F3@0 F4@0];

MODEL G2:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;

[x1-x3];
[F1-F3];

MODEL G3:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;

[x1-x3];
[F1-F3];

MODEL G4:
F1 BY x2 x3;
F2 BY x5 x6;
F3 BY x7 x8;
F4 BY F2 F3;

[x1-x3];
[F1-F3];
 Bengt O. Muthen posted on Tuesday, September 11, 2012 - 5:38 am
You should free the intercepts of all x's in all groups and not free the factor means in any group. Otherwise it looks ok.
 Jennifer DeCuir posted on Monday, September 16, 2013 - 9:19 am
Hi,
If I were to test measurement invariance over time in a group of 6 ordinal items (each with 3 response options) loading on 2 factors as described below, would the model statements for the necessary CFA models be as follows?

Items measured at time 0, loading on factor 1: u10 u20 u30
Items measured at time 0, loading on factor 2: u40 u50 u60
Items measured at time 1, loading on factor 1: u11 u21 u31
Items measured at time 1, loading on factor 2: u41 u51 u61

No parameter invariance
Model:
f10 BY u10-u30;
f20 BY u40-u60;
f11 BY u11-u31;
f21 BY u41-u61;

Model:
f10 BY u10
u20-u30 (1-2);
f20 BY u40
u50-u60 (3-4);
f11 BY u11
u21-u31 (1-2);
f21 BY u41
u51-u61 (3-4);

Model:
f10 BY u10
u20-u30 (1-2);
f20 BY u40
u50-u60 (3-4);
f11 BY u11
u21-u31 (1-2);
f21 BY u41
u51-u61 (3-4);

[u10$1 u11$1] (6);
[u20$1 u21$1] (7);
[u30$1 u31$1] (8);
[u40$1 u41$1] (9);
[u50$1 u51$1] (10);
[u60$1 u61$1] (11);

[f10@0 f20 f11 f21];
 Linda K. Muthen posted on Monday, September 16, 2013 - 3:08 pm
This is correct if you are using maximum likelihood estimation.
 Jennifer DeCuir posted on Monday, September 16, 2013 - 3:46 pm
Ok, thanks so much for your help!

What would I need to change if I use the WLSMV estimator?

Also, do the fit statistics for these models need to be within the accepted cutoffs for good model fit?
 Linda K. Muthen posted on Monday, September 16, 2013 - 4:07 pm
Then it would be done in two rather than three steps. The second step is not identified with WLSMV and binary indicators. For the Delta parametrization you need to include scale factors. For the Theta parametrization, you need to include residual variances. These models are described in the Version 7.1 Mplus Language Addendum on the website with the user's guide. They are described for multiple group analysis but the same models apply across time. Having good model fit is important along with having measurement invariance.
 Jennifer DeCuir posted on Wednesday, September 18, 2013 - 7:14 am
Hi,
I am using the WLSMV estimator with the Delta parameterization to fit the CFA models to test for measurement invariance over time in a group of ordered categorical variables loading on 2 factors. In reviewing a series of worked examples on testing for measurement invariance, I have seen conflicting information regarding how to fix factor means and scale factors with the WLSMV estimator:

In the model with no parameter invariance, should the factor means be fixed at 0?
In the model with factor loading and threshold invariance, should the means of the factors at the first time point be fixed at 0 while those at the other time points are estimated freely?

In the model with no parameter invariance, should the scale factors be fixed at 1 at all time points?
In the model with factor loading and threshold invariance, should the scale factors at the first time point be fixed at 1 while those at the other time points are estimated freely?

Finally, in running some of these models, I got the following error message:
WARNING: THE LATENT VARIABLE COVARIANCE MATRIX (PSI) IS NOT POSITIVE DEFINITE.

After checking the TECH4 output, I found that there is a correlation of 1.004 between Factor 1 measured at time 0 and time 1. Do you have any recommendations as to how I can remedy this problem?
 Linda K. Muthen posted on Wednesday, September 18, 2013 - 8:02 am
Please see page 9 of the Version 7.1 Mplus Language Addendum. This describes the models to use for WLSMV, Delta, and ordered categorical variables.

A correlation greater than one means the model is inadmissible. You must change the model. The two factors are not statistically distinguishable so cannot be used in the same analysis.
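A quick way to see why a correlation of 1.004 triggers the PSI warning is to check positive definiteness of the 2 x 2 factor correlation matrix directly. The sketch below uses Sylvester's criterion (all leading principal minors positive). The function name is illustrative, and 0.95 is a made-up comparison value.

```python
def is_positive_definite_2x2(var1: float, var2: float, cov: float) -> bool:
    """Check a symmetric 2 x 2 matrix [[var1, cov], [cov, var2]].

    By Sylvester's criterion the matrix is positive definite exactly when
    the first diagonal element and the determinant are both positive.
    """
    determinant = var1 * var2 - cov ** 2
    return var1 > 0 and determinant > 0


# Correlation reported in the TECH4 output quoted above.
reported = is_positive_definite_2x2(1.0, 1.0, 1.004)

# A hypothetical high but admissible correlation for comparison.
admissible = is_positive_definite_2x2(1.0, 1.0, 0.95)

print("r = 1.004 positive definite:", reported)
print("r = 0.950 positive definite:", admissible)
```

For a correlation matrix the determinant is 1 - r^2, so any |r| of 1 or more makes it non-positive and the solution inadmissible.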
 Jennifer DeCuir posted on Thursday, September 26, 2013 - 6:50 am
Hi,
A couple of final questions about measurement invariance:
I am working with a group of 7 items meant to measure pharmacy staff attitudes. The same pharmacy staff members were asked these items at 3 time points. I began creating a measure from these items by conducting EFA on the data collected at each time point separately to get an idea of the number of factors underlying the items.

After using EFA to decide how many factors to extract, is it necessary to run a CFA model at each time point separately to "confirm" model fit before moving on to testing for measurement invariance over time?

Finally, when I combined the data from all time points and ran CFA models to test for measurement invariance over time, the model estimation terminated normally, but I got the following warning message:
WARNING: THE BIVARIATE TABLE OF HIVTRA12 AND ESAPSU_6 HAS AN EMPTY CELL.
(HIVTRA12 and ESAPSU_6 are items from different time points.) According to the Mplus discussion board, this message indicates that I should not use both HIVTRA12 and ESAPSU_6 in the CFA models. However, I would like to include both of these items in the measure I am creating, and I am unsure as to how I can test for measurement invariance without including them both in the models. How should I handle this warning message?
 Linda K. Muthen posted on Thursday, September 26, 2013 - 12:21 pm
You should check the CFA at each time point to be sure the model fits well at each time point. Going from an EFA to a CFA may have a different impact at each time point.

Try ML. ML does not use correlations as sample statistics for model estimation. The default for ML is logistic regression. If you want probit as in WLSMV, ask for LINK=PROBIT in the ANALYSIS command.
 Jennifer DeCuir posted on Thursday, October 03, 2013 - 9:33 am
I noticed that the DIFFTEST option is not available when using the ML estimator. To test for measurement invariance with the ML estimator, should I use the chi-square difference test for MLM and MLR described on the Mplus website?

Is it possible to get the RMSEA, CFI, TLI, and WRMR fit statistics from the ML estimator?

In a separate analysis I am working on, I compared configural, metric, and scalar models to test for measurement invariance using the WLSMV estimator, and I found that some of the fit statistics (TLI and RMSEA) improved very slightly as equality constraints were added. Can that happen? I would expect the fit statistics to get worse instead of better.
 Linda K. Muthen posted on Thursday, October 03, 2013 - 10:24 am
With ML, you simply take the difference between the chi-square values and the degrees of freedom.

You will get these fit statistics with ML and continuous outcomes. With categorical outcomes, means, variances, and covariances are not sufficient statistics for model estimation. These fit statistics are not available.

With WLSMV, no absolute comparisons of fit statistics should be done. Only chi-square should be compared and it should be compared only using DIFFTEST.
 Jennifer DeCuir posted on Wednesday, November 06, 2013 - 8:27 am
I used the ML estimator to test for longitudinal measurement invariance in a group of binary items using the parameter specifications given in the Mplus Language Addendum. I noticed that the factor means are fixed at zero in the configural and metric models, whereas in the scalar model, factor means are fixed at zero in one group and free in the other groups. If this is the case, is the scalar model really nested within the configural and metric models?

When testing for measurement invariance with the ML estimator, you mentioned that I should perform the chi-square difference test by taking the difference between the chi-square values and the degrees of freedom. Should I use the Likelihood Ratio chi-square or the Pearson chi-square for this purpose?

When testing for measurement invariance, is it customary to compare the scalar model to the configural model or to the metric model? I have seen it done both ways, and I wanted to get your thoughts.

Thanks!
 Linda K. Muthen posted on Wednesday, November 06, 2013 - 10:06 am
Yes, these models are nested.

The two chi-square values you are looking at should not be used for difference testing. These are tests of the observed versus the model-estimated multi-way frequency tables for the categorical variables in the model. With ML and categorical variables, difference testing should be done using the loglikelihood values. See Page 487 of the current user's guide.

One can do it either way. Each way answers a different question.
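The loglikelihood difference test mentioned above can be sketched as follows, assuming plain ML (no scaling correction, which MLR would require). The loglikelihoods and parameter counts are hypothetical, and the chi-square tail probability is computed with a simple series using only the standard library, so its accuracy is adequate for illustration only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def loglik_difference_test(ll0, p0, ll1, p1):
    """Likelihood ratio test; model 0 is nested (fewer free parameters)."""
    lr = -2.0 * (ll0 - ll1)  # always >= 0 when model 0 is truly nested
    df = p1 - p0             # difference in number of free parameters
    return lr, df, chi2_sf(lr, df)


# Hypothetical values: a scalar model (0) against a metric model (1).
lr, df, p_value = loglik_difference_test(ll0=-2510.4, p0=20, ll1=-2503.1, p1=24)
print(f"LR = {lr:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value indicates that the added equality constraints significantly worsen fit.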
 Margarita posted on Thursday, September 08, 2016 - 8:26 am
Dear Dr. Muthen,

After getting a good fit for a 3x3 cross-lag model (1 observed, 2 latent variables) I wanted to check for measurement and structural invariance across gender. (Note: the two latents represent two domains from a bigger scale)

So, first I conducted 2-factor CFAs for the 3 time points, and I got good fit in all three time points.

When I combined the 2 latent factors from all three points (2x3) in a CFA I also had a very good fit. Then, however, when checking for measurement invariance within gender I got this warning:

GROUP 2: WARNING: THE SAMPLE CORRELATION OF ITEM22_T2 AND ITEM24_T3 IS -0.986 DUE TO ONE OR MORE ZERO CELLS IN THEIR BIVARIATE TABLE. INFORMATION FROM THESE VARIABLES CAN BE USED TO CREATE ONE NEW VARIABLE (note. they are from time point 2 and 3)

1) Previous discussions indicate that such variables should not be in the same model. What about the fact that they are from two different time points?

2) I tried using ML with probit, like you suggested above, but got this: "THERE IS NOT ENOUGH MEMORY SPACE TO RUN Mplus ON THE CURRENT INPUT FILE.."

3) Can the sample correlation warning be ignored? Given that the items are from different time points, this does not imply that there is something structurally wrong with the 2 domains, right? In that case, could I proceed with the analyses?

Any input would be greatly appreciated.
 Bengt O. Muthen posted on Thursday, September 08, 2016 - 1:29 pm
It sounds like you have ordinal outcomes and use WLSMV. If it is just one correlation that is of questionable quality, perhaps you can ignore this problem.
 Margarita posted on Friday, September 09, 2016 - 5:09 am
Dr. Muthen,

It is actually 2 relationships, and MI suggest freeing them. However, when I do free them, I get WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE ITEM22_T2

Would fixing the residual correlation to a value below its current one be a good idea? When doing so, the fit for that group seems to improve, but the warning remains.

I am just trying to understand the different warnings a bit better. Thank you very much.
 Bengt O. Muthen posted on Friday, September 09, 2016 - 1:42 pm
Q1. No. Because the solution is inadmissible you should either modify the model or delete the item.
 Kathy Xiao posted on Friday, December 23, 2016 - 6:18 am
Dear Dr. Muthen,

I have a structural model consisting of four latent variables (L1 L2 L3 L4). I tested it for my total sample and it fit well. Now I want to test whether there are group differences by race (white, black, others).

So I started with testing whether the latent variables themselves have measurement invariance by the 3 racial groups.

But should I test the latent variables separately (i.e., doing step-by-step MI for L1, L2, L3, L4 separately)?

Or can I test them simultaneously (i.e., testing all of them in the same Mplus runs)?

Thanks!
 Bengt O. Muthen posted on Friday, December 23, 2016 - 7:28 am
You can do either, but the ultimate test is to do it simultaneously which may also be a more powerful test.
 Kathy Xiao posted on Friday, December 23, 2016 - 8:05 am
I tried it both ways, but the sample size differed because the missing values on the indicators differ for each latent variable. Also, the factor loadings differ between the two approaches.

Would it affect the result?

Thanks!
 Bengt O. Muthen posted on Friday, December 23, 2016 - 10:33 am
That speaks for doing it simultaneously.
 Kathy Xiao posted on Saturday, December 24, 2016 - 4:57 am

So if L1 and L2 are correlated, and I also have C1 and C2 as covariates for L1 and L2:

Should I include the correlation and the covariates in the multiple-group measurement invariance model?

Thanks!
 Bengt O. Muthen posted on Saturday, December 24, 2016 - 6:42 am
Yes.
 Martina Luchetti posted on Tuesday, August 22, 2017 - 10:39 am
Dear Dr. Muthen,

I want to test measurement invariance for a personality measure (i.e. conscientiousness) over time and across two groups (with/without alcohol-related problems).

I have 2 latent factors, time 1 and time 2, with 5 indicators each (ordinal items). The latent factors correlate over time, as do the residuals of each item (i.e. item 1 at time 1 with item 1 at time 2).

Typically, studies in the personality literature test invariance over time within groups and then add equality constraints across groups (e.g., age groups).
For example, they report fit indices for...
1. Configural over time
2. Metric over time
3. Scalar over time
4. Between age-groups metric
5. Between age-groups scalar

However, I wonder what they mean by "within groups". Do they test invariance over time separately for each group, for example using SUBPOPULATION? They do not, however, report fit indices for the groups separately.

How would you suggest I proceed?

Thank you!

Martina
 Bengt O. Muthen posted on Tuesday, August 22, 2017 - 6:00 pm
Within groups probably refers to a single analysis of all groups and time points where you impose measurement invariance only across time points, not across groups. That can easily be done in Mplus.
 Martina Luchetti posted on Thursday, August 24, 2017 - 3:14 pm
Thank you!

So, to test configural invariance over time across groups, I need to specify the overall model and use the group-specific model command to free all parameters for each group, right?

For example:

GROUPING IS (1=G1 2=G2)

MODEL:
Time1 BY C1T1* C2T1 C3T1;
Time2 BY C1T2* C2T2 C3T2;

C1T1-C3T1 PWITH C1T2-C3T2;

Time2 ON Time1;

[Time1@0 Time2@0];
Time1@1; Time2@1;

!estimate intercepts
[C1T1]; [C2T1]; [C3T1]; [C1T2]; [C2T2]; [C3T2];
!estimate residuals
C1T1; C2T1; C3T1; C1T2; C2T2; C3T2;

Then, I will use "MODEL G1" and "MODEL G2" to assign different labels to free all parameters (e.g. factor loadings, intercepts, residuals) within each group.
To test weak, strong and strict invariance I will proceed to impose equality constraints (i.e. same label) over time and across groups.

I ask to be sure of what I am doing, thanks again.

Martina
 Bengt O. Muthen posted on Thursday, August 24, 2017 - 6:44 pm
Correct. Don't forget - one label per line.