Measurement equivalent/invariance
Mplus Discussion > Confirmatory Factor Analysis >
 Raheem Paxton posted on Saturday, September 30, 2006 - 7:33 pm
If the measurement model does not fit the data for one group (e.g., females) but fits for another (males), can you still proceed with imposing constraints on the model?
 Bengt O. Muthen posted on Sunday, October 01, 2006 - 11:24 am
No, you would have to find a well-fitting model for each group.
 Richard E. Zinbarg posted on Tuesday, May 08, 2007 - 8:11 am
Hi Bengt and/or Linda,
I have a couple of questions about invariance of the variance of a factor. Is my understanding correct that, with, say, two groups, the variance of a factor has to be constrained to some value in one group (either by setting it equal to 1 or by setting one of its loadings equal to 1), and at least one of the loadings on the factor needs to be constrained in the second group? Second, is my understanding correct that we still won't know what the variance of the factor equals in the second group in an absolute sense, but rather that the estimate of the variance in the second group will only be relative to the value set by constraint in the first group?
Thanks very much!
Rick Zinbarg
 Linda K. Muthen posted on Tuesday, May 08, 2007 - 8:39 am
The metric of a factor can be set by either fixing a factor loading to one or fixing the factor variance to one. With multiple group analysis, if a factor loading is fixed to one in each group, a factor variance can be estimated for each group. I think this way of setting the metric would be best for multiple group analysis.
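As a rough illustration of this way of setting the metric, a two-group input might contain the fragment below. It is only a sketch: the variable names (g, y1-y4), the group labels, and the factor name f are hypothetical, and the DATA command is omitted.

```mplus
VARIABLE:  NAMES = g y1-y4;                   ! hypothetical variable names
           GROUPING = g (1 = grp1 2 = grp2);  ! hypothetical group labels
MODEL:     f BY y1-y4;   ! loading of y1 is fixed at 1 by default, in both groups
           f;            ! factor variance is a free parameter in each group
```

Because the remaining loadings are held equal across groups by default in a multiple group run, the two estimated factor variances are expressed in the same metric.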
 Richard E. Zinbarg posted on Tuesday, May 08, 2007 - 11:20 am
Wow, that was fast. Thanks, Linda! I am a bit confused, though, as I thought fixing a factor loading to one fixes the factor variance to the variance of that indicator. I must be misunderstanding something, as I don't see how the variance can both be set equal to that of a particular indicator and be estimated. Any help clearing up my confusion would be appreciated.
 Linda K. Muthen posted on Tuesday, May 08, 2007 - 3:29 pm
Fixing a factor loading to one sets the metric of the factor to the scale of the y variable whose loading is fixed. This implies that as the factor changes by one unit, y is expected to change by the same amount. This does not mean that the variance of the factor is equal to the variance of y, nor that the variance of the factor is fixed at one.
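The distinction can be written out with the usual one-factor measurement equation. This is a sketch in generic notation that does not appear in the posts: \nu is the intercept, \lambda the loading, \eta the factor with variance \psi, and \varepsilon a residual with variance \theta that is uncorrelated with \eta.

```latex
\[
  y = \nu + \lambda\,\eta + \varepsilon,
  \qquad
  \mathrm{Var}(y) = \lambda^{2}\psi + \theta
\]
\[
  \lambda = 1 \;\Longrightarrow\; \mathrm{Var}(y) = \psi + \theta
\]
```

With the loading fixed at one, \psi is still a free parameter. It would equal the variance of y only if the residual variance \theta were zero.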
 Richard E. Zinbarg posted on Tuesday, May 08, 2007 - 10:47 pm
that helps, thanks very much! Just to make sure I understand, I am going to try to restate what you said in somewhat different terms.

A standardized loading equals an unstandardized loading times the ratio of the standard deviation of the factor to the standard deviation of y.

We can get a standardized loading directly from the item correlations, and the standard deviation of y (or at least our sample estimate of it) is observed. Thus, in the above equation, we are still left with two unknowns, and the equation can't be solved without setting a constraint on one of them. We can either constrain the SD of the factor (typically to 1) and estimate the unstandardized loading, OR we can constrain the unstandardized loading and estimate the SD of the factor. In the latter case (constraining an unstandardized loading), once we constrain one loading, we can plug the estimated SD of the factor into the equation for all the other items and therefore estimate the remaining unstandardized loadings. Does this sound close to accurate? Thanks again!
 Linda K. Muthen posted on Wednesday, May 09, 2007 - 7:23 am
This is not how I think about it but it may be plausible.
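For reference, the standard relation between a standardized loading \lambda^{*} and an unstandardized loading \lambda can be sketched as follows. The notation is assumed here rather than taken from the posts: \psi is the factor variance and \theta the residual variance of the item y.

```latex
\[
  \lambda^{*}
  = \lambda\,\frac{\mathrm{SD}(\eta)}{\mathrm{SD}(y)}
  = \frac{\lambda\sqrt{\psi}}{\sqrt{\lambda^{2}\psi + \theta}}
\]
```

Only the product \lambda\sqrt{\psi} is determined by the data, so one of the two has to be fixed: fixing \psi = 1 leaves \lambda free, and fixing \lambda = 1 leaves \psi free.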
 Marc Mollers posted on Thursday, May 29, 2008 - 11:10 am
Hi Linda and Bengt,
I am an Mplus beginner testing CFA measurement invariance between two groups. The variables are categorical and the parameterization is Theta. As recommended in the handbook, I compared the configural invariance model (loadings and intercepts free across groups, factor means fixed at 0 in both groups, and residual variances of the observed variables fixed at 1 in both groups) with the more restrictive default model.
(Part of) Input for configural model:
MODEL:
abt by cieqr_2 cieqr_1 cieqr_3 cieqr_7;
auff by cieqr_12 cieqr_5 cieqr_10 cieqr_13;
empa by cieqr_6 cieqr_9 cieqr_8 cieqr_10 cieqr_14;
schue by cieqr_16 cieqr_6 cieqr_11 cieqr_15 cieqr_17 cieqr_18 cieqr_19 cieqr_20 cieqr_21 cieqr_22;
CIEQR_15 WITH CIEQR_16;
cieqr_2 with cieqr_3;
[abt@0]; [auff@0]; [empa@0]; [schue@0];
CIEQR_1@1; CIEQR_2@1; and so on... (all residual variances)
MODEL Offline: (one of the two groups)
--> the same model with loadings of reference variables fixed at 1 +
[CIEQR_1$1 CIEQR_1$2 CIEQR_2$1 and so on... (all thresholds)];

Config. model: CFI=.948; RMSEA=.069. Default model: CFI=.951; RMSEA=.066. So the more restrictive model has a better fit? What am I doing wrong? Does this have something to do with the restrictions on factor means and residual variances? How should I test for invariance of loadings and thresholds separately? I am quite desperate.
Thanks a lot in advance and greetings,
 Linda K. Muthen posted on Friday, May 30, 2008 - 7:43 am
Please send all relevant files and your license number to
 John Capman posted on Saturday, January 10, 2009 - 7:48 am
I realize this posting was quite a long time ago, but I was wondering if there is any feedback as to why the results reported by Marc Mollers (above) occurred. I am curious because I am having a similar problem with my data: the more restrictive model fits better than the less restrictive one. Thanks in advance for your help.

 Linda K. Muthen posted on Saturday, January 10, 2009 - 8:09 am
If you are using the WLSMV estimator, you should be using the DIFFTEST option to test nested models. With WLSMV, only the p-value should be interpreted. The chi-square and degrees of freedom are adjusted to obtain a correct chi-square and should not be interpreted in the regular way. If this is not the case, please send all relevant files and your license number to
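A two-step DIFFTEST run might be set up along the lines of the fragments below. This is a sketch under assumptions: the file name deriv.dat, the grouping variable g, and the item names u1-u5 are hypothetical, the DATA command is omitted, and the MODEL statements are left as placeholders. The less restrictive model is run first and saves the derivatives; the more restrictive model is then run in a separate input that reads them back.

```mplus
! Step 1: less restrictive (H1) model, saving the derivatives
VARIABLE:  NAMES = g u1-u5;
           CATEGORICAL = u1-u5;
           GROUPING = g (1 = grp1 2 = grp2);
ANALYSIS:  ESTIMATOR = WLSMV;
MODEL:     ! statements for the less restrictive model go here
SAVEDATA:  DIFFTEST = deriv.dat;

! Step 2: more restrictive (H0) model, in a separate input file
! (same VARIABLE command as in step 1)
ANALYSIS:  ESTIMATOR = WLSMV;
           DIFFTEST = deriv.dat;
MODEL:     ! statements for the more restrictive, nested model go here
```

The chi-square test for difference testing is then reported in the output of the second run.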
 John Capman posted on Saturday, January 10, 2009 - 8:23 am
I did not run the DIFFTEST option. I will try it and let you know. Otherwise, I will send the data with the license #.

Thanks for your prompt reply. Greatly appreciated.
 JPower posted on Sunday, March 08, 2009 - 2:05 pm
I am conducting a CFA with 3 factors and 18 ordinal items at two time points using WLSMV estimation. I am interested in evaluating measurement invariance. I have demonstrated equal form and now want to look at constraining the factor loadings.

1) I constrained all of the factor loadings and conducted a diff test. Chi-square was significant:

Value 31.570
Degrees of Freedom 10**
P-value 0.0005

However, CFI is slightly higher with the loadings constrained and TLI and RMSEA do not change.

Without factor loadings constrained: CFI=0.959, TLI=0.986, RMSEA=0.056, SRMR=0.065

With factor loadings constrained: CFI=0.967, TLI=0.986, RMSEA=0.056, SRMR=0.071

How should I interpret this? My sample size is 686. Is this an example where chi-square may be overly sensitive?

2) To determine the source of the significant chi-square, would an appropriate approach be to consider each factor loading separately and thus conduct 18 diff tests (perhaps adjusted for multiple comparisons)?

3) Should I have constrained the thresholds at the same time as the factor loadings, or is this a separate test? It is my understanding from previous posts that it is difficult to disentangle the two. Does this invalidate the meaning of tests looking at factor loadings only when using WLSMV? Is it ever appropriate to consider factor loadings without looking at thresholds?

 Linda K. Muthen posted on Monday, March 09, 2009 - 10:11 am
Chi-square is used for comparing nested models. With WLSMV, you need to use the DIFFTEST option to do this. CFI, TLI, etc. are not used for comparing nested models.

For categorical outcomes, we recommend looking at the thresholds and factor loadings together. The models we recommend are shown in Chapter 13 for multiple group analysis. The same principles apply across time.
 JPower posted on Monday, March 09, 2009 - 11:28 am
Thanks for your reply. Just a couple of points to clarify:

1) I did use the DIFFTEST option to compare the nested models. Do you think the apparent disagreement between the DIFFTEST results and the change in the models' CFI and TLI could be due to chi-square being overly sensitive to sample size?

2) To determine the source of the significant DIFFTEST, would an appropriate approach be to consider each factor loading separately and thus conduct 18 diff tests (perhaps adjusted for multiple comparisons)?

Thanks again.
 JPower posted on Monday, March 09, 2009 - 11:40 am
I apologize for the multiple postings. Is it still necessary to consider the equivalence of thresholds even if I am not working with MEANSTRUCTURE?
 Linda K. Muthen posted on Monday, March 09, 2009 - 6:55 pm
1. I would not make much of these minor differences in CFI and TLI. They are not useful for comparing models. I would go with the DIFFTEST results. Your sample size is not large for categorical data analysis.

2. You should look at factor loadings and thresholds together. The two of them determine the basic building block of categorical data analysis, the Item Characteristic Curve.

3. Means and thresholds are the default in Mplus with weighted least squares estimation. They cannot be excluded.
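To make the second point concrete, the item characteristic curve for a binary item under a probit link can be sketched as follows. The notation is assumed here and does not come from the posts: y^{*} is the latent response variable, \tau the threshold, \lambda the loading, \theta the residual variance of y^{*}, and \Phi the standard normal distribution function.

```latex
\[
  y^{*} = \lambda\,\eta + \varepsilon, \qquad
  \varepsilon \sim N(0, \theta), \qquad
  y = 1 \iff y^{*} > \tau
\]
\[
  P(y = 1 \mid \eta) = \Phi\!\left(\frac{\lambda\,\eta - \tau}{\sqrt{\theta}}\right)
\]
```

The loading sets the slope of the curve and the threshold sets its location, so a difference across groups or time points in either parameter changes the same curve.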
 Erica Valpiani posted on Monday, November 02, 2009 - 2:10 pm
Dear Linda and Bengt,
We are trying to conduct a measurement invariance analysis using the logic of testing increasingly restrictive models (e.g., configural, weak, strong, strict). We have a one-factor model with two groups and five categorical indicators. Many of the worked examples (e.g., Gregorich's worked example) start with a model which freely estimates loadings, thresholds and residuals in both groups.

However, it appears that, if loadings and thresholds are free, then the residuals need to be set at 1. Whilst our model runs when residuals are set at one, setting these at one also means that we cannot follow the logic of testing models by systematically constraining lambdas, taus, then thetas between the groups, as the thetas are already fixed.

Is there any way of creating a baseline model using categorical indicators which can be used to assess the significance of changes in later models?
 Linda K. Muthen posted on Monday, November 02, 2009 - 5:57 pm
The steps to test for measurement invariance with categorical outcomes differ from those for continuous outcomes. The models we suggest are shown on pages 399-400 of the Mplus User's Guide and in our Topic 2 course handout.
 Fernando H Andrade posted on Monday, May 03, 2010 - 3:19 pm
Dear Dr. Muthen

I am trying to test for invariance by gender. I keep getting this message:




These are the models, based on the Topic 2 handout, page 169, for the non-invariance model.

Could you please give me a hand? Thank you very much.


su by y1alc y1cig y1mar;
pa by eng_r hist_r math_r sci_r;
[su-pa@0]; !Setting means to zero
{y1cig@1 y1alc@1 y1mar@1 eng_r@1 math_r@1 hist_r@1 sci_r@1};

model female:
su by y1alc y1cig y1mar;
pa by eng_r hist_r math_r sci_r;
[y1alc$1 y1cig$1 y1mar$1 eng_r$1 math_r$1 hist_r$1 sci_r$1];
 Linda K. Muthen posted on Monday, May 03, 2010 - 5:20 pm
When you mention the first factor indicator in MODEL female, you free its loading, which causes the model not to be identified. You should not mention the first factor indicator in group-specific MODEL commands.
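Following this advice, the group-specific part of the input above might be rewritten as below. This is a sketch that only drops the first indicator of each factor (y1alc and eng_r) from the BY statements in MODEL female; everything else is copied from the post, and the rest of the input is assumed to be unchanged.

```mplus
model female:
! y1alc is not mentioned, so its loading stays fixed at 1
su by y1cig y1mar;
! eng_r is not mentioned, so its loading stays fixed at 1
pa by hist_r math_r sci_r;
! thresholds are free in this group
[y1alc$1 y1cig$1 y1mar$1 eng_r$1 math_r$1 hist_r$1 sci_r$1];
```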
 Fernando H Andrade posted on Tuesday, May 04, 2010 - 5:41 am
thank you! works pretty well now.
 nanda mooij posted on Tuesday, June 22, 2010 - 6:30 am
Dear Dr. Muthen. I want to save the thresholds of my model, but Mplus is only giving the factor loadings. I'm using WLSM, so the thresholds should be the default. When I try the estimator MLR, it does give the thresholds, but I prefer using WLSM, so how can I get the thresholds of my model?
Thanks in regard,
Nanda Mooij
 Linda K. Muthen posted on Tuesday, June 22, 2010 - 6:36 am
If you are using an older version of Mplus, add TYPE=MEANSTRUCTURE to the ANALYSIS command. If not, send the full output and your license number to
 Sabine Spindler posted on Friday, April 08, 2011 - 4:50 am
Dear Dr. Muthen,

I am testing measurement invariance for 1 scale through multiple groups analysis (male, female) in ordered categorical data, using WLSMV and Theta Parameterization.

Baseline Model:
factor loadings estimated
thresholds free
items residual variance fixed @ 1
factor mean @ 0, factor variance @ 1 for identification

Comparison Model:
factor loadings estimated but constrained to be equal across groups
thresholds estimated but constrained to be equal across groups
items residual variance fixed @ 1
factor mean @ 0, factor variance @ 1 for identification

DIFFTEST for Model Comparison is significant, however in the "wrong" direction: The Comparison model has MUCH better fit than the Baseline Model.

How can this be?

Thank you,
 Bengt O. Muthen posted on Friday, April 08, 2011 - 5:20 pm
Your Comparison model is not set up right and will be ill-fitting. This is because you have factor means fixed at zero and factor variances fixed at 1 in all groups (I assume that's what you mean). You should only do it in one (reference) group. Then you'd get the same model as the invariance model we propose.
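One way to write this identification in Mplus is sketched below, assuming a single factor f, six items u1-u6, and female as the label of the non-reference group (all hypothetical names).

```mplus
! asterisk frees the first loading, which is fixed at 1 by default
MODEL:        f BY u1* u2-u6;
              f@1;     ! factor variance fixed at 1 in the overall model
              [f@0];   ! factor mean fixed at 0 in the overall model
MODEL female: f*;      ! factor variance freed in the non-reference group
              [f*];    ! factor mean freed in the non-reference group
```

Loadings and thresholds are left at the default of equality across groups. The fragment deliberately says nothing about residual variances or scale factors.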
 Sabine Spindler posted on Saturday, April 09, 2011 - 12:13 am
Thank you very much. I have done that now, and still - the comparison model has an improved fit as compared to the baseline model. Not as much any more as before, but significantly (DIFFTEST). To me, this seems very odd - how can this be?
 Linda K. Muthen posted on Saturday, April 09, 2011 - 4:18 pm
Please send the two outputs and your license number to
 SIMON MOON posted on Friday, July 13, 2012 - 8:44 am
I am conducting a measurement invariance analysis in Mplus. I ran a series of nested models to test measurement invariance with 1 factor (6 continuous items) and 5 groups (the sample size is unusually large: N > 600,000). Because the items were skewed, I used the MLM estimator. When I ran the scalar invariance model, a strange thing happened: its chi-square was smaller than the configural model's chi-square! I know this should be impossible, so I tried several things. When I ran the same analyses using the ML estimator, the chi-square problem disappeared. I have two questions.
1) Is the smaller chi-square a problem (or a characteristic?) with MLM estimator or did I do something wrong?

!Configural Model (Chi-square = 24776.964, df =45)

!Scalar Model (Chi-square = 17025.705, df=113)

2) Even if the items are skewed, I feel that using ML estimator is OK because I will compare only nested models. Am I on the right track?

Thanks in advance!
 Bengt O. Muthen posted on Friday, July 13, 2012 - 9:52 am
This can be answered well only by looking at your two outputs. You can send them to Support.
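As general background, and not as a diagnosis of these particular outputs: MLM reports a scaled chi-square, and the difference between two scaled chi-squares is not itself chi-square distributed, so nested MLM models are usually compared with the Satorra-Bentler scaled difference test. In the sketch below the notation is assumed: T_0, d_0 and c_0 are the scaled chi-square, degrees of freedom and scaling correction factor of the more restrictive model, and T_1, d_1 and c_1 are those of the less restrictive model.

```latex
\[
  c_d = \frac{d_0\,c_0 - d_1\,c_1}{d_0 - d_1},
  \qquad
  T_d = \frac{T_0\,c_0 - T_1\,c_1}{c_d}
\]
```

T_d is referred to a chi-square distribution with d_0 - d_1 degrees of freedom. Because each model has its own scaling correction factor, the scaled values themselves may not be ordered the way ML chi-squares are.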
 Maria posted on Monday, January 21, 2013 - 10:27 am
Dear Bengt,

I am looking at measurement invariance of my latent variables across males and females. My items are categorical and I am using the WLSMV estimator.

I run:

1. a model with free loadings and thresholds
2. a model with free thresholds and fixed loadings

Difftest shows me that step 2 is a worse fit and by looking at the modindices I can improve the fit by freeing one loading.

2b. I run a model with free thresholds and all loadings fixed except the one identified by modindices

3. a model with fixed thresholds and loadings (apart from the one identified in step 2).

I compare step 3 and 2b and the difftest shows that step 3 offers a worse fit.

3b. Looking at the modindices I free one threshold and re run the analysis.

difftest still shows that the model in 3b is a worse fit than 2b; however, the model fit indices are equivalent...

I went on to free a number of thresholds (based on the modindices) but this did not make my difftest non-significant and did not improve the model fit (CFI/TLI).

Could I argue that as the model fit are good enough by step 3 (same as step 2b), I am confident that it is a good model ?

or is there a better way to identify the thresholds that need exploring other than modindices?

Thanks for your help
 Bengt O. Muthen posted on Monday, January 21, 2013 - 7:48 pm
I assume you are working with ordered polytomous items. If you want to separately test thresholds and loadings with polytomous items, you should consult the Millsap and Yun-Tein (2004) article. We recommend changing both at the same time for simplicity.
 Maria posted on Tuesday, January 22, 2013 - 12:57 am
Thanks Bengt, I will check out the paper. My items are binary; could I still apply the polytomous approach?
 Linda K. Muthen posted on Tuesday, January 22, 2013 - 7:02 am
If you are working with binary items, you cannot apply the polytomous approach. For binary items, the models to be compared are shown in the Topic 2 course handout under multiple group analysis and on pages 485-486 of the Version 7 Mplus User's Guide.
 Maria posted on Tuesday, January 22, 2013 - 10:40 am
Thanks Linda,

I have used the example in the manual and I find that the chi-square test is highly significant, x2(120) = 570.2, p<.001, while the model fit indices do not worsen (compared to the model with all free parameters).

I have compared the loadings for boys and girls when they were allowed to vary (free model) and I identified 3 loadings that may need to be freed - I do this in separate steps - but still the p< .001 and the model fit is unchanged.

I tried freeing thresholds based on the modindices output but this also has very little effect on the chi-squared p value.

I feel this could be due to the large sample size (2,700) as it appears that the chi-square for the difftest is sensitive to large sample size (Chen, 2007)

If this were the case, I'd be inclined to use the model fit indices to evaluate measurement invariance, but few authors seem to adopt this approach.

I was wondering if you think this is a reasonable approach and hoping you could recommend a paper on this.

Many many thanks
 Maria posted on Tuesday, January 22, 2013 - 10:48 am
I should also probably mention that when I run my main analyses on a fully fixed model vs. the fully free model the results are very similar which is also why I am surprised by the chi-square test results...
 Linda K. Muthen posted on Tuesday, January 22, 2013 - 11:21 am
With binary items, the model with only free factor loadings is not identified. We will have a FAQ with the details about this on the website shortly.
 Maria posted on Tuesday, January 22, 2013 - 1:09 pm
Thanks Linda -

is there anything else I could do in the meantime to check model invariance?

is it sufficient to report that running the model completely separately for male and female (free model) produces equivalent results to using a multigroup approach (fixed model)?

I am writing up the results for a paper and I am concerned about reviewers picking up on this.

 Linda K. Muthen posted on Tuesday, January 22, 2013 - 1:59 pm
Please send the outputs that show these equivalent results and your license number to
 Cynthia Yuen posted on Monday, July 14, 2014 - 7:43 pm
Hi Linda and/or Bengt,

I know this is an old thread but I am an Mplus novice with a question about constraining factor means when testing measurement equivalence. I want to test whether or not factor means are equal across cultures. Looking through the thread, it looks like you have to set the mean@0 and variance@1 for a reference group and freely estimate the mean and variance for the second group in order for the model to run, which mine does. But, how do I test whether the means are different across groups? When I constrain them to be equal to each other I get this message:




Is it reasonable to export factor scores for each culture and just run a t-test? Or is there another way to do this within Mplus?

Thank you so much!
 Linda K. Muthen posted on Tuesday, July 15, 2014 - 11:45 am
The way to compare latent variable means is to compare the model where factor means are zero in all groups to the model where factor means are zero in one group and free in the other groups. Note that if you fix the factor variance to one, you should free the first factor loading, which is fixed to one by default.
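The comparison described here could be set up as two runs along the following lines. This is a sketch with hypothetical names (factor f, items y1-y5); it keeps the default metric, with the first loading fixed at 1, rather than fixing the factor variance at one.

```mplus
! Run A (more restrictive): factor mean fixed at 0 in every group
MODEL:  f BY y1-y5;
        [f@0];

! Run B (less restrictive), in a separate input: the Mplus default,
! with the factor mean 0 in the first group and free in the others
MODEL:  f BY y1-y5;
```

The two runs are nested, so they can be compared with a chi-square difference test (via the DIFFTEST option if the estimator is WLSMV).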
 Leslie Rutkowski posted on Thursday, December 11, 2014 - 12:10 pm
Hi Mplus Team,

With the new convenience feature for measurement invariance, I understand that the mean of the latent variable is fixed to 0 in each group and that no further constraints are imposed on the mean structure. Based on this, the mean structure is just identified, as there are as many empirical means in the observed variables as intercepts. Then, we have no DF to test the mean structure.

How can we impose additional constraints (e.g. equality of intercepts) to test the mean structure in one step via the convenience feature?

 Bengt O. Muthen posted on Thursday, December 11, 2014 - 3:18 pm
If you specify

model = configural metric scalar;

you get a summary of chi-square tests, including scalar versus metric, which gives you a test of the mean structure.
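In context, the option is given in the ANALYSIS command of a multiple group run. The fragment below is only a sketch: the variable names, group labels and factor name are hypothetical, and the DATA command is omitted.

```mplus
VARIABLE:  NAMES = g y1-y6;
           GROUPING = g (1 = grp1 2 = grp2);
ANALYSIS:  MODEL = CONFIGURAL METRIC SCALAR;
MODEL:     f BY y1-y6;
```

A single run then estimates the three models and prints the comparisons among them.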
 Es Maths posted on Sunday, October 13, 2019 - 6:41 pm

how do we compare a partial scalar model with a configural model (non-nested models)? Is it ok to use DELTA CFI and RMSEA in WLSMV?

 Bengt O. Muthen posted on Monday, October 14, 2019 - 2:10 pm
I don't have much faith in delta CFI and RMSEA. Instead, you can use BIC in ML.
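For reference, BIC as it is commonly defined, with \log L the maximized loglikelihood, p the number of free parameters and n the sample size (the notation is assumed here, not taken from the post):

```latex
\[
  \mathrm{BIC} = -2\log L + p\,\ln n
\]
```

The model with the lower BIC is preferred, and the comparison does not require the two models to be nested.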