Measurement invariance with multigroups
Mplus Discussion > Confirmatory Factor Analysis
 Anonymous posted on Thursday, March 07, 2013 - 7:47 am
I've been reading your online notes on multiple group analysis with categorical outcomes. After the configural invariance step, you suggest going straight to a model where intercepts, slopes, etc. are constrained across groups and the factor means are freed. This differs from the continuous case, where one might examine the factor loadings first and then add the intercepts in a separate step, and I was wondering why.
 Bengt O. Muthen posted on Thursday, March 07, 2013 - 9:41 am
This difference between the continuous-item case and the categorical-item case is due to having less information with categorical items. With binary items, there is an identification issue that prevents testing loading invariance alone (metric invariance), at least when residual variances are allowed to vary across groups. We therefore recommend going straight from configural to scalar invariance. With polytomous items it is possible to identify and analyze the metric model, but even then it is not as straightforward as with continuous items. Roger Millsap has written on the polytomous case; see, for instance, his book Statistical Approaches to Measurement Invariance.
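As a rough sketch of the configural starting point for binary items, the Mplus input below assumes two groups, one factor, five binary items, WLSMV with the default delta parameterization, and hypothetical file, variable, and group names (mydata.dat, u1-u5, group, g1, g2). Loadings and thresholds are freed in the second group, while scale factors stay at 1 and the factor mean at 0 in both groups. The exact setup should be checked against the multiple-group section of the User's Guide for the version in use.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = u1-u5 group;            ! hypothetical variable names
          CATEGORICAL = u1-u5;            ! binary items
          GROUPING = group (1 = g1  2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;              ! delta parameterization by default
MODEL:    f BY u1-u5;                     ! first loading fixed at 1
MODEL g2: f BY u2-u5;                     ! loadings free in g2
          [u1$1 u2$1 u3$1 u4$1 u5$1];     ! thresholds free in g2
          {u1@1 u2@1 u3@1 u4@1 u5@1};     ! scale factors fixed at 1
          [f@0];                          ! factor mean fixed at 0
```

Removing the MODEL g2 statements should leave the Mplus default multiple-group model for categorical outcomes (loadings and thresholds equal, factor mean and scale factors fixed in the first group and free in the second), which corresponds to the scalar step.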
 Anonymous posted on Friday, March 08, 2013 - 12:33 am
Thank you, that is extremely helpful. There is a variety of estimators I can use in Mplus when fitting these models; my outcomes are binary. I recall being told once that ML is associated with the Differential Item Functioning/Item Response Theory approach and WLSMV with the CFA approach. Is that correct?
 Linda K. Muthen posted on Friday, March 08, 2013 - 6:28 am
No, that is not correct. IRT and CFA with categorical indicators are the same model. You can use either ML or WLSMV as the estimator when you have categorical variables and factors. With ML, each factor requires one dimension of numerical integration, and each residual correlation also requires one dimension, so ML can become computationally heavy with several factors. In that case, WLSMV is preferred. ML, WLSMV, and Bayes are all good estimators for IRT.
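A minimal sketch of how the estimator choice is expressed in the ANALYSIS command; only one ESTIMATOR line would be active in a real input, and the rest of the input is omitted. To give a rough sense of the computational cost of integration: assuming, purely for illustration, 15 integration points per dimension, a model with 3 integration dimensions needs 15^3 = 3,375 points and one with 5 dimensions needs 15^5 = 759,375.

```
! Limited-information estimation: no numerical integration needed.
ANALYSIS: ESTIMATOR = WLSMV;

! Full-information ML: one integration dimension per factor (and per
! residual correlation), so run time grows quickly with more factors.
! ANALYSIS: ESTIMATOR = ML;

! Bayesian estimation, also usable for IRT models.
! ANALYSIS: ESTIMATOR = BAYES;
```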
 Anonymous posted on Monday, March 11, 2013 - 9:01 am
I'm trying to use the steps you outline for testing measurement invariance across multiple groups with categorical outcomes, within a MACS framework. Having obtained my baseline models for each group, I have fit the configural model for each group, e.g.:

f1 by a* b c d e;
[f1@0]; f1@1;
[a$1 b$1 c$1 d$1 e$1];
{a@1 b@1 c@1 d@1 e@1};

According to your outline, would the next equivalent step be a model in which the loadings and intercepts are constrained to be the same across groups, with the factor means still fixed at zero in each group, factor variances at 0 in each group, and scale factors fixed at 1 in the first group and freely estimated in the other groups? I can estimate this model (it isn't a great fit), but I end up having to examine modification indices (MI) for factor means and slopes/thresholds simultaneously, something I would prefer not to do. I also wondered about using DIFFTEST to compare models in the invariance sequence, as when the scale factors are freely estimated I end up with more parameters in a simpler model (i.e., loadings constrained) than in the more complex model (loadings unconstrained but scale factors fixed)?

Any pointers you can offer will be much appreciated as always!
 Linda K. Muthen posted on Monday, March 11, 2013 - 11:00 am
Please see page 485 of the User's Guide and the Topic 2 course handout on the website, where the inputs are given under multiple group analysis. Factor variance should not be fixed to zero. If you continue to have problems, send the output and your license number to
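For the DIFFTEST part of the question, a sketch of the usual two-run setup, assuming the configural and scalar models for binary items are fit with WLSMV in two separate input files (file and variable names are hypothetical). The less restrictive model is run first and saves its derivatives; the more restrictive model then reads them.

```
! ---- Run 1: add these lines to the configural-model input ----
ANALYSIS: ESTIMATOR = WLSMV;
SAVEDATA: DIFFTEST = deriv.dat;           ! hypothetical file name

! ---- Run 2: scalar model, separate input file ----
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = u1-u5 group;            ! hypothetical variable names
          CATEGORICAL = u1-u5;
          GROUPING = group (1 = g1  2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
          DIFFTEST = deriv.dat;           ! chi-square difference test vs. run 1
MODEL:    f BY u1-u5;                     ! defaults: loadings and thresholds equal;
                                          ! factor mean and scale factors fixed in
                                          ! g1 and free in g2
```

The test is only meaningful when the run-2 model is nested within the run-1 model.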
 Martijn Hogerbrugge posted on Friday, June 21, 2013 - 8:03 am

I was wondering whether it makes sense to test for residual variance invariance (as can be done in CFA with continuous observed variables) once scalar invariance has been established in a multigroup CFA with categorical data using the Theta parameterization. That is, after scalar invariance has been found (following the procedure on page 486 of the Mplus manual), would it make sense to estimate a third model in which the residual variances are again fixed at one in all groups, and to compare this model to model #2 described at the top of page 486?

Looking forward to your answer.
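A sketch of the third model described above, assuming the Theta parameterization, two groups, and hypothetical file and variable names. In the Theta parameterization, the default scalar model fixes the residual variances at 1 in the first group and frees them in the other groups; the group-specific statement below fixes them at 1 in the second group as well, so this model could then be compared to the scalar model, for example with DIFFTEST.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = u1-u5 group;            ! hypothetical variable names
          CATEGORICAL = u1-u5;
          GROUPING = group (1 = g1  2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
          PARAMETERIZATION = THETA;
MODEL:    f BY u1-u5;                     ! loadings and thresholds equal (default)
MODEL g2: u1-u5@1;                        ! residual variances fixed at 1 in g2 too
```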
 Bengt O. Muthen posted on Friday, June 21, 2013 - 6:31 pm
 Alvin  posted on Thursday, May 08, 2014 - 1:09 am
Hi Dr Muthen, can I just clarify: is the configural model one where factor loadings and intercepts are free to vary, with no equality constraints imposed across groups? I realize Mplus by default holds factor loadings and intercepts equal across groups to test measurement invariance. Does this mean that, in testing invariance of the factor loadings alone, one first has to override the default equality constraint on the intercepts?
 Bengt O. Muthen posted on Thursday, May 08, 2014 - 6:06 am
Yes, on configural. Note that the current version of Mplus allows the ANALYSIS option

model = configural metric scalar;

where your MODEL statement simply says

f by y1-y10;

and the rest is done automatically.
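A minimal complete input for the automatic sequence might look like this, assuming continuous items y1-y10, two groups, and hypothetical file and grouping-variable names:

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = y1-y10 group;           ! hypothetical variable names
          GROUPING = group (1 = g1  2 = g2);
ANALYSIS: MODEL = CONFIGURAL METRIC SCALAR;
MODEL:    f BY y1-y10;                    ! no group-specific statements needed
```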
 Alvin  posted on Thursday, May 08, 2014 - 11:30 pm
Thanks very much Dr Muthen, that's fantastic! I notice you don't get standardized estimates in the output using this option; is there a way around this? Also, in testing latent mean differences across groups, I fixed the factor mean of the reference group at 0 and its variance at 1 for comparison, while letting the factor mean and variance of the other group be free. This was done with equal intercepts across groups (scalar model), and the model showed a good fit. Does this mean that the latent mean structure differs across groups?
 Bengt O. Muthen posted on Friday, May 09, 2014 - 8:02 am
As the output says, standardized estimates are not given when you request more than one of configural, metric, and scalar, but you get them if you run one model at a time.

As for your last question, perhaps you are asking if the factor means are different across groups - if so the z value for the factor mean in the second group will tell you.

You should study our Topic 1 discussion of invariance issues; the video and handout are on our website.
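A sketch of the latent mean comparison described above, run as a single scalar model so that standardized estimates are available. It assumes continuous items and two groups with hypothetical names; the first loading is freed so that the factor variance can be fixed at 1 in the reference group. By default, Mplus fixes the factor mean at 0 in the first group and estimates it in the others, so the Est./S.E. value for the group 2 factor mean is the z test of the mean difference.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = y1-y10 group;           ! hypothetical variable names
          GROUPING = group (1 = g1  2 = g2);
MODEL:    f BY y1* y2-y10;                ! first loading freed; loadings and
                                          ! intercepts equal across groups (default)
          f@1;                            ! factor variance fixed at 1 ...
MODEL g2: f;                              ! ... but estimated in group 2
OUTPUT:   STANDARDIZED;
```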
 Alvin  posted on Thursday, May 15, 2014 - 11:49 pm
Thanks very much Dr Muthen. I've read your notes on multigroup CFA. I've tested configural, metric, and scalar invariance, and further, invariance of factor variances and residual variances, for each of the five subscales of a measure I developed. I also looked at latent mean differences across groups. As predicted, two of the subscales did not pass the scalar test; that is, the intercepts varied across groups. What do you do in this case?
 Linda K. Muthen posted on Friday, May 16, 2014 - 9:42 am
This means you have partial measurement invariance. Please see the Topic 1 course handout and video, where this is discussed under multiple group analysis.
 Alvin  posted on Tuesday, August 19, 2014 - 11:31 pm
Hi Dr Muthen, I was wondering how you test partial invariance (by freeing parameters) using the latest Mplus feature for measurement invariance testing. Say you were to release constraints for some of the items: do you use class-specific syntax? Thanks
 Alvin  posted on Tuesday, August 19, 2014 - 11:59 pm
A follow-up question: in my output, the chi-square test is significant for each model (1 configural, 2 metric, 3 scalar), but the likelihood-ratio tests for the nested model comparisons (1-3, 2-3, 1-2) are not significant. Does this mean invariance holds at all levels?
 Linda K. Muthen posted on Wednesday, August 20, 2014 - 4:06 pm
Yes, you can use class-specific syntax.

You should get a well-fitting configural model as a first step. Once you do that you should test it against the other models.
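As a sketch of partial scalar invariance with group-specific statements, assuming continuous items, two groups, hypothetical names, and that items y4 and y7 (hypothetical choices) are the ones with non-invariant intercepts. This uses the default multiple-group setup rather than the automatic MODEL = CONFIGURAL METRIC SCALAR option; mentioning an intercept in the group-specific MODEL command frees it in that group.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = y1-y10 group;           ! hypothetical variable names
          GROUPING = group (1 = g1  2 = g2);
MODEL:    f BY y1-y10;                    ! default: loadings and intercepts equal
MODEL g2: [y4 y7];                        ! hypothetical items: intercepts free in g2
```

A loading can be freed the same way, e.g., by adding f BY y4; to the MODEL g2 command.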
 Xu, Man posted on Friday, December 25, 2015 - 11:53 am
Merry Christmas!

I would like to check the number of parameters involving residual variances under the THETA parameterization with WLSMV in Mplus, as I am a bit confused about the degrees of freedom. Take a two-group example with six binary items and one factor. The numbers of model parameters would be as follows:

Configural invariance (24 parameters): 10 loadings, 12 thresholds, 0 residual variances, 2 factor variances, 0 factor means.
Weak invariance (19 parameters): 5 loadings, 12 thresholds, 0 residual variances, 2 factor variances, 0 factor means.
Strong invariance (26 parameters): 5 loadings, 6 thresholds, 12 residual variances, 2 factor variances, 1 factor mean.

It has been suggested that the strong invariance model be compared directly against the configural model, so it should be a test with 3 degrees of freedom (26 vs. 21). I am a bit confused, because 10 loadings and 12 thresholds are actually involved here, but the difference in degrees of freedom is only 5. Is this correct, or have I missed something?
 Xu, Man posted on Friday, December 25, 2015 - 11:59 am
Sorry, I miscalculated: there should be only 6 residual variances in the strong invariance model, but that still leaves 5+6+6+2+1 = 20 parameters, only 4 fewer than the configural model.
 Linda K. Muthen posted on Saturday, December 26, 2015 - 8:54 am
Please send the output and your license number to
 Xu, Man posted on Saturday, December 26, 2015 - 4:49 pm
Thank you. I just got hold of the Millsap & Tein 2004 paper and they seem to say something about binary outcomes being a special case - I will have a look at this first.
 Bengt O. Muthen posted on Sunday, December 27, 2015 - 1:08 pm
A difference of 4 parameters sounds right. The binary case is different in that the metric (weak) invariance model is not identified with 6 free residual variances.
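Written out with the counts from the posts above (six binary items, one factor, two groups, Theta parameterization, first loading fixed at 1), the bookkeeping behind the 4-parameter difference is that the scalar model imposes 5 loading and 6 threshold equality constraints but frees 6 residual variances and 1 factor mean in the second group. The symbols are labels introduced here for illustration: λ loadings, τ thresholds, θ residual variances, ψ factor variances, α factor means.

$$
\begin{aligned}
\text{Configural:}\quad & 10\,(\lambda) + 12\,(\tau) + 0\,(\theta) + 2\,(\psi) + 0\,(\alpha) = 24\\
\text{Scalar:}\quad & 5\,(\lambda) + 6\,(\tau) + 6\,(\theta) + 2\,(\psi) + 1\,(\alpha) = 20\\
\Delta df &= 24 - 20 = (5 + 6) - (6 + 1) = 4
\end{aligned}
$$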
 Xu, Man posted on Monday, February 29, 2016 - 8:59 am
Dear Dr. Muthen,

I am carrying out an analysis of repeated measures of 6 binary items (two waves). I also wanted to check the mean change of the latent factors. Holding loadings and thresholds equal over time, I noticed that the difference in latent means depended on the constraint on the residual variances of the binary indicators. If I specify the residual variances as equal, there is a latent mean difference, but this difference disappears if I allow the residual variances to differ between the two waves.

I am a bit puzzled, mostly because I thought latent means were not supposed to depend on residual variances. Could it be that things are different in the case of binary variables?
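A minimal sketch of the two-wave setup being described, assuming six binary items per wave with hypothetical names u11-u16 (wave 1) and u21-u26 (wave 2), WLSMV, and the Theta parameterization. Roughly speaking, with binary items the response probabilities depend on the threshold, loading, and factor mean only relative to the standard deviation of the underlying latent response variable, which includes the residual variance, so the constraint placed on the residual variances can change the estimated factor mean difference. This sketch frees the wave-2 residual variances; replacing u21-u26; with u21-u26@1; gives the equal-residual-variance alternative. The same-item residual covariances over time are a common but optional choice.

```
DATA:     FILE = long.dat;                 ! hypothetical file name
VARIABLE: NAMES = u11-u16 u21-u26;         ! uTk = item k at wave T (hypothetical)
          CATEGORICAL = u11-u16 u21-u26;
ANALYSIS: ESTIMATOR = WLSMV;
          PARAMETERIZATION = THETA;
MODEL:    f1 BY u11                        ! first loading fixed at 1
                u12 (L2)                   ! labels hold loadings equal over time
                u13 (L3)
                u14 (L4)
                u15 (L5)
                u16 (L6);
          f2 BY u21
                u22 (L2)
                u23 (L3)
                u24 (L4)
                u25 (L5)
                u26 (L6);
          [u11$1 u21$1] (T1);              ! thresholds equal over time
          [u12$1 u22$1] (T2);
          [u13$1 u23$1] (T3);
          [u14$1 u24$1] (T4);
          [u15$1 u25$1] (T5);
          [u16$1 u26$1] (T6);
          u11-u16@1;                       ! residual variances fixed at 1 at wave 1
          u21-u26;                         ! and free at wave 2
          [f1@0];                          ! wave-1 factor mean as reference
          [f2];                            ! wave-2 factor mean estimated
          u11-u16 PWITH u21-u26;           ! optional same-item residual covariances
```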

 Linda K. Muthen posted on Monday, February 29, 2016 - 4:43 pm
Please send the two outputs and your license number to
 Dennis Li posted on Saturday, July 30, 2016 - 4:32 am
I am trying to run invariance testing on a 6-class LPA with a known class, but I am having trouble with the syntax. I get the error message "Measurement invariance only available for TYPE=MIXTURE with one categorical latent variable and the KNOWNCLASS option." Is it possible to test my LPA against a known class using this method? My (abridged) syntax is:

Classes = c(6) g(2); ! Here's the source of the error
Knownclass = g (g=1 g=2);
Type = mixture;
Model= configural metric scalar;
 Bengt O. Muthen posted on Saturday, July 30, 2016 - 4:16 pm
I don't think so. You would have to set up the invariance testing restrictions yourself.
 Maren Schulze posted on Monday, August 01, 2016 - 8:55 am
We have run a model with two correlated factors for two groups. For scalar invariance, when the factor means are fixed at 0 in group 1 and free in group 2, the values are .12 and -.14 for group 2, both with p-values < .01. This implies that the two groups differ in their means, right? Fit of the model with loadings and intercepts/thresholds held equal is good, though.

What puzzles us is that when we compare factor scores, which we had estimated in a previous step (for both groups combined), the mean factor scores for group 1 and group 2 are more or less the same.

Have we misspecified the multi-group model in any way?

Could it be due to the fact that our categorical indicators have varying numbers of categories and differing values?

Thanks for your help!
 Bengt O. Muthen posted on Monday, August 01, 2016 - 4:20 pm
Q1. Right.

Means and variances of estimated factor scores don't behave like those of true factors.

Q2. Not necessarily.

Q3. I don't think so. However, if the number of categories varies across groups, you should use the * approach discussed in the UG.
 Maren Schulze posted on Tuesday, August 02, 2016 - 6:52 am
Dear Bengt,

thank you very much for your reply.

I have checked the output file again and have come across the following output: Under "UNIVARIATE SAMPLE STATISTICS", we receive "DESCRIPTIVE STATISTICS" for both factors for both groups:

Mean (F1, Group 1) : 7.989
Mean (F1, Group 2) : 8.041
Mean (F2, Group 1) : 8.388
Mean (F2, Group 2) : 8.399

How does Mplus arrive at these values? The mean scores for both factors F1 and F2 vary between -.052 and -.005 in both groups.

Could this explain the difference?

We have the same number of categories in both groups, the number of categories just differs between items.
 Bengt O. Muthen posted on Tuesday, August 02, 2016 - 6:01 pm
We need to see your full output to say. Please send to Support along with your license number.
 Maren Schulze posted on Wednesday, August 03, 2016 - 7:22 am
Dear Bengt,

thanks for your reply, we have solved the above question.

We have an additional question:

When we ask Mplus to save factor scores in the model testing scalar invariance (loadings and thresholds/intercepts held equal), the factor scores we receive for groups G1 and G2 differ (in line with the means of factors F1 and F2 differing between the two groups).

When we simply specify a common model (no grouping), fix no parameters apart from the factor variances (@1, to identify the model), and request factor scores, the two groups don't differ.

Is the difference due to fixing the parameters in the scalar model?

What does that imply?
 Bengt O. Muthen posted on Wednesday, August 03, 2016 - 10:40 am
The combined group model is essentially wrong for each of the groups, especially the factor covariance matrix. You should ignore that analysis. See also Section 3 of the paper on our website:
 M.F. posted on Thursday, August 04, 2016 - 2:32 am
Dear Mr. Muthen,

thanks for your reply.

So you would say that we have to use the factor scores saved in the model for scalar invariance for further analysis?

It is interesting that the latent means differ between the two groups in one model but not in the other, although we actually have scalar measurement invariance for these two groups.
 Bengt O. Muthen posted on Thursday, August 04, 2016 - 6:43 pm
Q1. Yes.
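A sketch of saving factor scores from the multiple-group scalar model rather than from a combined-sample run, assuming two correlated factors measured by hypothetical categorical items u1-u10, two groups, and hypothetical file names:

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = u1-u10 group;           ! hypothetical variable names
          CATEGORICAL = u1-u10;
          GROUPING = group (1 = g1  2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f1 BY u1-u5;                    ! defaults give the scalar model:
          f2 BY u6-u10;                   ! loadings and thresholds equal
SAVEDATA: FILE = fscores.dat;             ! hypothetical output file
          SAVE = FSCORES;
```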
 Kathy Xiao posted on Monday, January 23, 2017 - 5:37 pm
Dear Dr. Muthen,
I am doing a measurement invariance testing with 3 racial groups (black, latino, white) on a 13-item construct.

F BY F1-F13;

After running the baseline model for each racial group separately (using USEOBSERVATIONS to select the group), I got the following RMSEA results:
Black: .065 (.690, .782)
Latino: .083 (.710, .778)
White: .071 (.647, .689)
and the CFI/TLI are all above 0.95.

Given the relatively poor RMSEA values, does that mean I cannot continue with the invariance testing? Or is there a modification of the model I can make?

Many thanks!
 Bengt O. Muthen posted on Tuesday, January 24, 2017 - 2:43 pm
Try using Modindices.
 Kathy Xiao posted on Tuesday, January 24, 2017 - 3:17 pm
I used MODINDICES(3.84), but it did not show any error correlations. Should I use MODINDICES(ALL) instead?
 Bengt O. Muthen posted on Tuesday, January 24, 2017 - 5:05 pm
Yes, use All.
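A sketch of a single-group baseline run with all modification indices requested, assuming hypothetical item names s1-s13 and a race variable whose coding (2 = Latino) is also hypothetical:

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = s1-s13 race;            ! hypothetical variable names
          USEVARIABLES = s1-s13;
          USEOBSERVATIONS = race EQ 2;    ! hypothetical coding: 2 = Latino
MODEL:    F BY s1-s13;
OUTPUT:   MODINDICES (ALL);
```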
 Kathy Xiao posted on Tuesday, January 24, 2017 - 5:14 pm
I used MODINDICES(ALL) and found error correlations between S1 and S2 for the Latino and White groups. After I added them in MODEL Latino and MODEL White, the RMSEA values are now below .06.

Does that mean I need to keep "S1 WITH S2" in the later measurement invariance models, from the configural model through the strict model?
 Bengt O. Muthen posted on Tuesday, January 24, 2017 - 5:46 pm
That's reasonable. But you may want to discuss these general analysis strategies on SEMNET.
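If the residual covariance is kept, a sketch of the multiple-group setup with it included in every group might look as follows (names and group coding are hypothetical). Residual covariances are not held equal across groups by default, so the covariance is estimated separately in each group, and the same statement would be carried through each model in the invariance sequence.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = s1-s13 race;            ! hypothetical variable names
          USEVARIABLES = s1-s13;
          GROUPING = race (1 = black  2 = latino  3 = white);
MODEL:    F BY s1-s13;                    ! default: loadings and intercepts equal
          s1 WITH s2;                     ! residual covariance, free in each group
! Optional, to keep it out of the Black group: MODEL black: s1 WITH s2@0;
```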
 Kathy Xiao posted on Tuesday, January 24, 2017 - 6:20 pm
Thanks for your reply. I will move the discussion to SEMNET later. One question on measurement invariance across multiple groups:

I further tested measurement invariance and found non-invariance, so I proceeded to test the invariance of the loadings by comparing the metric model to models with one loading freed at a time. However, I found that freeing each of the loadings led to significantly worse model fit (p < .05), and this was the case for all pairwise group comparisons.

Do you think this suggests non-invariance of this construct across racial groups? Is there anything else that needs to be considered in the analysis?
 Kathy Xiao posted on Tuesday, January 31, 2017 - 11:26 pm
Dear Dr. Muthen,

I am doing measurement invariance testing with 3 groups; the outcome is a 12-item measure whose categorical items each have 4 response options. I found metric invariance but not scalar invariance.

I want to proceed by freeing thresholds to find the source of non-invariance. But there are 3*12 thresholds. Is there a systematic, recommended strategy for deciding which threshold to start with? Should I free them one by one, two by two, or some other way?

 Bengt O. Muthen posted on Wednesday, February 01, 2017 - 4:16 pm
There is no agreed-upon strategy, I think. I would go variable by variable (that is, all thresholds for a variable at once).
 Kathy Xiao posted on Wednesday, February 01, 2017 - 6:13 pm
Why all thresholds for a variable?

Can I do one threshold at a time? That way I would know exactly how many thresholds need to be freed.
 Bengt O. Muthen posted on Wednesday, February 01, 2017 - 6:24 pm
You can do one threshold at a time but that would be very cumbersome. I would think that it is the variable itself that causes non-invariance, not necessarily specific thresholds.
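A sketch of the variable-by-variable approach for one item, assuming three groups, 12 four-category items with hypothetical names u1-u12, and WLSMV. All three thresholds of a hypothetical item u5 are freed in groups 2 and 3, and the result would be compared with the scalar model (e.g., with DIFFTEST) before moving to the next item. Whether the item's scale factor also needs to be fixed when its thresholds are freed depends on the parameterization and on which parameters are freed; the User's Guide discussion of multiple-group models with categorical outcomes should be checked on that point.

```
DATA:     FILE = mydata.dat;              ! hypothetical file name
VARIABLE: NAMES = u1-u12 group;           ! hypothetical variable names
          CATEGORICAL = u1-u12;           ! 4 categories -> 3 thresholds per item
          GROUPING = group (1 = g1  2 = g2  3 = g3);
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f BY u1-u12;                    ! default: loadings and thresholds equal
MODEL g2: [u5$1 u5$2 u5$3];               ! all thresholds of item u5 free in g2
MODEL g3: [u5$1 u5$2 u5$3];               ! ... and in g3
```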
 Kathy Xiao posted on Sunday, February 05, 2017 - 6:43 am
Thanks for your reply!

Following up on the previous questions: I tested one threshold at a time and found that 6 of the 33 thresholds led to significantly worse fit. I thought freeing these 6 thresholds would make the comparison non-significant.

However, when I freed them together, the Chi-square comparison was significant.

I also tried freeing the combination of thresholds that caused the biggest chi-square change; the result was also significant.

Do you think there is anything wrong with this strategy?
 Linda K. Muthen posted on Sunday, February 05, 2017 - 6:53 am
You may want to ask this question on a general discussion forum like SEMNET. As stated earlier, there is no agreed-upon strategy.
 Kathy Xiao posted on Sunday, February 05, 2017 - 6:56 am
Thank you!