Measurement Invariance
Mplus Discussion > Categorical Data Modeling >
 Anonymous posted on Thursday, February 03, 2000 - 11:58 am
Some people have suggested that latent growth modeling and multigroup comparison of latent means should be based on completely invariant measures of the latent constructs. In other words, factor loadings should be equivalent across time and groups. Variant factor loadings just reflect changes and differences. In situations where the focus of research is to find out relationships of the latent variables but make up unbiased tests, should we bother to seek invariant measures? Dr. Muthen's voice has not been heard since his earlier publication regarding a similar issue!
 Leigh Roeger posted on Wednesday, February 09, 2000 - 4:53 pm
Probably up until a couple of days ago I would have also thought that measurement invariance was more a concern for comparing means than for correlational type research. I have just read an excellent paper by Horn and McArdle (1992), Experimental Aging Research, vol 18, n3, p117-144, which among other things raises this issue. They provide a very clear explanation (p118) of why this isn't the case. They demonstrate quite convincingly that 'for interpretations of correlations the same attribute must be measured in each group'.

It really is a good paper for better understanding the importance of measurement invariance.
 Bengt O. Muthen posted on Thursday, February 10, 2000 - 8:59 am
Just so I know I am answering the right thing, can Anonymous please clarify the statement:

"In situations where the focus of research is to find out relationships of the latent variables but make up unbiased tests, should we bother to seek invariant measures?"

Is there a "not" missing in this sentence?
 Anonymous posted on Wednesday, February 23, 2000 - 10:45 am
Anyway, I would like to know situations where partially invariant factor loadings, in the sense of weak measurement invariance, suffice to yield reliable parameters for comparisons, if possible.
Should I do a study without the need to refer to previous studies, I would seek invariant measures. When there are a bunch of studies to "discuss", results that are more reliable but inconsistent with previous ones might be hard for peers to accept. It may be interesting to start a "pen fight".

I am also having trouble checking the invariance of factor loadings across groups, with all the indicators being categorical. Some variables do not have similar categories across groups. Collapsing some categories to make them consistent across groups seems to alter the expected results that can be obtained without collapsing. Should I do multiple single-group analyses and bypass the issue of invariance?

In addition, I would like to have some guidelines to check invariance of factor loadings with categorical indicators.
 Linda K. Muthen posted on Thursday, February 24, 2000 - 5:49 pm
I think once you can't claim total measurement invariance of thresholds and factor loadings, it depends on whether the invariance has a substantive reason. Of course, this is always open to discussion. I would hope some more substantive readers of Mplus discussion would comment. For example, it would be reasonable to assume that you measure the same construct if you have ten items of which 8 are invariant and the 2 that are not invariant have a good substantive interpretation for their noninvariance. But it would be questionable to assume that you measure the same construct if 2 items were invariant and the remaining 8 not.

When there are a different number of categories for the same indicator in different groups, it makes it impossible to hold the thresholds equal because there are not the same number of thresholds. This cannot be done in Mplus. Mplus requires the same number of categories in each group. Collapsing categories should not make a substantial difference in the model unless the model was ill-fitting to begin with. I believe you should do multiple single group analyses without collapsing categories, then with collapsing categories, then a multiple group with collapsed categories.

The guidelines would be the same as for continuous indicators. First run the model without invariance, then with factor loading invariance, then with both factor loading and threshold invariance. In all cases, use WLS to get chi-square difference tests. The invariance should not significantly worsen the fit. If it does at any one step, look at the derivatives to see where the misfit might be and modify the model.
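The stepwise comparison described above comes down to a chi-square difference test between nested models. As a minimal sketch only: the fit values below are hypothetical, the function names are illustrative, and the plain difference shown here applies to estimators such as WLS where the chi-square values can be subtracted directly (it is not the adjusted procedure needed for WLSM or WLSMV).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the closed-form series for integer degrees of freedom,
    so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # normal density at chi
    if df % 2 == 1:
        tail = math.erfc(chi / math.sqrt(2.0))  # two-sided normal tail
        term, series = chi, 0.0
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)  # chi^(2r-1) / (1*3*...*(2r-1))
            series += term
        return tail + 2.0 * z * series
    term, series = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)  # chi^(2r) / (2*4*...*(2r))
        series += term
    return math.sqrt(2.0 * math.pi) * z * series

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Difference test: does adding invariance constraints worsen fit?"""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

# Hypothetical fit results: free model vs. loading-invariant model.
d_chi2, d_df, p = chi2_difference(224.9, 189, 210.4, 180)
print(f"chi-square difference = {d_chi2:.1f}, df = {d_df}, p = {p:.3f}")
```

A small p-value at a given step would suggest that the added equality constraints significantly worsen fit, which is the point at which one would look for the source of misfit.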
 Leigh Roeger posted on Wednesday, March 01, 2000 - 5:19 pm
Having thought about the issue of measurement invariance from perhaps a more substantive orientation than a statistical one, can I take up Linda's request and make a couple of observations?

The first is that the actual importance of measurement invariance (or lack of) will quite likely vary depending on how large the initial difference between the groups (say boys and girls) was to begin with. So for example in my MIMIC girls rated their mums as more caring than boys. There are several items showing DIF but nothing you did altered this basic very strong finding.

On the other hand, for a factor relating to behavioural freedom the latent mean difference between boys and girls is small, and you can make this difference significant or not by allowing some items to vary between boys and girls. I chose a sig level of 0.05 to decide whether to let factor loadings and thresholds vary, but I wonder whether a sig test is really right for this decision. I ran 100 such tests - and am I really saying 0.051 is not biased (because it's not sig) - is this test independent of sample size - and how robust is it all when the model fit is by no means good? I think some kind of judgement needs to enter this, but you would have a hard time convincing anyone that you didn't just play until you got the answer that you wanted!

The second point is that it's possible in any scale that the DIF might function both ways and in effect cancel the bias out. So in Linda's example, if the two out of ten items are going in opposite directions (one favouring one group and the other the other group) and are of the same magnitude, then by letting these vary the latent mean shouldn't change at all. In fact this is what I found - a lot of items showing DIF but it didn't really cause much bias in the total score because they were pulling in opposite directions willy nilly without any clear pattern. So a scale full of biased items doesn't necessarily mean the total test score is biased - not perfect, that's for sure, but maybe not that bad either for what you're trying to do? This is in no way to argue that one shouldn't worry about DIF - because you won't know what's going on until you run the analysis.
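The cancellation argument can be illustrated with a small sketch. It assumes a linear measurement model in which each item's expected score is intercept + loading * factor mean, with DIF entering only as an intercept shift for the second group; all numbers and names are made up for illustration. Under those assumptions the bias in the expected total score is just the sum of the intercept shifts, so equal and opposite shifts cancel.

```python
def expected_total(intercepts, loadings, factor_mean):
    """Expected sum score under a linear one-factor model."""
    return sum(t + l * factor_mean for t, l in zip(intercepts, loadings))

def total_score_bias(intercepts, loadings, dif_shifts, factor_mean=0.0):
    """Difference in expected totals between two groups that have
    the SAME factor mean, so any difference is due to DIF alone."""
    shifted = [t + d for t, d in zip(intercepts, dif_shifts)]
    return (expected_total(shifted, loadings, factor_mean)
            - expected_total(intercepts, loadings, factor_mean))

# Hypothetical ten-item scale.
loadings = [0.8, 0.7, 0.9, 0.6, 0.75, 0.85, 0.65, 0.7, 0.8, 0.9]
intercepts = [2.0] * 10

# Two items with DIF in opposite directions vs. in the same direction.
cancelling = [0.3, -0.3] + [0.0] * 8
one_sided = [0.3, 0.3] + [0.0] * 8

print("bias, opposite directions:", total_score_bias(intercepts, loadings, cancelling))
print("bias, same direction:     ", total_score_bias(intercepts, loadings, one_sided))
```

Loading (slope) differences would not cancel this simply, because their effect on the total depends on the factor value.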

I look forward to the other views of researchers who are out there grappling with these issues.
 Bengt O. Muthen posted on Thursday, March 02, 2000 - 10:14 am
Good points. For an example where a single item's DIF made a big difference, see

Gallo, J.J., Anthony, J. & Muthen, B. (1994). Age differences in the symptoms of depression: a latent trait analysis. Journals of Gerontology: Psychological Sciences, 49, 251-264. (#52).

Also, items with DIF don't have to be thrown out if the model itself includes parameters that allow for the DIF (direct effects in MIMIC, non-invariant measurement parameters in multiple-group analysis).
 Chongming Yang posted on Wednesday, March 08, 2000 - 12:00 pm
Two other articles also provide some hints on how to deal with the problems of the first discussant.

Muthen, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, Vol. 54, No. 4, 557-585.

Pentz, M. A., & Chou, C.P. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology. Vol. 62, No. 3, 450-462
 Leigh.Roeger posted on Tuesday, October 10, 2000 - 11:21 pm
Can I ask for an opinion about the issue of scalar invariance?

In the literature many researchers seem to assume that if a scale shows configural (same pattern of zero and nonzero factor loadings across groups) invariance and also metric invariance (same factor loadings) then this shows that observed mean differences can be meaningfully interpreted.

Others come along and say that when latent means are being compared scalar invariance (equal intercepts) is also required. This makes sense but my question is whether it is necessary to achieve scalar invariance for the comparison not of latent means but observed means. If it is (as I think it is) then there is a lot of misunderstanding out there. Any comments from anyone?
 Randy MacIntosh posted on Wednesday, October 11, 2000 - 12:15 am
I was wondering if it would be possible to post some clarification on the last comment in Bengt's 3/2 posting?

Wouldn't non-invariant measurement parameters run counter to Meredith's 'strong factorial invariance' that is necessary to make valid comparisons across groups on latent variable means?

I think this point is also relevant to Leigh's recent post.
 bmuthen posted on Wednesday, October 11, 2000 - 2:28 pm
To answer Leigh and Randy, I think one needs invariant measurement intercepts in addition to invariant slopes to be able to compare observed means - if this invariance is not present, two people with the same observed value have different factor values (and are therefore different). As for latent means, my thinking is that one can compare them also under only partial measurement invariance as long as one allows for meaningful parameters that pick up the noninvariance.
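The point about observed scores can be made concrete with a small sketch. It assumes a single linear indicator, x = intercept + loading * factor (residual ignored), with equal loadings but different intercepts in two groups; the values and the function name are hypothetical.

```python
def implied_factor_value(x, intercept, loading):
    """Factor value implied by an observed score, ignoring the residual."""
    return (x - intercept) / loading

observed = 3.0

# Same slope in both groups, but a non-invariant intercept.
eta_group1 = implied_factor_value(observed, intercept=1.0, loading=0.8)
eta_group2 = implied_factor_value(observed, intercept=1.4, loading=0.8)

print("group 1 factor value:", eta_group1)
print("group 2 factor value:", eta_group2)
```

Two people with the same observed value end up at different factor values, which is why comparing observed means presupposes invariant intercepts as well as slopes.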
 Markku Niemivirta posted on Tuesday, December 17, 2002 - 5:01 am
Background: I'm supposed to examine the influence of cultural background (two nationalities), gender, age (a continuous variable), and their two-way interactions on children's number sense. Number sense is assessed with a test comprising 40 binary variables. Due to the design and limited sample size (330 participants altogether) I intended to use a MIMIC model, where the two categorical variables, one continuous variable, and their interactions predict the latent structure of children's number sense. Since I'm also supposed to compare three alternative latent structures (1-factor, 2-factor, and 7-factor solutions), I have a situation where I first need to test the competing models and their invariance across the background variables.

To do this, I ran a set of MIMIC-models with alternative factor solutions for each background variable at a time. My assumption was that if the measurements in relation to different background variables are invariant, the model fit should be good, given that the factor solution is appropriate. The presence of possible DIF should be indicated by high derivative values.

The fit stats for these sets of models were as follows (WLSMV was used):

Nationality as the covariate: (1-factor model): p=.000, CFI=.952, TLI=.976, RMSEA=.047; (2-factor model): p=.000, CFI=.965, TLI=.983, RMSEA=.040; (7-factor model): p=.0004, CFI=.971, TLI=.985, RMSEA=.037. (Two items seemed to flag for DIF).

Gender as the covariate: (1f): p=.000, CFI=.968, TLI=.985, RMSEA=.040; (2fs): p=.0012, CFI=.976, TLI=.989, RMSEA=.035; (7fs): p=.0080, CFI=.982, TLI=.991, RMSEA=.031. (No DIF).

Age as a covariate: (1f): p=.000, CFI=.932, TLI=.953, RMSEA=.045; (2fs): p=.0001, CFI=.949, TLI=.964, RMSEA=.040; (7fs): p=.0010, CFI=.959, TLI=.971, RMSEA=.036. (One item seemed to flag for DIF).

A main effect model with all predictors: (1f): p=.000, CFI=.933, TLI=.948, RMSEA=.040; (2fs): p=.0005, CFI=.947, TLI=.959, RMSEA=.036; (7fs): p=.0035, CFI=.958, TLI=.967, RMSEA=.032.

A model with all two-way interactions included: (1f): p=.000, CFI=.920, TLI=.935, RMSEA=.039; (2fs): p=.0008, CFI=.938, TLI=.950, RMSEA=.035; (7fs): p=.0047, CFI=.951, TLI=.959, RMSEA=.031.

My questions:

(1) Is this procedure valid for (a) testing for invariance across the different groups, and (b) for simultaneously comparing the alternative factor solutions?

(2) Given the fit stats, what should be concluded about the alternative models? I cannot use the likelihood ratio test with WLSMV, so I should compare the change in fit indices. However, I only know of one simulation study (Cheung & Rensvold, 2002) that provides some guidelines for this, but that study used ML estimation, so it isn't directly applicable. I would definitely say that sufficient measurement invariance exists, but given that even the 1-factor solution shows moderate fit, do I have grounds to argue for the 7-factor solution? Then again, is the 7-factor solution really any better than the 2-factor solution (especially considering the complexity it adds to the results)?

(3) Am I missing something relevant?

Sorry for the lengthy message, but I thought it would be better to provide all the necessary information.

 Linda K. Muthen posted on Wednesday, December 18, 2002 - 9:24 am
I would not approach the problem as you have. I would first decide on the best factor model without including covariates. I would then move to a MIMIC model to see which covariates might be related to measurement non-invariance. I would then use these covariates in a multiple group analysis. MIMIC can only see invariance in the intercepts/thresholds. It cannot see invariance in the factor loadings.
 Markku Niemivirta posted on Wednesday, December 18, 2002 - 11:47 pm
Thanks! A few additional questions:

Since the Ns in my subsamples are rather low, I was hoping to avoid multigroup comparison altogether. Now, if I followed the procedure you suggested, would you say that, given the level of fit obtained with MIMIC models (cf. my previous message), there is any need to proceed to multigroup comparisons in the first place? Also, can I use derivative values with continuous covariates just as with categorical covariates?
 Linda K. Muthen posted on Thursday, December 19, 2002 - 5:51 am
When you say "flagged for dif", what do you mean?

You can use derivatives for continuous or categorical covariates. It is the scale of your outcome variable that decides whether modification indices are available or you need to use derivatives, that is, unscaled modification indices.
 Markku Niemivirta posted on Thursday, December 19, 2002 - 12:29 pm
By "flagged for dif" I mean that certain derivatives were high (btw, how can one decide what values are 'high' except for testing the influence of adding the direct effect from a covariate to the observed variables?)

But with WLSMV there are no MIs available other than derivatives, right?
 Linda K. Muthen posted on Thursday, December 19, 2002 - 6:28 pm
You can't decide which are high without doing a run. There are no derivatives for WLS, WLSM, or WLSMV.
 Markku Niemivirta posted on Thursday, December 19, 2002 - 11:46 pm
Sorry, but what do you mean by "there are no derivatives for WLS, WLSM, or WLSMV"?
 Linda K. Muthen posted on Friday, December 20, 2002 - 6:11 am
I mean no modification indices. Sorry I had a very long day yesterday.
 Michael Conley posted on Thursday, May 29, 2003 - 11:08 am
I bring this question up here because it is motivated by my recent reading of Prof Muthen's Webnote 4 discussion of the Latent Response Variable Model. That note states that the LRV model presumes a causal relationship not just conditional probability. This directional causal aspect of the model is my understanding of FA in general.

Two papers on measurement equivalence (Raju, Laffitte, and Byrne 2002, Applied Psychology; and Flowers, Raju and Oshima, Paper at 2002 NCME) question whether tau differences on an item between two groups resulting from a CFA analysis are necessarily DIF. They point out that the difference could be due to impact, that is, differences in ability. This suggestion appears to be based on the standard model for a person's observed score for an item i (subscripts omitted)

x = tau +(lambda)(ksi) + delta

and thus the expected value of x, MUx is

MUx = tau + (lambda)(MUksi)

It is stated that algebraically

tau = MUx - (lambda)(MUksi)

It is observed that "when the lambdas are equal, the difference in taus (intercepts) will simply reflect the difference in the means of x and ksi across populations." (Flowers et al). Raju et al state that (the tau values in the tau equations for each group) “depend on item and factor means."

I can understand that if certain values in the algebraic formula are known then a remaining value is implied by the formula. However, given a causal view of the CFA model I have trouble seeing how tau is caused by or dependent on ksi. If the formula above is taken as describing the causes of tau, then MUx is a cause of tau. But that seems to turn the causal model idea on its head. Speaking of heads, this is probably way over my head. But that is why I am asking. Using this logic couldn't I also solve for lambda and then say that lambda can be caused, in part, by differences in ksi across groups and thus lambdas are also ambiguous?

Also, in IRT, DIF is usually viewed as any difference in the ICC's for two groups for an item. Threshold differences don't seem, I think, to be viewed as ambiguous. In fact, I thought an advantage of IRT was that equatable (if there is no DIF) threshold estimates did not depend on the ability of the group. In the CFA approach both groups are on the same scale so the equating shouldn’t be needed? I’m probably a little confused.
 bmuthen posted on Thursday, May 29, 2003 - 6:04 pm
I haven't read those papers, but I assume that they might be talking about a situation where if you ignore the group differences in the ability means, you can mistakenly get the impression that the item has different thresholds for 2 groups (bias or DIF). If, however, you allow for ability differences across the groups in your analysis, no DIF will be discovered. When we do DIF analysis using Mplus and the "MIMIC approach", that is having the grouping variable as a covariate, you are allowing for group difference in ability. Same for the multiple-group approach.
 Michael Conley posted on Friday, May 30, 2003 - 1:44 pm
I can see that if you constrain ability factor means across groups to be the same, then things can get confounded. But it doesn’t seem to me that this is what these articles are doing.

The Flowers et al paper entitled “A comparison of measurement equivalence methods based on confirmatory factor analysis and item response theory” uses simulation of two group linear CFA for its analysis. They simulate population groups both with and without ability factor differences. They don’t appear to constrain the factor mean estimates to be the same across groups.

The Raju et al article entitled “Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory” also doesn’t appear to constrain factor means.

I don’t know, but if a tau difference between groups (given free factor means) isn’t an indication of DIF, then that seems to me to be a significant point. But I haven’t seen that point made anywhere else. I will keep pondering. It just seems to me that if in the population (forget estimation) students, male and female, have the same ability level but have different expected item performance (due to different tau’s for the groups) then this should be called DIF.

I know this isn’t strictly an Mplus issue, but Mplus has a unique intersection with IRT. It is interesting to note that the Raju et al article highlights as a major IRT/CFA difference the nonlinear vs linear models (with no reference to Mplus as an option for CFA).
 bmuthen posted on Saturday, May 31, 2003 - 2:29 pm
Could you send me those 2 papers so I can take a look at this?
 Michael Conley posted on Saturday, May 31, 2003 - 4:22 pm
I just sent you an email with the two PDF files of the articles attached. Thank you for your interest and time. I may just be misreading something.
 bmuthen posted on Saturday, June 14, 2003 - 10:39 am
I have now read the Raju et al article that you sent me. In my view, the article's discussion of this issue on page 520 is confused. I am surprised that this mistaken description got past the reviewers.

First, equation 18 is formulated as if the intercept is a function of the item mean and the factor mean. This is looking at the situation backwards. Equation 16 is the correct "causal" view: the factor mean and the intercept produce the item mean. The factor mean describes a property of the individuals and the intercept a property of the item. Together they produce the item mean.

Second, the Dorans and Holland quote is misunderstood. Say that z is the group dummy variable. In my view, the quote implies that "impact" is the effect of z on x mediated by the factor, while DIF is the direct effect on the item x. I.e., DIF represents a difference on x between the groups when they are considered at the same level of the factor - so a conditional item difference given the factor. In contrast, the article's paragraph below the quote gives a confused view, e.g. talking about a "current uncertainty" about intercepts and therefore focusing on loadings. There is no uncertainty in this area. See, e.g., IRT-related references to Muthen on the Mplus web site. In conclusion, this aspect of the article should be ignored in my view. I am sorry that it was also confusing the other researchers that you mention.
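The distinction between impact and DIF can be sketched numerically. The example assumes a linear MIMIC-style setup, x = tau + lambda * factor + beta * z with factor mean gamma * z for a group dummy z; the parameter names and values are hypothetical. The group difference in the item mean then splits into a part mediated by the factor (lambda * gamma, impact) and a direct part (beta, DIF).

```python
def item_mean(tau, loading, factor_mean, beta=0.0, z=0):
    """Expected item score: the intercept and the factor mean
    together produce the item mean (the 'causal' direction)."""
    return tau + loading * factor_mean + beta * z

def mean_difference_decomposition(loading, gamma, beta):
    """Split the group difference in the item mean into impact and DIF."""
    impact = loading * gamma  # effect of z on x mediated by the factor
    dif = beta                # direct effect of z on x
    return impact, dif, impact + dif

def recover_intercept(mean_x, loading, factor_mean):
    """Rearranged mean equation: tau = MUx - lambda * MUksi."""
    return mean_x - loading * factor_mean

tau, loading, gamma = 1.0, 0.8, 0.5

# No DIF (beta = 0): item means differ across groups, intercepts do not.
mean_g0 = item_mean(tau, loading, factor_mean=0.0, z=0)
mean_g1 = item_mean(tau, loading, factor_mean=gamma, z=1)
tau_g0 = recover_intercept(mean_g0, loading, 0.0)
tau_g1 = recover_intercept(mean_g1, loading, gamma)
print("item means:", mean_g0, mean_g1, "intercepts:", tau_g0, tau_g1)

# With DIF (beta = 0.2) the direct effect shows up as an intercept shift.
print(mean_difference_decomposition(loading, gamma, beta=0.2))
```

As long as the factor means are allowed to differ, a group difference in ability alone leaves the recovered intercepts equal; only the direct effect shifts them.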
 Michael Conley posted on Saturday, June 14, 2003 - 12:08 pm
Thank you. Your help is greatly appreciated.
 Anonymous posted on Friday, November 21, 2003 - 11:11 am
I would like you to clarify for me the several types of factorial invariance: configural, metric, and strong.

 bmuthen posted on Monday, November 24, 2003 - 11:02 am
I am not familiar with the terms, but they sound like they correspond to

- having the fixed zero loadings in the same places

- having measurement intercepts and slopes invariant

- having measurement intercepts, slopes, and residual variances invariant

More later on this.
 Anonymous posted on Thursday, November 27, 2003 - 9:07 am
Thank you very much.
I will be waiting for more.
 Anonymous posted on Tuesday, December 02, 2003 - 4:38 am
I saw this last message about the several types of factorial invariance. Tell me... aren't slopes and intercepts "features" that are only part of latent mean structures?
 Linda K. Muthen posted on Tuesday, December 02, 2003 - 8:02 am
I'm afraid we don't understand your question. Can you please clarify?
 Anonymous posted on Monday, December 15, 2003 - 10:27 am
I would like you to clarify what slopes and intercepts are, and whether these "concepts" are part of all structural models or just of latent mean structure models.
 Linda K. Muthen posted on Monday, December 15, 2003 - 10:53 am
In the regression of y on x,

y = a + bx + e

where a is an intercept, b is a slope, and e is an error term.

In SEM, if means are not included in the analysis, no intercepts are estimated. Only a slope is estimated in this situation.

I hope this answers your question. If not, please clarify.
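A small sketch may help show why no intercept is estimated when means are left out. It assumes simple regression of y on x and uses made-up moments: the slope depends only on the variance and covariance, while the intercept also needs the two means.

```python
def slope_from_moments(cov_xy, var_x):
    """Slope b in y = a + b*x + e, from second-order moments only."""
    return cov_xy / var_x

def intercept_from_means(mean_y, mean_x, slope):
    """Intercept a, which additionally requires the means."""
    return mean_y - slope * mean_x

# Hypothetical sample moments.
b = slope_from_moments(cov_xy=1.2, var_x=2.0)
a = intercept_from_means(mean_y=5.0, mean_x=3.0, slope=b)
print("slope:", b, "intercept:", a)
```

An analysis of the covariance matrix alone can therefore recover b but says nothing about a.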
 Leigh Roeger posted on Tuesday, January 06, 2004 - 9:26 pm
I have a data set of 6028 (boys: 2912 Girls: 3116) who completed a 20 item scale (4 categories).
The mean score for boys is 10.91 (SD 6.98)
The mean score for girls is 13.30 (SD 8.53)

Using WLS I estimate a multiple group model assuming full metric and scalar invariance (equal factor loadings and equal thresholds).
Boys are group one and girls are group two.
The variance of the boy latent is fixed at one
(to put things in z score metric).
As expected the boy latent mean is zero and the girl latent mean is higher 0.303.

After a series of tests I have concluded that several factor loadings and thresholds are not invariant across gender. I produce a final model allowing several factor loadings and thresholds to vary across gender. I reestimate the latent mean and find the girl latent mean has dropped to 0.279.

This is as expected because the direction of the bias is to increase scores for girls. My question is whether it is appropriate or possible to translate this change back to raw scores. I would like to say that the approximate bias impact on raw scores in the present data is around 10% ((100/279)*303). The raw difference in total scores is 2.50 and so a reduction of 10% to this equates to 0.250 or a quarter of a point.

How does this logic sound? Is there a neater way?

Thanks for any advice!
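For reference, the arithmetic in this post can be laid out as a short sketch using the figures as posted. It is only a rough proportional rescaling, which assumes the sum score tracks the factor closely; the variable names are illustrative. Note that the posted group means (13.30 and 10.91) differ by 2.39 rather than the quoted 2.50, so both are shown.

```python
latent_diff_invariant = 0.303  # girl latent mean, full invariance model
latent_diff_partial = 0.279    # girl latent mean, partial invariance model

# Proportion of the latent difference attributed to non-invariance.
relative_reduction = (latent_diff_invariant - latent_diff_partial) / latent_diff_invariant

raw_diff_quoted = 2.50                 # raw difference as stated in the post
raw_diff_from_means = 13.30 - 10.91    # difference of the posted group means

print(f"relative reduction: {relative_reduction:.1%}")
print(f"raw-score equivalent (quoted diff): {relative_reduction * raw_diff_quoted:.3f}")
print(f"raw-score equivalent (from means):  {relative_reduction * raw_diff_from_means:.3f}")
```

On these figures the reduction comes out somewhat under the 10% used in the post.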
 Linda K. Muthen posted on Wednesday, January 07, 2004 - 9:52 am
What you suggest makes sense, but it would be a rough approximation given that the sum of the factor indicators may not be a good representation of the factor. Also, I'm a little confused about how you will get the sum on the same metric as the factor.
 Felix Hansen posted on Wednesday, May 12, 2004 - 3:46 am
I have a MIMIC model in which variables form indices for several categories that have an effect on two latent constructs. Now I want to compare groups with respect to the paths leading from these categories to two latent variables. First checking for configural invariance (i.e. the same variables form the same category across groups) does not pose a problem (with AMOS). But how to check for metric invariance? Most articles I read dealt with that problem for models where variables were reflective indicators. Does someone know an article that deals with my problem? Thanks in advance!
 Linda K. Muthen posted on Wednesday, May 12, 2004 - 9:51 am
I assume that you are looking for an article that discusses measurement invariance for formative indicators. I am not aware of any article that deals with this topic. Perhaps this is a question for the broader SEMNET group.
 Anonymous posted on Saturday, January 08, 2005 - 11:41 am
Does anyone know of good references describing the nuts and bolts of conducting and interpreting the output of a CFA with categorical indicators? Thanks
 Scott Weaver posted on Saturday, January 08, 2005 - 2:39 pm
Try this article
Millsap, R. E., & Yun-Tein, J. (2004). Assessing Factorial Invariance in Ordered-Categorical Measures. Multivariate Behavioral Research, 39(3), 479-515.
 EOJ posted on Thursday, December 15, 2005 - 6:54 am
There hasn't been much action on this thread but it seems like the right subject for my question.

I am trying to deal with the question of measurement invariance across multiple groups. I first approached this in a traditional way. I ran multigroup models with difftest comparisons and found metric variance when using the Mplus default of fixing the thresholds for the categorical indicators. Once I allowed the thresholds to be freely estimated, fixed the scales to 1 and the means to be equal across groups there was no longer a significant chi square difference between groups.

Being uncertain whether this meant that the groups differed in the level of the underlying construct (i.e., mean differences), which would give rise to differences in response patterns across items, or if there was indeed item bias, I turned to the use of the MIMIC model in the Gallo et al. 1994 and Chen & Anthony 2003 papers. I estimated the MIMIC model with paths from group indicators to the latent variable to estimate mean differences in the construct (RMSEA = 0.08). Then I added direct paths from the group variables to each factor indicator, one at a time. Having done this for each group variable – factor indicator pair, I retained all the significant paths from group variable to the particular factor indicators in a final model. In this final model the mean differences between groups disappeared (RMSEA = 0.07).

If this sequence of analyses is correct I am left with trying to interpret whether this means that there are mean differences between these groups in the latent construct, but the measure is essentially invariant across these groups, or that the groups are the same on the latent construct but there is bias in certain indicators. Does this sequence of analyses make sense? If so, is there a reasonable way to decide between these very different conclusions about the use of this measure in these groups, and indeed the nature of the construct for these groups? Thanks for any help you can provide.
 Linda K. Muthen posted on Thursday, December 15, 2005 - 9:51 am
The models we recommend for testing measurement invariance of categorical outcomes for the default Delta parameterization are:

1. The Mplus default for multiple groups where thresholds and factor loadings are constrained to be equal across groups, factor means are zero in the first group and free in the other and scale factors are fixed to one in the first group and free in the others.

2. A model in which thresholds and factor loadings are free across groups, factor means are zero in all groups and scale factors are fixed to one in all groups. With categorical outcomes, thresholds and factor loadings need to be freed and constrained in tandem.

3. See Example 5.16 for an example of partial measurement invariance.
 EOJ posted on Thursday, December 15, 2005 - 11:23 am
Thanks for the response. I think that I did the analyses you suggested. This resulted in a significant difference between groups (i.e., constraining the loadings and thresholds to be equal across groups was significantly worse than freeing the loadings). However, when run in separate analyses the two groups provide very similar loadings. I looked at the thresholds for the two groups and saw that they were different on some items. When I freed the thresholds for the second group, fixed the scale factors to one for the second group, and added an equality constraint for the means between groups, the groups were no longer significantly different. What I was trying to get at by moving to the MIMIC model was to be able to look at both mean differences in the latent variable by group and see if items on the scale or indicators were biased across groups. When I took the Gallo et al. 1994 approach the mean differences between groups disappeared when the significant direct paths were estimated between the group variables and some of the indicators. This suggested to me there may be 'true' group differences in the underlying construct that are manifest in the threshold differences represented by the significant paths in the MIMIC model and the threshold differences in the standard measurement invariance approach you cited. This is a different interpretation than that the measure works differently in the two groups. Does this make sense or should I try a different tack in explaining what I'm doing? Thanks for your patience.
 bmuthen posted on Friday, December 16, 2005 - 8:11 am
Linda's step 1 works with thresholds and loadings in tandem, which is different than what you state in your second sentence above.

More importantly, let me comment on what you say in relation to Gallo et al (1994) and group differences in construct means disappearing when direct effects are included. Perhaps I misunderstand you, but true group differences don't manifest themselves in threshold differences in a MIMIC model with significant direct effects. Also, a multiple-group analysis that finds significant threshold (and loading) non-invariance does not point to true group differences in construct means either. The construct means can be different or not. Measurement invariance is a different issue than construct mean differences. Measurement invariance has to do with a conditional statement: given the same construct value, do different groups have different item probability. You may want to look at some of the writing I have done on the MIMIC topic as posted in the reference section of our web site.
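The conditional statement can be illustrated with a minimal sketch. It assumes a binary item following a probit model, P(u = 1 | factor) = Phi(loading * factor - threshold), with unit residual variance; the parameter values and function names are hypothetical and this is not meant to reproduce any particular Mplus parameterization.

```python
import math

def normal_cdf(v):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def item_probability(factor_value, loading, threshold):
    """P(u = 1 | factor) for a binary probit item."""
    return normal_cdf(loading * factor_value - threshold)

eta = 0.5  # the same construct value for a person in either group

# Invariant item: same threshold and loading in both groups.
p_inv_g1 = item_probability(eta, loading=1.0, threshold=0.5)
p_inv_g2 = item_probability(eta, loading=1.0, threshold=0.5)

# Non-invariant item: the threshold differs across groups.
p_dif_g1 = item_probability(eta, loading=1.0, threshold=0.5)
p_dif_g2 = item_probability(eta, loading=1.0, threshold=0.9)

print("invariant item:    ", p_inv_g1, p_inv_g2)
print("non-invariant item:", p_dif_g1, p_dif_g2)
```

For the invariant item the conditional probability is identical in both groups whatever their factor means are; the groups can still differ in the proportion endorsing the item simply because their factor distributions differ.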
 Antonio posted on Tuesday, December 27, 2005 - 2:21 am
I was always thinking of invariance in the context of Rasch measurement.
The Rasch ability scores and item difficulty scores in logits should remain equal (within the expected range of error) regardless of which items the examinees take and regardless of the sample which is used to estimate item difficulties, respectively.
Can anyone elaborate on this?
Why is it important to demonstrate this?
What does it prove?
And introduce some literature on this.
 bmuthen posted on Wednesday, December 28, 2005 - 5:03 pm
Yes, this is a central topic in the Rasch literature. I am not a Rasch expert myself, but any Rasch-related book should discuss this. For example, Erling B Andersen has a book which covers Rasch modeling (I don't remember the book title). Other Mplus Discussion readers might want to jump in here.
 Roger Millsap posted on Sunday, January 01, 2006 - 11:04 pm
Antonio raises a point that confuses many people. The theoretical "invariance" produced by the Rasch model is based on the fit of the model, and it presumes such fit across the groups being compared. But by most conceptions, violations of invariance are violations of the Rasch assumptions (e.g., unidimensionality), and so the supposed "invariance" given by the Rasch model evaporates. There was some literature on this about 30 years ago, with an exchange between Susan Whitely (now Embretson) and Ben Wright, but I can't recall the journal. Also, Erling Andersen's book was "Discrete Statistical Models with Social Science Applications." Quite a nice book, but not easy.
 bmuthen posted on Monday, January 02, 2006 - 2:34 pm
Thanks, Roger.
 EOJ posted on Friday, January 20, 2006 - 9:01 am
Thanks for your responses on 12/15 and 12/16.

I have looked at the example that you mention (5.16) and I see that you keep the loadings and thresholds both freed or both fixed across groups in parallel. I apologize if this is a foolish question, but I wonder why? (This is particularly puzzling since the treatment (i.e., fix/free) of intercepts and loadings can vary in MG models for continuous indicators.)

More generally, if groups differ on the degree of an underlying construct (means), those differences will be reflected in the thresholds for individual items (i.e., the proportion of the groups at different response levels of particular items, with one group higher than the other), yet the interrelationships between items (expressed as loadings) could be the same. If you force the loadings and thresholds to be equal and unequal across groups in parallel, you seem certain to get group differences when comparing models even when it is just that one group's responses to items are lower than the other's and the interrelationships between items are the same across groups. Does this make sense or is there a key piece I am misunderstanding?

Thanks for any help you can give.
 Linda K. Muthen posted on Friday, January 20, 2006 - 2:10 pm
Continuous indicators have means and variances that do not depend on each other. The means and variances of categorical indicators are not independent. This is the basic reason that you cannot generalize analysis with continuous indicators to that of categorical indicators. One way to look at this is to consider the item curves for the categorical items, P(u=1|factor). This curve is influenced by both the threshold and the loading, and we are interested in testing whether the whole curve is different. Therefore, the thresholds and loadings should be considered together.
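To make the item-curve argument concrete, here is a minimal Python sketch. It assumes a binary item, a probit link, and a residual standard deviation of one (as under the Theta parameterization). The function name, the three parameter sets, and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def item_probability(factor, loading, threshold, residual_sd=1.0):
    """P(u = 1 | factor) for a binary item under a probit link (illustrative)."""
    return NormalDist().cdf((loading * factor - threshold) / residual_sd)


# Hypothetical parameter sets: a reference group and two ways of differing from it.
reference = {"loading": 1.0, "threshold": 0.0}
shifted_threshold = {"loading": 1.0, "threshold": 0.5}
flatter_loading = {"loading": 0.6, "threshold": 0.0}

for factor in (-2.0, -1.0, 0.0, 1.0, 2.0):
    probs = [
        item_probability(factor, **params)
        for params in (reference, shifted_threshold, flatter_loading)
    ]
    # Columns: reference, shifted threshold, flatter loading
    print(factor, [round(p, 3) for p in probs])
```

Changing either parameter changes the curve, although two curves can still coincide at a particular factor value (here the loading change leaves the probability at factor = 0 untouched). That is one way to see why the comparison is about the whole curve and not about one parameter at a time.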
 EOJ posted on Friday, January 27, 2006 - 6:22 am
Thanks Linda, this makes it quite clear.
 Jeremy Miles posted on Thursday, March 23, 2006 - 8:22 am
I am trying to fit a simple model in Mplus 4. It looks like this:

MEAN by x1@1 x2@1;
DIFF by x1@-0.5 x2@0.5;

Giving two latent variables, mean and diff, which are, respectively, the mean and difference between the two measures (x1 and x2). With continuous measures, the variance and mean of x1 and x2 need to be constrained to zero.

What I'd really like to do is do this with categorical (ordinal) measures. I have constrained the thresholds to be the same, and used one as an anchor to identify the latent means, but I am struggling to identify the variance of DIFF when I do.

Am I missing something and being dim, or is there something more fundamental?

(I'm trying this with simulated data, just to see if it can be done in principle.)
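As a quick check on the loading pattern above, the following Python sketch works through the algebra for the continuous case, assuming zero residuals and zero intercepts as described in the post. The function names and the two scores are hypothetical.

```python
def observed_from_latent(mean, diff):
    """Apply the fixed loadings: x1 = MEAN - 0.5*DIFF, x2 = MEAN + 0.5*DIFF."""
    return mean - 0.5 * diff, mean + 0.5 * diff


def latent_from_observed(x1, x2):
    """Invert the mapping, assuming zero residuals and zero intercepts."""
    return (x1 + x2) / 2.0, x2 - x1


x1, x2 = 3.0, 5.0  # hypothetical scores
mean, diff = latent_from_observed(x1, x2)
print(mean, diff)                        # 4.0 2.0
print(observed_from_latent(mean, diff))  # (3.0, 5.0)
```

With these loadings MEAN is the average of the two measures and DIFF is x2 minus x1, so the two latent variables are just a re-expression of the two observed scores.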


 Bengt O. Muthen posted on Thursday, March 23, 2006 - 8:56 am
A variance for a continuous latent variable underlying categorical observed variables can only be identified if you have multiple indicators of the latent variable; here you don't. In the probit framework, the x's in your model are treated as continuous latent response variables with fixed unit variances. So it seems that your latent variable DIFF needs to have its variance fixed at 1.
 Eric O. Johnson posted on Thursday, April 06, 2006 - 10:26 am
I have a question about results of MIMIC modeling to examine item measurement bias. I have run two models:
1) a MIMIC model with dummy variable indicators for different groups in the sample with direct paths to the latent variable in order to estimate mean differences relative to the comparison group.

2) the same model as above with the addition of direct paths from the group indicators to the individual items that measure the latent variable.

What I find is that these two models fit the data equally well. Under the second model the mean group differences observed in the first model are substantially reduced and non-significant, but there are some significant direct paths from group indicators to measurement items. Since the two models lead to different conclusions I wanted to be sure that with equal fits, the most parsimonious model (simple mean group difference) is the one to choose even when testing for measurement bias. Is this the case? The examples I've seen in the literature all seem to have both significant mean differences between groups and significant paths to one or more measurement items. Thanks for your help.
 Bengt O. Muthen posted on Thursday, April 06, 2006 - 5:54 pm
I assume that you have only a few direct effects from the group dummies to the items. If you have a significant direct effect, then this is the model you should choose - the model without such a direct effect is misspecified so that the effect of the group dummies on the factors cannot be trusted.
 Eric O. Johnson posted on Friday, April 07, 2006 - 7:53 am
Actually there is a fairly high proportion of direct effects from the group dummies to the factor indicators. This is an analysis of the FTND looking at potential differences among three groups relative to a fourth. So there are 6 indicators and 3 group dummy variables. Of the 15 potential direct paths (excluding direct paths to the item chosen to set the scale), 11 show significant differences relative to the reference group. Does this influence the conclusion that the group mean differences model is misspecified? If there were a true group difference in the mean level of a characteristic, wouldn't one expect significant differences in the direct paths (that is, those with lower mean ND should report lower cigs smoked, etc.)? Thanks again for your help.
 Bengt O. Muthen posted on Friday, April 07, 2006 - 9:31 am
If you have 11 out of 15 possible direct effects significant, you have a problematic model. Remember that direct effects imply measurement non invariance (see e.g. my MIMIC writings), so if you try to interpret effects of group dummies on the factors in a model without these direct effects, you will have very distorted results.

Re your last question, I assume that by "level of a characteristic" you imply the mean of an item. If so, the answer is no. If I understand you correctly, this question indicates an important misunderstanding, so you should read carefully the MIMIC writings on non-invariance. Mean differences in an item across groups are not the issue - the conditional mean difference given the factor is what direct effects concern.
 Eric O. Johnson posted on Friday, April 07, 2006 - 2:44 pm
Well, I meant mean differences in the latent variable, that is nicotine dependence in this case. If a group has a mean level of ND that is lower than another group you would expect differences in their responses to items that measure dependence. Would those differences in response give rise to significant direct paths for the group dummies to the individual items? I will go back and read your papers, but I don't recall a situation in which the group difference on the latent trait was eliminated once direct path(s) to the items were included. Thanks.
 Bengt O. Muthen posted on Friday, April 07, 2006 - 4:04 pm
Group mean differences in the factors do not necessitate the existence of direct effects; these are two different things.

On the other hand, including a direct effect can change a group difference in the factor that was seen when the direct effect was incorrectly not included.

In our annual course, we have an example of the latter.
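The distinction between factor mean differences and direct effects can be sketched numerically. The Python example below assumes a linear MIMIC structure on continuous latent responses, item* = loading x factor + direct effect x group + residual, with the group coded 0/1. All loadings and effects are hypothetical, and the least-squares step is only a crude stand-in for an estimator, not what Mplus computes.

```python
def item_gaps(loadings, factor_gap, direct_effects):
    """Marginal group gap per item: loading * factor gap + direct effect."""
    return [lam * factor_gap + beta for lam, beta in zip(loadings, direct_effects)]


def factor_gap_without_direct_effects(loadings, gaps):
    """Least-squares factor gap when every direct effect is forced to zero.

    Crude stand-in for an estimator (assumption), used only to show the direction
    of the distortion when a needed direct effect is left out.
    """
    return sum(l * g for l, g in zip(loadings, gaps)) / sum(l * l for l in loadings)


loadings = [1.0, 0.8, 0.9, 0.7]  # hypothetical

# Case 1: a true factor difference and no direct effects.
case1 = item_gaps(loadings, factor_gap=0.5, direct_effects=[0.0, 0.0, 0.0, 0.0])
# Case 2: no factor difference, but one item has a direct effect.
case2 = item_gaps(loadings, factor_gap=0.0, direct_effects=[0.6, 0.0, 0.0, 0.0])

print(case1, factor_gap_without_direct_effects(loadings, case1))
print(case2, factor_gap_without_direct_effects(loadings, case2))
```

In case 1 every item differs between groups marginally, yet given the factor nothing differs, so no direct effects are needed. In case 2 the model without the direct effect reports a nonzero factor difference even though the true one is zero, which is the kind of change described in the post above.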
 Chris G Richardson posted on Thursday, September 21, 2006 - 7:09 pm
Hi Bengt/Linda,
I am working on a measurement invariance analysis using multigroup CFA with continuous outcomes (meanstructure & ML estimation). On page 345 of the User's Guide you describe models for successive tests of invariance. In addition to the specifications in the guide, I have also fixed one loading per latent to 1 for identification reasons in all steps. I was wondering if I should also fix the intercept of this indicator to zero in any of the tests (e.g., steps 2 or 3). Thanks for your time & for putting on some great workshops - cheers
 Boliang Guo posted on Friday, September 22, 2006 - 4:58 am
Please read the Vandenberg (2000) paper published in Organizational Research Methods.
 Bengt O. Muthen posted on Sunday, October 01, 2006 - 1:02 pm
Chris - no, the intercept should not be fixed.
 Chris G Richardson posted on Monday, October 30, 2006 - 1:56 pm
Thanks Bengt and Boliang,
At the time of writing my question I only had access to a limited set of papers, and they were rather vague about the constraints employed, so a mild late-night panic set in. Anyway, just to tidy up this discussion, I read in Steenkamp & Baumgartner (1998) that there are two options for model identification. Option 1) In addition to setting the factor loading of one item to 1 in each factor, also fix the intercept of this item to ZERO in each group - this equates the means of the latent variables to the means of the marker variables. Option 2) Fix the vector of latent means to zero in the reference group and constrain (at least) one intercept per factor to be INVARIANT across groups - the item with the invariant intercept should also have an invariant factor loading. The latent means in other countries are then estimated relative to the latent means in the reference country.

Thanks again for your time.

Steenkamp, J. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78-90.
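The two identification options summarized above can be compared with a small Python sketch. It assumes a single marker item with loading 1 whose mean is reproduced exactly in every group; the function names, group labels, and marker means are hypothetical.

```python
def latent_means_option1(marker_means):
    """Marker intercept fixed at 0, loading at 1: latent mean equals marker mean."""
    return dict(marker_means)


def latent_means_option2(marker_means, reference):
    """Reference latent mean fixed at 0, marker intercept invariant across groups."""
    intercept = marker_means[reference]  # implied by loading 1 and latent mean 0
    return {group: mean - intercept for group, mean in marker_means.items()}


marker_means = {"A": 3.2, "B": 3.7, "C": 2.9}  # hypothetical marker-item means
opt1 = latent_means_option1(marker_means)
opt2 = latent_means_option2(marker_means, reference="A")

for group in marker_means:
    print(group, round(opt1[group], 3), round(opt2[group], 3))
```

Under these assumptions the two options give the same between-group differences in latent means; they differ only in where the origin of the latent scale is placed.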
 Sven D. Klingemann posted on Monday, April 09, 2007 - 9:02 am
I have a question about measurement invariance across groups using dichotomous items. What would be the best way to identify the item(s) that contribute to measurement invariance NOT being met? I was thinking about looking at MIs; I am not sure whether there is another way. In addition, since equality constraints for loadings and thresholds have to be relaxed in tandem, is there any way to point to the specific source of differences between groups on specific items (i.e., is it the loading or the threshold)?
 Bengt O. Muthen posted on Monday, April 09, 2007 - 6:12 pm
With multiple-group analysis, MIs are useful. You can also search for measurement non-invariance via factor analysis with covariates, although that doesn't discover loading non-invariance - here you can regress each item on all covariates to find significant direct effects.

Once the non-invariant item has been identified, I would use Mplus to plot the item against the factor for each of the 2 groups. It is the difference in these 2 item characteristic curves that describes the non-invariance. The difference due to the intercept vs the slope may therefore not be relevant to disentangle.
 Sven D. Klingemann posted on Monday, May 14, 2007 - 2:07 pm
Hi Bengt,
just a quick follow-up:
I have been trying to identify items that are not invariant across groups based on the MIs provided. None of the values stand out as being really high; all are below 10. My chi-square test of difference is still significant though. I guess that my sample size (around 18,000) may just make even small differences significant. Is this a case where you just argue that there are no meaningful differences in loadings/thresholds across groups despite a significant chi-square test of difference?
Second, if I do relax a loading/threshold for a variable that loads on more than one factor, do I have to relax the equality constraint for all of the factors the item loads on?
Thanks, Sven
 Sven D. Klingemann posted on Tuesday, May 15, 2007 - 8:54 am
Hi Linda or Bengt,
I am testing for measurement invariance across groups, and relaxing some of the equality constraints does not lead to the changes in loadings/chi-square that are suggested by the modification indices. Do you have any idea why that could be? Is it because I am relaxing the thresholds and loadings in tandem? Or because of the specifics of the difftest?
Thanks -
 Bengt O. Muthen posted on Tuesday, May 15, 2007 - 9:19 am
Note that the MIs are in the metric of chi-square so that the 5% critical value is 3.84 (because you have 1 df due to considering a single parameter).

Note also that the chi-square difference testing has to be done using the DIFFTEST option.

I can imagine that due to the large sample you can get significant differences that are not very large in terms of parameter values. So this is an empirical assessment of the degree of your over-powering.

You do not have to relax invariance for loadings on other factors.

If you get strange testing results even when taking the above into account, please send to support with the usual info.
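The 3.84 cutoff mentioned above can be reproduced with the Python standard library, using the fact that a 1-df chi-square variable is the square of a standard normal variable. The function names are hypothetical, and the value 10.0 is used only because it is the upper bound quoted for the MIs in the question.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_1df_critical(alpha=0.05):
    """Critical value of a 1-df chi-square, as a squared standard normal quantile."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2


def chi2_1df_p_value(statistic):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(statistic ** 0.5))


print(round(chi2_1df_critical(), 2))  # 3.84
print(chi2_1df_p_value(10.0))         # p-value for a modification index of 10
```

A modification index close to 10 is therefore well past the 5% cutoff even though it may not look large, which fits the remark about over-powering in a sample of this size.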
 Sven D. Klingemann posted on Tuesday, May 15, 2007 - 9:29 am
Thanks ...
 Jeff Kennedy posted on Wednesday, September 19, 2007 - 7:43 pm
I have 4 samples (from 3 countries) for a 12-item scale. All 12 items load on a single factor, with 6 items also loading on an uncorrelated method (negatively-keyed items) factor. This model fits well in each separate sample. Two samples are large (c. 1000), the other two are smaller (c. 200). My problem is that a 5-point scale was used in 2 samples (a big and a small one) with a 7-point scale used in the other 2 samples.
Q.1: Is it meaningful to test ME for a pair of samples with the same scale, but very different sample sizes? Will the large sample dominate the indications of fit in the combined model?
Q.2: Does the ordered categorical approach allow for testing equivalence where different numbers of scale points have been used? Millsap & Yun-Tein (2004) comment on p. 481 that "a more general description would permit c [largest possible score] to vary across variables, but this extension introduces needless complications for the purpose at hand". This suggests it's possible - are there examples of such studies (or syntax) available? (Some old posts here note that the same indicator needs the same number of categories across groups).
Millsap, R. E., & Yun-Tein, J. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515.

Jeff Kennedy
 Linda K. Muthen posted on Thursday, September 20, 2007 - 11:17 am
It is true that when one sample is very large and the other very small that the large sample can dominate.

If the five category wordings are a subset of the seven category wordings, I think you can hold the thresholds of the common categories equal. However, if the item wordings are not the same, I don't think this can be done.
 Markus Peter posted on Wednesday, March 19, 2008 - 1:34 pm
Hi Linda or Bengt,

I am trying to test for invariance of intercepts in a multi-group model. When I check each intercept step by step, some models do not converge. I got a hint to subtract the item means from the raw scores/item values (I do not know the English expression for this procedure...). Now all models converge, but I am not sure if this is an appropriate procedure since now the intercepts are equal to zero in all groups - thus, it is no wonder that all intercepts are invariant.
Can you clarify if what I have done is adequate?

Thank you,

 Linda K. Muthen posted on Saturday, March 22, 2008 - 9:43 am
When you free the intercepts, you need to fix the factor means to zero in all groups. See the discussion of testing for measurement invariance at the end of the multiple group discussion in Chapter 13 of the Mplus User's Guide.
 Markus Peter posted on Tuesday, March 25, 2008 - 6:38 am
Hi Linda,

I had fixed the means to zero in all groups when the problem occurred. I was able to circumvent it by doing the above-mentioned procedure ("centering" the variable?). Now everything works fine - I just wanted to make sure that I have not done something inappropriate.
(Perhaps I should mention that I am using MLM.)


 Linda K. Muthen posted on Tuesday, March 25, 2008 - 9:15 am
It is impossible to understand exactly what you are doing without seeing more information. Please send relevant input, data, output, and your license number to
 Kathy posted on Monday, April 07, 2008 - 12:30 pm
Hi everyone. I am a new Mplus user and I was wondering if anyone could expand on the multi-group factor analysis section in the user's guide (chapter 13) that states "For the Delta parameterization of weighted least squares estimation, scale factors can also be considered. For the Theta parameterization of weighted least squares estimation, residual variances can also be considered." I am not sure how these parameters fit with respect to the two previous steps. Would this be step three in showing invariance? That is, would holding the scale factors and residual variances equal, for their respective parameterizations, be the next nested models? Steps 1 and 2 describe what needs to be freed and what is fixed. What needs to be freed and fixed if you are testing for invariance of scale factors or residual variances? Thanks.
 Linda K. Muthen posted on Monday, April 07, 2008 - 4:32 pm
These parameters are usually not considered in measurement invariance of categorical outcomes just as residual variances are usually not considered for measurement invariance of continuous outcomes. I suggest reading about these parameters a couple of pages earlier. If you do not have a compelling reason to do so, I would use the models we suggest when testing for measurement invariance.
 Kathy posted on Monday, April 07, 2008 - 7:40 pm
Hey Linda. I have been using Millsap's and Yun-Tein's 2004 article "Assessing Factorial Invariance in Ordered-Categorical Measures" as my template for my analysis and they talk about equality of thresholds, factor loadings, and unique variances to show factorial invariance. I was trying to figure out how to test unique variance.
 Linda K. Muthen posted on Tuesday, April 08, 2008 - 10:08 am
To do that, use PARAMETERIZATION=THETA; in the ANALYSIS command and compare the model with all residual variances fixed to one in all groups to the model with residual variances fixed to one in one group and free in the other groups.
 Kathy posted on Tuesday, April 08, 2008 - 1:28 pm
Sorry Linda, not sure I understand. What happens to the thresholds and factor loadings? That is, the residual variances are already fixed at one in all groups and then fixed at one in one group and free in the others in order to test threshold and factor loading invariance. Do I have to run another nested model where the thresholds and factor loadings are fixed to be equal across groups and the residual variances are fixed to one across groups?
 Linda K. Muthen posted on Tuesday, April 08, 2008 - 1:50 pm
The three models would be:

1. Thresholds and factor loadings free across groups; residual variances fixed at one in all groups; factor means fixed at zero in all groups
2. Thresholds and factor loadings constrained to be equal across groups; residual variances fixed at one in one group and free in the others; factor means fixed at zero in one group and free in the others (the Mplus default)
3. Thresholds and factor loadings constrained to be equal across groups; residual variances fixed at one in all groups; factor means fixed at zero in one group and free in the others

You compare model 2 to model 3.
 Kathy posted on Monday, April 14, 2008 - 7:00 am
The analysis worked out great. I did find non-invariance across groups for the thresholds and factor loadings. I wanted to investigate the source of the non-invariance, in terms of which factor(s) are non-invariant. There are three latent variables and the data is categorical. Can you do this in Mplus? Not sure how to proceed, given the constraint requirements in step 2 above for thresholds, factor loading, residuals, and means.
 Bengt O. Muthen posted on Monday, April 14, 2008 - 9:01 am
Request Modification indices and see which items have the largest ones.
 Kathy posted on Monday, April 14, 2008 - 9:44 am
I was thinking in terms of chi-square difference tests. That is, holding the factor loadings (and thresholds) for the first factor invariant and freeing the other two factors' loadings (and thresholds) across groups and comparing this model with the unrestricted model (in step one from Linda's response above). Next, constraining the first and second factors' loadings (and thresholds) to be invariant across groups and keeping the third factor's loadings (and thresholds) free to be estimated and comparing this model with the one-factor-constrained model. Can you use this approach in Mplus? If so, can you give an explanation of how to constrain and free the thresholds, factor loadings, residuals, and means (see step 2 from above)?
 Linda K. Muthen posted on Monday, April 14, 2008 - 10:00 am
You imply in your earlier message that you found non-invariance and want to investigate the source of the non-invariance. To me this would imply that you did difference testing of the models. I am not sure how you decided you found non-invariance otherwise.

Yes, you can do difference testing in Mplus. You would need to use the DIFFTEST option to do this with WLSMV. See the user's guide. Example 5.17 shows how to free factor loadings and thresholds. Residual variances are fixed to one in one group and free in the others as the default. Example 5.17 shows how to fix a residual variance.
 Kathy posted on Monday, April 14, 2008 - 10:38 am
Hi Linda. Yes, I used chi-square difference tests to determine non-invariance across groups for my three-factor model. I am using the WLSM estimator. To be more specific, based on your response of April 08, 2008 above, would the following be correct to show which of the three factors in my model contributed to the non-invariance finding:

1. To test invariance of the first factor I would constrain the thresholds and factor loadings of the first factor and free the thresholds and factor loadings of the other two factors across the groups; residual variances fixed at one in one group and free in the others for the first factor and residual variances restricted to one for the other two factors; factor means fixed at zero in one group and free in the others for the first factor and factor means fixed at zero in all groups for the other two factors.

2. Compare this model to the unrestricted model (step 1 in your response).

3. Repeat this process for each factor.
 Linda K. Muthen posted on Monday, April 14, 2008 - 11:23 am
 Kathy posted on Thursday, April 17, 2008 - 9:30 am
Linda, one more question. I found non-invariance for one of my three factors. I am interested in determining which item(s) in the non-invariant factor are invariant. I am assuming that I follow the same procedure I used to test invariance of the factors (which is step one in my post above on April 14) - correct? But I am not sure what happens to the factor means. Should they stay fixed to 0 when I am testing individual items, or be freed?

For example, to test for invariance of item one of the non-invariant factor, would I constrain the thresholds and factor loadings of the first item and free the thresholds and factor loadings of the other 10 items across the groups; residual variances fixed at one in one group and free in the others for the first item and residual variances restricted to one for the other 10 items? At this point I'm not sure what I would do with the factor means. Should the mean be constrained to 0 or freed for the non-invariant factor?
 Linda K. Muthen posted on Thursday, April 17, 2008 - 1:01 pm
Example 5.17 shows partial measurement invariance.
 Kathy posted on Thursday, April 17, 2008 - 1:46 pm
I don't think the example answers my question. To be more specific, I have a non-invariant factor and I want to test which item(s) in that factor are invariant across groups. For example, if I wanted to see if the first item of the factor is invariant, I would have to constrain the first item to be equal across groups and allow the other items to be free across groups. The problem is that when you free a factor across groups you have to set the factor mean to 0 for that factor. In my case I only have one factor, but I am constraining the first item in that factor to be invariant and allowing the remaining items of that factor to be free. Given that the same factor has both constrained and freed items across groups, should the factor mean be set to 0 because some of the items are freed?
 Linda K. Muthen posted on Thursday, April 17, 2008 - 3:18 pm
The example shows how to specify partial measurement invariance. You can look at modification indices to see where measurement non-invariance might be. If you free something you should not, you will be warned. I suggest just starting and seeing what happens. If you have problems with specific setups, you can send your input, data, output, and license number to
 Kathy posted on Monday, April 21, 2008 - 9:18 am
Just wanted to pass along this example to help explain my question better. I think it will make it clearer.
NAMES ARE sde1-sde3 im1-im3 group;
GROUPING IS group (1=M 3=F);


MODEL: SDE by sde1-sde3;
IM by im1-im3;


! this is the term I am wondering about- not all of the im items are free across groups, im1 is fixed. Is it still appropriate to have IM@0

Model F:

SDE by sde1-sde3;
IM by im2-im3;

! [sde1$1];
! [im2$1];
 Linda K. Muthen posted on Monday, April 21, 2008 - 9:28 am
If you free too many thresholds, you will receive a message about a problem with model identification. I'm not sure exactly how many you can free before you must fix the factor means to zero.
 Kathy posted on Monday, April 21, 2008 - 12:34 pm
In reality I have 20 im items, so I guess I could fix one item at a time to be equal across groups and compare these 20 models separately to the unrestricted model in order to determine which of the items are invariant for the non-invariant IM factor.
 Linda K. Muthen posted on Monday, April 21, 2008 - 4:51 pm
You can do this.
 John Lawrence posted on Thursday, July 17, 2008 - 3:05 pm

I am attempting to test the measurement invariance of a second-order factor model. The model has four first-order factors. Five or more items load on each factor. I am using the WLSMV estimator. I am attempting to replicate the analysis presented by Chen, Sousa, & West (2005):

1. Configural invariance
2. Invariance of first-order factor loadings
3. Invariance of second-order factor loadings
4. Invariance of intercepts of measured variables
5. Invariance of intercepts of first-order latent factors.

I was able to complete steps 1 and 2 above. When I ran step 3, the model ran with no errors; however, the program seemed to ignore the command to hold the second-order factor invariant. First-order factors were held invariant but the second-order factor was not held invariant. The output was identical to step 2.

In step 3, in group 1 I defined all the factors including the second order factor, fixed the first and second order factors means to zero, and fixed the scale factors to 1. In group 2, I fixed the first and second order factor loadings to be equal to the first group (by erasing the code), and allowed thresholds to vary between groups. (I allowed the correlations of residuals of two pairs of items to vary across the groups.)

So, how do I need to change the code to hold the second-order factor invariant?

Yours truly,

John Lawrence
 Linda K. Muthen posted on Friday, July 18, 2008 - 4:32 pm
Please send your input, data, output, and license number to

I assume you have categorical factor indicators given that you are using WLSMV. Note that the models to compare for measurement invariance differ from those of the continuous case. See the end of the multiple group discussion in Chapter 13 of the Mplus User's Guide.
 Thomas F. Northrup posted on Wednesday, October 01, 2008 - 2:14 pm

I have a question in reference to Linda's 2nd post from 4/8/2008 in this thread. I am testing residual variances invariance, across gender, in a two-factor ordered categorical measure, and I am unable to generate the chi-square difference test from model 2 to model 3 when I fix the factor means to zero for one group and allow them to be estimated freely in the 2nd group. However, when I fix the factor means to zero in both groups and run models 2 and 3, the chi-square difference test runs successfully. How, if at all, does constraining the factor means (in both models 2 and 3) affect the interpretation of the residual variances invariance (via the chi-square difference test)? Thanks in advance.
 Linda K. Muthen posted on Thursday, October 02, 2008 - 9:59 am
You would need to send the two outputs and your license number to for me to comment.
 Thomas F. Northrup posted on Tuesday, October 07, 2008 - 1:45 pm
My apologies, my previous post was confusing. Here is the needed clarification: I cannot generate the standard errors for the model parameters when I allow the factor means to be fixed to zero in group 1 and free in group 2. Thus, I cannot get the chi-square model test to generate for model 2 when the factor means are free in the second group. I believe the problem with identifying the model is due to high multi-collinearity in one of the factors. Therefore, I am looking for a proper way to fix the problem. Since the chi-squared difference test is estimated when I fix the factor means in both groups (for both models 2 and 3), I wanted to learn, from a conceptual standpoint, how might constraining the factor means (in both models 2 and 3) affect the residual variances invariance interpretation?

 Linda K. Muthen posted on Tuesday, October 07, 2008 - 3:20 pm
It is not possible to say without more information. You would need to send the two outputs and your license number to for me to comment.
 Eivind Ystrom posted on Friday, May 15, 2009 - 1:47 am

When doing scalar invariance testing across groups using WLSMV to test if it is appropriate to compare latent means, how should one specify the latent mean when thresholds are fixed (scalar invariance model)?

It seems that in the metric invariance model, one has to fix the latent mean to reach identification. When running the scalar invariance model, the model seems to be too strict, since the chi-square difference test between WLSMV models states that a scalar invariant model with free means is not nested within the metric invariant model with fixed means. How then does one compare the metric invariant and scalar invariant models?

Furthermore, to identify the metric invariant model one has to fix the scalars. Should the scalars be fixed in both the metric and scalar invariant models?

Best regards
 Linda K. Muthen posted on Friday, May 15, 2009 - 6:03 am
See page 399 of the user's guide where the models for testing measurement invariance for categorical outcomes are described.
 Ulrich Schroeders posted on Wednesday, February 03, 2010 - 2:46 pm
Dear Drs. Muthén,

I have two questions concerning the two models on p. 399 (delta para.):
1) Wouldn't it be necessary to hold scale factors equal across groups (and fixing the means to zero) in order to achieve strong measurement invariance? Thus, does model 2 represent "residual variance invariance"?
2) I wonder if you would use the term "configural invariance" for model 1 taking into account that scale factors are functions of residual variances, loadings, and factor variances?

Thanks in advance, kind regards,
 Linda K. Muthen posted on Thursday, February 04, 2010 - 8:23 am
1. Scale factors are not residual variances. If you want to test residual variances, use the Theta parameterization.
2. This sounds reasonable.
 Ulrich Schroeders posted on Thursday, February 04, 2010 - 9:29 am
Dear Dr. Muthén,

Thanks for your quick response. A short follow-up question: Am I correct that the Delta parameterization is for estimation purposes only? That is, is the Theta parameterization preferable because its results are much easier to interpret (i.e., no dependencies between scale factors and residual variances, loadings, and factor variances)?

Thanks in advance, kind regards,
 Linda K. Muthen posted on Friday, February 05, 2010 - 9:12 am
Delta is the default. We recommend using it unless Theta must be used. Although Theta may be easier to interpret, it has problems in some cases. See Web Note 4 for more information.
 Gert Vanthournout posted on Wednesday, September 01, 2010 - 5:21 am
Dear dr. Muthén,

I am trying to verify the measurement invariance of a scale prior to latent growth modelling. The scale is estimated with 4 Likert items and we have 3 time points.
Constraining the factor loadings works fine. However, when constraining the thresholds, the modification indices point towards a major problem with the factors (instead of the thresholds). Though these values decrease when the thresholds are freely estimated, they remain the largest modification indices.

[ F1 ] 46.455 0.195 0.339 0.339

[ F2 ] 0.097 -0.008 -0.018 -0.018

[ F3 ] 35.594 -0.157 -0.302 -0.302

How should I interpret this finding?
 Linda K. Muthen posted on Wednesday, September 01, 2010 - 4:31 pm
From the modification indices, it looks like you have factor means fixed to zero at all time points. This is too restrictive. The factor means need to be zero at one time point and free at the others. See the Topic 4 course handout under multiple indicator growth to see how to test for measurement invariance across time.
 Tracy K Witte posted on Friday, July 01, 2011 - 12:19 pm
I had a question regarding scalar invariance. I am interested in comparing 2 clinical groups, which we expect to have different levels of the latent means. My understanding is that in order to compare latent means, one has to establish configural, metric, and scalar invariance. We found configural and metric invariance, but not scalar invariance. My initial interpretation of this is that the items are biased, such that individuals with the same level of the latent variable manifest different observed scores, if they are in different groups. However, I just read the Vandenberg (2000) article, which states that if you expect groups to differ in terms of their latent score, it's not appropriate to test for scalar invariance "because differences in item location parameters would be fully expected," (p. 36) I'm confused about how to proceed - if you expect differences between groups, how can you demonstrate that those differences are actually real, if a test of scalar invariance is inappropriate?
 Bengt O. Muthen posted on Saturday, July 02, 2011 - 8:19 am
I don't have that article. As you present it, the statement does not make sense, because group differences in factor means do not imply differences in measurement intercepts (or thresholds, difficulties). I don't know why the confusing terms metric and scalar (strong) invariance are used instead of the straightforward terms loading and intercept invariance, but translated it is clear that intercept and loading invariance are needed to study factor mean differences. Intercept invariance is often rejected, partly because there is more power to reject it than to reject loading invariance.

Note also that you don't need full intercept invariance here - there may be only a few items that show non-invariance and you can free those. You find them via Modindices.
 Tracy K Witte posted on Saturday, July 02, 2011 - 11:06 am
Thank you so much! In the Vandenberg et al. (2000, Organizational Research Methods) article, they state that if one has a "measure that is a valid operationalization of the construct and the hypothesis regarding group differences is true, then the items underlying that measure should also reflect group differences if mean difference tests were conducted on an item-by-item basis. Hence, a test for intercept invariance is not appropriate because differences in item location parameters would be fully expected. However, these differences are not biases in the sense of being undesirable as in rating source biases, but rather they reflect expected group differences." Am I misinterpreting what they're saying here, by thinking it means that one shouldn't test for intercept invariance if one expects group differences? Or is this perhaps an error in the Vandenberg article?

I guess I was wondering if it would be possible to find invariant intercepts if one group has a consistently higher score on a questionnaire than another group. I did look for partial invariance, but even after freeing many intercepts, it still was not achieved. However, I do have full loading invariance. I really appreciate your advice on this!
 Bengt O. Muthen posted on Saturday, July 02, 2011 - 12:17 pm
It is perfectly fine to find invariant intercepts if one group has consistently higher observed scores than another group. This is because the observed score mean m_g in group g is

m_g = nu + lambda*alpha_g,

where nu is the invariant intercept, lambda the invariant loading, and alpha_g the group-varying factor mean. So, variation in alpha_g is what causes m_g to vary.

I think it is hard to argue that you are measuring the same factor construct if the majority of the items don't have both intercept and loading invariance.
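
A small numerical sketch of the relation m_g = nu + lambda*alpha_g may make this concrete. It is written in Python purely for illustration; the intercepts, loadings, and factor means are made-up values, not estimates from any data set, and the function name `observed_means` is hypothetical.

```python
# Hypothetical invariant measurement parameters for four items
# (illustrative values only, not estimates from any data set).
nu = [2.0, 1.5, 3.0, 2.5]    # intercepts, held equal across groups
lam = [1.0, 0.8, 1.2, 0.9]   # loadings, held equal across groups
alpha = {"group1": 0.0, "group2": 0.5}  # factor means; group1 is the reference


def observed_means(nu, lam, alpha_g):
    """Model-implied item means m_g = nu + lambda * alpha_g."""
    return [n + l * alpha_g for n, l in zip(nu, lam)]


m1 = observed_means(nu, lam, alpha["group1"])
m2 = observed_means(nu, lam, alpha["group2"])

# Every item mean is higher in group2, yet nu and lambda never changed:
# the item-level differences equal lambda * (alpha_2 - alpha_1).
diffs = [b - a for a, b in zip(m1, m2)]
print("group1 item means:", m1)
print("group2 item means:", m2)
print("differences:      ", diffs)
```

Under these assumed values, one group scores higher on every item even though the intercepts and loadings are identical in both groups, which is the point made above.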
 Kathy posted on Saturday, October 15, 2011 - 2:02 pm
Received an interesting error/warning message. I ran a MGCFA (WLSMV) and found non-invariance for the loadings/thresholds i.e. significant DIFFTEST comparing the measurement invariance and non-invariance models. The MI indicated that the loading/threshold for one item was above the 3.84 level. Therefore, I added a group-specific model command to the measurement invariance model which freed the one loading/threshold across groups (residual variance for this item was @1 for both groups). I ran a DIFFTEST comparing this model with the non-invariance model and I received this message:THE MODEL ESTIMATION TERMINATED NORMALLY THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL MAY NOT BE NESTED IN THE H1 MODEL. DECREASING THE CONVERGENCE OPTION MAY RESOLVE THIS PROBLEM.

To me the models above, i.e. the partial invariance and non-invariance models, should be nested. How do you decrease the convergence option so that I can get a result for my DIFFTEST?
 Linda K. Muthen posted on Saturday, October 15, 2011 - 4:00 pm
Please send the two outputs and your license number to
 Kathy posted on Friday, October 21, 2011 - 12:25 pm
Hopefully I can figure it out myself, but I don't see anywhere in the user's guide how to decrease the convergence. Is there an option I am not seeing? How would you decrease the convergence?
 Linda K. Muthen posted on Friday, October 21, 2011 - 2:25 pm
See the CONVERGENCE option in the user's guide.
 Cecil Meeusen posted on Tuesday, November 15, 2011 - 8:05 am
Dear dr. Muthén,

I am doing research about the growth curve of ethnocentrism.
Before starting the analysis, I need to know whether my ethnocentrism factor (measured by six indicators at three points in time) measures the same thing at all three time points.
I have no intention of going further into the analysis of measurement invariance. The only thing I need to know is whether the factor loadings and intercepts of my indicators are the same across the three time periods.
- Factor loadings of corresponding items equal across time periods
- No covariances between residuals of non-corresponding indicators
- Covariances between residuals of corresponding indicators
- Variances of factors equal across the three time periods

Which restrictions do I have to impose?
How do I model this in Mplus?

Thank you very much!
 Linda K. Muthen posted on Tuesday, November 15, 2011 - 9:36 am
Please see the Topic 4 course handout on the website under multiple indicator growth. The inputs for this are shown there.
 Cecil Meeusen posted on Wednesday, November 16, 2011 - 9:15 am
Thank you!

Why is it not necessary to specify that the items of time 1 can correlate with the items of time 2 (and time 3)?
Why do you not compare the factorial invariance model to the configural model?
How is the last one specified in Mplus for multiple indicator growth curves?
Are there any tricks to make factorial invariance model fit?

Thank you!
 Bengt O. Muthen posted on Wednesday, November 16, 2011 - 3:29 pm
Q1. It can be important to explore the need for correlated items over time.

Q2. I don't know that the configural model is helpful if your aim is to study change over time; you need metric invariance for that.

Q3. A configural model can be specified in a longitudinal factor model setting by not holding factor loadings equal over time. You obviously need more than one factor for configural to be relevant.

Q4. Good measurement and good pilot work on earlier samples using EFA.
 Marloes Vleeschouwer posted on Wednesday, January 25, 2012 - 5:16 am
To conduct a multigroup measurement invariance analysis for categorical data, I want to use the correlation matrix and intercepts from both groups as input data.

How should the correlation matrices and intercepts of both groups be arranged in a .dat file so that Mplus reads them as input data for two separate groups?

Many thanks!
 Linda K. Muthen posted on Wednesday, January 25, 2012 - 6:47 am
See pages 431-432 of the user's guide.
 Sarah Ryan posted on Wednesday, February 15, 2012 - 1:00 pm
After establishing measurement invariance, I have been testing structural invariance, including invariance of latent means.

Given that I have covariates in the model, I get estimated latent means in the output. The estimated latent means differ, as one might expect, depending on if the model is estimated separately for each group and if the structural invariance baseline model is estimated for both groups simultaneously.

When I report the estimated latent means in my results, I'm thinking it would be more appropriate to report the means estimated in the structural invariance baseline model since the results from this model are more akin to comparing "apples and apples." Am I thinking about this correctly?
 Sam Smith posted on Thursday, February 16, 2012 - 5:48 am
I am testing for partial measurement invariance in a model with 1 factor with 4 dichotomous indicators (u1-u4) in 2 groups (g1, g2). I am using WLSMV. I first estimated the measurement non-invariance model, and the full measurement invariance model. In the first partial measurement invariance model I freed the loading and thresholds for the u4 item. That worked out fine, but if I go on and try to free the loading and thresholds for an additional item- u3, Mplus complains that this new model is not nested in the measurement noninvariance model (I am using DIFFTEST to compute the chi-square difference). The 2 models I'm comparing are:
!Model 1
f1 by u1-u4;
Model g2:
f1 by u2-u4;

!Model 4
f1 by u1-u4;
Model g2:
f1 by u3 u4;
[u3$1 u4$1];
{u3@1 u4@1};
Model 4 gives me parameter estimates, but if I use DIFFTEST to compare the 2 models, I get the message: THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL IS NOT NESTED IN THE H1 MODEL. Is it not possible, when the indicators are dichotomous, to test a partial measurement invariance model in which only one item, in addition to the marker item is invariant?
Thank you!
 Linda K. Muthen posted on Thursday, February 16, 2012 - 3:07 pm

Please send the relevant outputs and your license number to

Please note that in a model with covariates, an intercept not a mean is estimated for the dependent variable.
 Linda K. Muthen posted on Friday, February 17, 2012 - 2:11 pm

This should work. Please send the relevant outputs and your license number. Include TECH1 and TECH5 in the OUTPUT command.
 Tyler Hunt posted on Sunday, March 04, 2012 - 9:11 am
I have a 10 item scale that I am trying to establish measurement invariance across male and female subjects. The ultimate goal is to compare latent means for differences. First, I looked at form, then factor loadings, then intercepts. Everything worked out and did not produce a significant chi square or change the other fit indices until I got to equating intercepts. It seems that the latent means are also equated by default. It seems strange to me to equate the very thing I am expecting to be different in order to meet the requirements to compare them. Once I freely estimate the latent means for the second group then the chi square is not significantly different from the equated factor model. Is this a problem because I fixed and freed parameters? I tried to freely estimate the latent means for both groups starting with equal forms but then the mean structure is under identified. Having an 8 degree of freedom change between equating loadings and equating intercepts is likely to raise eyebrows.
 Linda K. Muthen posted on Sunday, March 04, 2012 - 9:58 am
Latent variable means are by default zero in one group and free in the others. The test of latent variable means is this model versus a model where they are zero in all groups. See the Topic 1 course handout under multiple group analysis.
 Tyler Hunt posted on Sunday, March 04, 2012 - 12:02 pm
I was taught that models are nested if you fix or free parameters. Models are not nested if you do both. Is this just not applicable in intercept invariance? When freely estimating indicator intercepts I had to fix the latent means to zero to get it to run.
 Linda K. Muthen posted on Sunday, March 04, 2012 - 6:33 pm
When intercepts are free, factor means must be fixed at zero in all groups for model identification. See the Topic 1 course handout and video under multiple group analysis for a thorough discussion of measurement invariance and population heterogeneity.
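
The identification point above can be checked with a simple parameter count on the mean structure. The Python sketch below is only a counting aid under stated assumptions: continuous indicators, one intercept per item, and hypothetical numbers (6 items, 2 groups, 1 factor) that do not correspond to any model in this thread. The function names are made up for illustration, and a non-negative count is necessary but not sufficient for identification.

```python
def mean_structure_counts(n_items, n_groups, n_factors,
                          intercepts_invariant, n_groups_with_free_means):
    """Return (number of observed item means, number of free mean parameters)."""
    observed = n_items * n_groups
    # Invariant intercepts: one set shared by all groups; otherwise one set per group.
    n_intercepts = n_items if intercepts_invariant else n_items * n_groups
    free = n_intercepts + n_factors * n_groups_with_free_means
    return observed, free


def mean_structure_df(*args):
    """Degrees of freedom contributed by the mean structure (negative = under-identified)."""
    observed, free = mean_structure_counts(*args)
    return observed - free


# Hypothetical setup: 6 items, 2 groups, 1 factor.
print(mean_structure_df(6, 2, 1, False, 0))  # free intercepts, factor means fixed at zero in all groups
print(mean_structure_df(6, 2, 1, False, 1))  # free intercepts plus a free factor mean in group 2
print(mean_structure_df(6, 2, 1, True, 1))   # invariant intercepts, factor mean free in group 2
print(mean_structure_df(6, 2, 1, True, 0))   # invariant intercepts, factor means zero in all groups
```

In this count, freeing a factor mean on top of freely estimated intercepts asks for more mean parameters than there are observed item means, which is why the factor means must be fixed at zero in that model.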
 Sarah Ryan posted on Friday, March 09, 2012 - 3:26 pm
Referring to the comparison of probit coefficients across groups, Williams (2009) states:
"So, in logit and probit models, coefficients are inherently standardized...the standardization is accomplished by scaling the variables and residuals so that the residual variances are either one (as in probit) or π²/3 (as in logit). If residual variances differ across groups, the standardization will also differ, making comparisons of coefficients across groups inappropriate."

Does this statement apply in the context of SEM? I am using WLSMV (delta) and so my understanding is that it is not possible to constrain res. var. to equality. I would suspect they are not, although I have achieved all other criteria for structural invariance. Thus, I am not sure how to address the appropriateness of comparing parameters across groups in my work (but suspect I will get a reviewer comment on the issue). Any advice?
 Linda K. Muthen posted on Friday, March 09, 2012 - 4:14 pm
With weighted least squares and the Delta parametrization, scale factors can be free in all groups except one where they are fixed at one. With weighted least squares and the Theta parametrization, residual variances can be free in all groups except one where they are fixed at one. This alleviates the problem mentioned above. See Web Note 4 on the website for a discussion of this topic.
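
The scaling issue raised in the question can be sketched numerically. The Python example below assumes a latent response y* = b*x + e with e ~ N(0, resid_var) and y = 1 when y* > 0, so that the probit coefficient seen when the residual variance is standardized to one is b / sqrt(resid_var). All values and function names are hypothetical and serve only to illustrate why a group-varying residual variance matters.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_coefficient(b_latent, resid_var):
    """Coefficient after rescaling so that the residual variance equals one."""
    return b_latent / math.sqrt(resid_var)


# Same latent-scale slope in both groups, different residual variances (assumed values).
b_latent = 1.0
resid_var = {"group1": 1.0, "group2": 4.0}

coef = {g: probit_coefficient(b_latent, v) for g, v in resid_var.items()}
prob_at_x1 = {g: normal_cdf(c * 1.0) for g, c in coef.items()}

print(coef)        # standardized coefficients differ although b_latent is identical
print(prob_at_x1)  # P(y = 1 | x = 1) in each group
```

If the residual variance is forced to one in both groups, the two coefficients look different even though the latent slope is the same; allowing the scale factor or residual variance to be free in the second group is what absorbs this difference.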
 Marloes Vleeschouwer posted on Wednesday, May 23, 2012 - 5:13 am
In a measurement invariance analysis I found a violation of metric invariance, i.e., a non-equivalence in factor loadings between two groups.
Now I want to know how strong the non-equivalence in factor loadings is between the two groups.
Is there a coefficient or measure for the strength of the metric non-invariance?

many thanks,

 Linda K. Muthen posted on Wednesday, May 23, 2012 - 12:13 pm
I think some look at the ad hoc coefficient of congruence. You can Google this. But really this is a matter of substantive interpretation much like practical significance.
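
For readers who want to try the coefficient of congruence, a minimal Python sketch follows. It uses the usual Tucker formula, sum(x*y) / sqrt(sum(x^2) * sum(y^2)), applied to the loadings of the same factor in two groups. The loadings are invented for illustration, and the function name `congruence` is hypothetical.

```python
import math


def congruence(loadings_a, loadings_b):
    """Tucker's coefficient of congruence between two loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have the same length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm_a = math.sqrt(sum(a * a for a in loadings_a))
    norm_b = math.sqrt(sum(b * b for b in loadings_b))
    return cross / (norm_a * norm_b)


# Hypothetical loadings for the same four items in two groups.
group1 = [0.80, 0.70, 0.65, 0.75]
group2 = [0.78, 0.40, 0.66, 0.74]

print(round(congruence(group1, group2), 3))
```

A value of 1 means the two loading patterns are proportional. As noted above, how far below 1 counts as a meaningful difference is a substantive judgment rather than a statistical test.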
 Ting-Lan Ma posted on Monday, October 01, 2012 - 12:45 pm
Hi Dr. Muthen,

I am trying to compare the factor means for four factors across two groups.

I checked the measurement invariance and found partial measurement invariance across my two groups. Out of 27 items across the four factors, there are four items that do not have invariant factor loadings (these four items belong to 2 separate factors), and 11 items do not have invariant intercepts.

My question is: what is the best way to compare the mean differences for a partial measurement invariance model?

Originally I tried using a MIMIC model; however, a MIMIC model does not allow me to unconstrain the factor loadings and intercepts of certain items. In that case, I could only use the MIMIC model with the items that show full metric and scalar invariance (which leaves me only 15 items across 4 factors).

However, I would really like to compare the factor means across the two groups using all 27 items, with some items' factor loadings and some items' intercepts left unconstrained. Is there a way in Mplus for me to approach this question?

(I wasn't considering mean structure analysis because I thought it also requires full measurement invariance for all items.)

Thank you so much for your help!
 Linda K. Muthen posted on Monday, October 01, 2012 - 1:02 pm
See the Topic 1 course handout and video on the website under Multiple Group Analysis. It shows a partial measurement invariance model and how to test for means across groups.
 Ting-Lan Ma posted on Monday, October 01, 2012 - 6:25 pm
Hi Dr. Muthen,

I ran analyses according to the Topic 1 course handout. However, when I proceed from the "partial measurement invariance model with invariant factor variances and covariances (model A)" to the "partial measurement invariance model with invariant factor variances, covariances, and means (model B)", the degrees of freedom of model B stay the same as for model A, which shouldn't be the case.

I have four factors, so there should be a df difference of 4 in this case (because I constrained the four factor means to be equal in model B, following the input on p. 223 of the Topic 1 handout).

Do you know where I should locate the problem? Thanks!
 Linda K. Muthen posted on Tuesday, October 02, 2012 - 10:33 am
Please send the two outputs and your license number to
 Ting-Lan Ma posted on Wednesday, October 03, 2012 - 1:42 pm
Hi Dr. Muthen,

Thank you for identifying the error in my output. I have a further question. My results showed that constraining the factor means to be equal results in worse model fit. How, then, do I identify which factor means differ significantly between the groups?

In the Topic 1 handout the slides stop at the conclusion that constraining the factor means to be equal results in worse fit, but do not go on to test where the differences in factor means lie. How should I test the significance of the differences in factor means in this case?

Thank you very much!
 Linda K. Muthen posted on Thursday, October 04, 2012 - 10:03 am
When you run the model with factor means at zero in all groups, ask for modification indices and that will help you identify mean differences across groups.
 steve posted on Monday, February 11, 2013 - 2:09 am
Dear Drs.

I have the same problem as reported by John Lawrence 06/2008. I am testing measurement invariance of a second-order model following the approach provided by Chen and colleagues (2005).

Unfortunately, I can't hold the second-order factor invariant. The program seems to ignore the command and provide the same results when testing invariance of first-order loadings. I appreciate any help. Below you will find the syntax as follows:







F4 by F1 F2 F3;
[F1@0 F2@0 F3@0 F4@0];



With kind regards
 steve posted on Monday, February 11, 2013 - 4:09 am
Oh.. I apologize for posting in the wrong forum. My question concerns continuous factor indicators.
 Linda K. Muthen posted on Monday, February 11, 2013 - 6:50 am
A second-order factor with three indicators is just identified. Therefore, model fit is the same for both models. Model fit cannot be assessed for a just-identified model.
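
The just-identified status can be verified by counting, at least for a single group and the covariance structure only. The Python sketch below assumes the second-order factor is scaled by fixing its first loading at one (the Mplus default for BY statements) and that each first-order factor has its own residual variance. The function name is hypothetical.

```python
def second_order_df(n_first_order_factors):
    """Degrees of freedom of the second-order part of the model (single group)."""
    k = n_first_order_factors
    # Information: variances and covariances among the k first-order factors.
    information = k * (k + 1) // 2
    # Parameters: (k - 1) free second-order loadings (first fixed at one),
    # 1 second-order factor variance, k first-order residual variances.
    parameters = (k - 1) + 1 + k
    return information - parameters


print(second_order_df(3))  # three first-order factors
print(second_order_df(4))  # four first-order factors, for comparison
```

With three first-order factors the count comes out to zero, so the second-order structure reproduces the first-order factor covariance matrix exactly and adds no testable restriction in this setting.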
 Michelle Little posted on Tuesday, February 12, 2013 - 6:28 am

I have a question about MPLUS recommendations for invariance testing using theta parameterization and ordinal categorical variables.

It is recommended that item uniquenesses be constrained to unity in both groups to identify loadings and thresholds. It is then suggested that uniquenesses be freed when constraining loadings and thresholds.

Why wouldn't we keep the item residuals constrained in the test of loadings/thresholds so that we can be sure that the change in the chi-square relates directly to loading/threshold invariance? Is there any problem with keeping the item residuals constrained in a second model and then proceeding with a check on strict invariance by releasing the item residuals?

Thanks in advance for your help.
 steve posted on Tuesday, February 12, 2013 - 6:38 am
Thank you. I placed an equality constraint on the variance of f1 and f2 in order to address this issue (f1 f2 (1);) which, however, did not work. Is there any meaningful way testing for invariance of first- and second-order loadings with only three first-order indicators?
 Linda K. Muthen posted on Tuesday, February 12, 2013 - 1:09 pm

I don't know of any meaningful way to test a just-identified model unless some other constraints make sense.
 Jens Jirschitzka posted on Wednesday, May 15, 2013 - 4:35 am
Dear Mplus Team,

(1) This question is about partial measurement invariance in models with categorical outcomes and with more than 2 groups (e.g. 3 groups: A, B, C and assuming a full invariance baseline model with reference group A). If the modification indices indicate that in group C the loading and/or the thresholds of item i seem to be non-invariant, and assuming that’s true: Would it be correct to free the thresholds and loading for item i only in group C [WLSMV and Delta/Theta: additionally fix the scale factor / the residual variance at one in group C]? Or is the correct way to do this not only in group C but also in group B and reference group A (here the scale factors/resid. var. are already set to one), even if the loadings/thresholds for item i are equal in groups A and B? The final goal is to compare the factor means between the groups.

(2) It’s not a problem in single factor models if categorical outcomes x (dichotomous) and y (polytomous) have different numbers of categories as long as for each variable the number of categories is the same across groups?

(3) I know there are options in ML/MLR [e.g. (*) ] if there are empty (unused) categories in at least one of the groups. But in testing partial measurement invariance models I can’t free a threshold for a group if at least one related category is empty in that group; is that right?

Thank you very much.
 Linda K. Muthen posted on Wednesday, May 15, 2013 - 8:47 am
1. Yes, only free in group C.
2. Yes.
3. You will receive a message if this is the case.
 Jessica Memarzia posted on Thursday, June 27, 2013 - 4:34 am
I am testing for longitudinal invariance with categorical data, using DIFFTEST between unconstrained and constrained (metric and scalar) models. Firstly, for the free model:

processors= 4 ;
parameterization = theta ;
estimator = wlsmv ;
f1 by t1_1c* t1_2c t1_3c t1_4c t1_8c ;
f3 by t3_1c* ...etc for t3 items;

f2 by t1_2c* t1_5c t1_6c t1_7c ;
f4 by t3_2c* ...etc;
savedata: DIFFTEST IS FREE_MOD.dat ;

And then to test for metric invariance:

processors= 4;
parameterization = theta ;
estimator = wlsmv ;
f1 by
t1_1c* (1)
t1_2c (2)
t1_3c (3)
t1_4c (4)
t1_8c (5) ;

f3 by
t3_1c* (1)
...etc ;

[f3@0] ;
[f1@0] ;

t1_1c pwith t3_1c ;

f2 by
t1_2c* (16)
t1_5c (17)
t1_6c (18)
t1_7c (19) ;

f4 by
t3_2c* (16)

[f2@0] ;
[f4@0] ;

t1_2c pwith t3_2c ;

 Linda K. Muthen posted on Thursday, June 27, 2013 - 7:20 am
Please send the two outputs and your license number to
 Danyel A.Vargas posted on Thursday, September 26, 2013 - 3:49 pm
I am trying to run a longitudinal invariance model with four waves of data and 6 indicators per scale. I am now estimating the strong invariance test but am receiving an error. I have looked at the output but cannot figure it out. Can you please help. This is the error I'm receiving.


 Bengt O. Muthen posted on Thursday, September 26, 2013 - 3:55 pm
Tech1 shows you which parameter number 81 is. You can then check with the UG pages 681-682, describing the parameterization for growth modeling with measurement invariance.
 Rebecca Kamody posted on Friday, December 13, 2013 - 12:32 pm
I have been running multigroup measurement invariance analyses (using ordinal variables with three thresholds, Weighted Least Squares, and Delta Parameterization) via the convenience commands from version 7.11. My understanding, based on the Mplus User’s Guide and Millsap (2011), is that A) for Metric, the first threshold of each item is held equal across groups, and the second threshold of the item that is used to set the metric of the factor is held equal across groups; and for B) Scalar all thresholds are held equal across groups. In my model, the chi-squares and degrees of freedom do not change from Metric to Scalar. When I review the Tech1 output, the Tau matrix is the same for both Metric and Scalar Models. Should all the thresholds be held equal across groups with the Scalar convenience command, or am I misinterpreting which thresholds in particular need to be constrained? One additional point is that I have provided start values for my model, but it appears that the constraints imposed in the configural and metric models are correct.
 Bengt O. Muthen posted on Friday, December 13, 2013 - 4:50 pm
Please send the two relevant outputs to Support.
 dvl posted on Friday, January 31, 2014 - 3:17 am
Dear Professor,

If my model is tested for metric and scalar invariance and both assumptions are supported, I work further with the default options in MPLUS assuming that factor loadings and intercepts are equal across groups? Or should I use the specific factor loadings and intercepts for each specific group in the subsequent steps?
 Linda K. Muthen posted on Friday, January 31, 2014 - 9:36 am
If you have established invariance of intercepts and factor loadings, you should hold them equal in further analyses. These equalities represent measurement invariance.
 Kathleen posted on Wednesday, February 12, 2014 - 8:55 pm
Is there a best way to examine partial measurement invariance, specifically DIF, with respect to time? I compared an LTA with full invariance to one with full non-invariance and, using the loglikelihood ratio test with scaling correction factors, found that the model with non-invariant item thresholds fit the data better. How can I examine which items function differently over the 2 time points? Modindices can't be used and I have not found much on this in the web notes, the UG, or on this discussion board.
Despite the better fit of the non-invariant model, I may impose invariance for substantive reasons, and proceed with a mover-stayer model, as my observed response patterns fit that model well, but I would like to understand the cause of the non-invariance.
Thank you.
 Linda K. Muthen posted on Thursday, February 13, 2014 - 11:18 am
Why can't you get modification indices?
 Kathleen posted on Thursday, February 13, 2014 - 9:27 pm
Hi and thank you for responding to my post, Dr. Muthen. Since I wanted to compare the threshold invariance over time, I requested MODINDICES in the LTA model, but the error message says this is not an option for mixture models with more than one categorical variable. I'm using type= mixture complex, if that matters. How can I compare thresholds over time if not in the LTA? Thank you again.
 Bengt O. Muthen posted on Friday, February 14, 2014 - 12:04 pm
You may have to constrain one item at a time to have equal measurement parameters across time and do a series of LR chi-square tests.
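
Each of those tests with MLR would use the loglikelihoods and scaling correction factors printed in the two outputs. A minimal Python sketch of the scaled difference computation follows, using the formulas cd = (p0*c0 - p1*c1)/(p0 - p1) and TRd = -2*(L0 - L1)/cd, where model 0 is the more constrained model. The numbers plugged in are illustrative only, and the function name is hypothetical.

```python
def scaled_lr_difference(loglik0, scaling0, nparams0,
                         loglik1, scaling1, nparams1):
    """Scaled LR chi-square difference test.

    Model 0 is the nested (more constrained) model, model 1 the comparison model.
    Returns (test statistic, degrees of freedom, difference scaling correction).
    """
    if nparams1 <= nparams0:
        raise ValueError("the comparison model must have more free parameters")
    cd = (nparams0 * scaling0 - nparams1 * scaling1) / (nparams0 - nparams1)
    trd = -2.0 * (loglik0 - loglik1) / cd
    return trd, nparams1 - nparams0, cd


# Illustrative values only (not taken from any output in this thread).
trd, df, cd = scaled_lr_difference(
    loglik0=-2606.0, scaling0=1.450, nparams0=39,
    loglik1=-2583.0, scaling1=1.546, nparams1=47,
)
print(round(cd, 3), round(trd, 2), df)
```

The statistic is then referred to a chi-square distribution with df equal to the difference in the number of free parameters. When one item at a time is constrained, the nested model is the one with that item's measurement parameters held equal across time.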
 Pawel Grygiel posted on Friday, February 14, 2014 - 5:37 pm
Dear Mplus team,
I have a little problem while trying to check longitudinal invariance between two times (20 items, 3-point ordered categorical observed variables, WLSMV estimator with theta parameterization). In metric invariance all factor loadings & first threshold of each item were constrained equal between times - except the reference item which had both thresholds constrained.
Question: If in metric invariance model one of the factor loadings is not invariant, then in scalar model SECOND thresholds of this item should be NOT constrained equal across time?
 Linda K. Muthen posted on Sunday, February 16, 2014 - 11:22 am
See the Version 7.1 Mplus Language Addendum on the website. Starting at page 8 we describe testing of measurement invariance for ordinal outcomes across groups. The same principles apply across time.
 dvl posted on Wednesday, March 12, 2014 - 8:53 am
Dear Professor,

In Mplus the default setting when performing a multiple group confirmatory factor analysis is that factor loadings and intercepts are equal across groups. However, I only found metric and no scalar invariance in my two-factor model. Now my question is: Should I free the equality constraints (defaults) on the intercepts when moving from the measurement to the structural part of my model in this specific case? I think I should, but I would rather be 100% sure. I'm not completely sure whether the structural part of my model requires metric and scalar invariance...

Thanks a lot for your help!
 Linda K. Muthen posted on Wednesday, March 12, 2014 - 11:36 am
You can free the intercepts as long as you are not comparing factor means.
 Lydia Brown posted on Sunday, March 23, 2014 - 10:31 pm
I am doing an ESEM measurement invariance analysis across two groups using WLSMV. I originally followed your instructions based on Chapter 14 of the user's guide to test configural then scalar invariance. However, in your Mplus Version 7.1 addendum, you offer the option of testing metric invariance by fixing some of the thresholds for identification purposes ("The first threshold of each item is held equal across groups. The second threshold of the item that is used to set the metric of the factor is held equal across groups."). I am wondering if this new metric option is appropriate for my model?
 Linda K. Muthen posted on Monday, March 24, 2014 - 6:10 am
No, it is not. The METRIC setting is not allowed for ordered categorical (ordinal) variables when a factor indicator loads on more than one factor, when the metric of the factors is set by fixing the factor variance to one, and when Exploratory Structural Equation Modeling (ESEM) is used.
 RuoShui posted on Friday, April 04, 2014 - 11:31 am
Dear Dr. Muthen,

I have a question about measurement invariance with ordinal indicators. When testing measurement invariance with multiple ordinal indicators, metric invariance involves constraining some thresholds in addition to factor loadings. Is it correct that, as a result, it is not as easy to achieve metric invariance as in the case with continuous indicators? If partial metric invariance can be achieved through relaxing a couple of thresholds, is it justified to proceed to testing scalar invariance?

Thank you very much!
 Bengt O. Muthen posted on Sunday, April 06, 2014 - 5:40 pm
The threshold constraints used with metric invariance for ordinal variables make the model identified, but do not impose extra restrictions. So I don't think you can relax "a couple of thresholds". For further details, see the Millsap book.
 Yoonjeong Kang posted on Friday, May 09, 2014 - 2:12 pm
Dear Dr.Muthen,
I tested approximate measurement invariance twice using one data set. For the first analysis, I used the DO DIFF function, and for the second analysis, I specified the prior distributions and a MODEL CONSTRAINT statement to test measurement invariance.
F1 by X1-X4*(LAM1_1-LAM1_4);
F1 by X1-X4*(LAM2_1-LAM2_4);
DO (1,4) DIFF(LAM1_#-LAM2_#)~N(0, 0.001);
DO (1,4) DIFF(INT1_#-INT2_#)~N(0, 0.001);
- model: same as analysis 1-
LAM1_1-LAM2_1 ~N(0, 0.001);
LAM1_2-LAM2_2 ~N(0, 0.001);
INT1_3-INT2_3 ~N(0, 0.001);
INT1_4-INT2_4 ~N(0, 0.001);

I got...
Analysis1: PSR =1.00 at 50000 iter& PPP=.696
Analysis2: PSR=1.23 at 50000 iter & PPP=0.00

I think the code for the two analyses is equivalent, but if I am wrong, could you let me know? I would like to know why this happened.
Thanks a lot in advance!
 Linda K. Muthen posted on Saturday, May 10, 2014 - 11:19 am
These are not the same. In one case, you test the difference between lambdas. In the other case, you test if the lambdas are close to zero.
 Yoonjeong Kang posted on Saturday, May 10, 2014 - 12:26 pm
Dear Dr. Muthen,

I am sorry, but I couldn't understand. In both analyses, I tried to test the difference.

For the first analysis, I used do diff function to test the measurement invariance for the lambdas and intercepts.

In the second analysis, I created new parameters for the differences of the lambdas and intercepts in MODEL CONSTRAINT and specified prior distributions for the differences between parameters.

The motivation for the second analysis is based on Muthen and Asparouhov (2013) regarding BSEM measurement invariance. According to the technical report, "With only two groups/time points, the difference relative to the average can be augmented by the difference across the two groups which can be expressed in MODEL CONSTRAINT." Based on this statement, I used MODEL CONSTRAINT and MODEL PRIORS for testing BSEM measurement invariance in the second analysis.

So, I thought that both analyses were the same. If I am wrong, could you explain it to me again?

Thanks a lot for your response in advance!
 Bengt O. Muthen posted on Monday, May 12, 2014 - 4:31 pm
For your second analysis I don't see any priors given in what you show. Note also that priors cannot be given for NEW parameters in Model Constraint. All you are saying is that

LAM1_1-LAM2_1 ~N(0, 0.001);

which means that you believe the loadings are small.
 Yoonjeong Kang posted on Monday, May 12, 2014 - 8:45 pm
Dear Dr. Muthen,

I thought that this statement indicated a small DIFFERENCE between LAM1_1 and LAM2_1 because I put "minus" sign between two parameters. So you mean that the statement under model priors, "LAM1_1-LAM2_1 ~N(0, 0.001);" indicates that the loadings (e.g., LAM1_1, LAM2_1) are close to zero. Thanks for correcting me.

Then, is there any way to assign a prior distribution to the difference between two parameters in Mplus? How can I tell Mplus?

 Bengt O. Muthen posted on Tuesday, May 13, 2014 - 9:03 am

LAM1_1-LAM2_1 ~N(0, 0.001);

is giving a "list" of parameters all of which have that prior. So the dash implies a list, not a minus sign. For differences you have to use DIFF.
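To make the contrast concrete, here is a sketch of the two MODEL PRIORS forms discussed in this exchange, reusing the labels from the posts above. It assumes the labels LAM1_1-LAM1_4 and LAM2_1-LAM2_4 have already been attached to the group-specific loadings in the MODEL command, as in the first analysis; the list form is shown commented out for comparison only:

```
MODEL PRIORS:
  ! List form: the dash denotes a list of labels, so each listed
  ! loading receives its own N(0, 0.001) prior centered at zero.
  ! LAM1_1-LAM2_1 ~ N(0, 0.001);

  ! DIFF form: the prior is placed on the difference between the
  ! loadings of the two groups, item by item.
  DO (1,4) DIFF(LAM1_#-LAM2_#) ~ N(0, 0.001);
```

A prior that pulls the loadings themselves toward zero would be one plausible reason for the poor PPP reported for the second analysis.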
 Linda Lin posted on Tuesday, July 15, 2014 - 3:24 pm
I am running a multiple group comparison for a higher-order SEM. The observed variables are both ordinal and continuous. I tried to use WLSMV + Parameterization=theta, but it does not converge. MLR + Algorithm=integration is not available for multigroup comparison with categorical data. Knownclass + Type=Mixture is not able to measure the variance of categorical data. Are there any other options I can use for testing my model? Thank you!
 Bengt O. Muthen posted on Wednesday, July 16, 2014 - 11:16 am
I think you should explore why WLSMV didn't work. First check if the default Delta method works. If not, send to support with your license number.
 Katherine Keenan posted on Thursday, October 23, 2014 - 7:51 am
Dear Dr. Muthen,

I am a new user, so please excuse the basic nature of the question. I need some help, please, on how to explore measurement non-invariance and what to do about it in my final model.

I already established measurement non-invariance for a latent variable constructed from 4 categorical indicators across 4 groups of countries.

childsep by roompp indexaccom1 books parentoccnew;

I have established that metric and scalar non-invariance exists by using the multigroup command and

model is configural metric scalar;

I am unsure what to do next. I started freeing the factor loadings and thresholds one by one and compared to a null model. They all were significantly different, so I couldn't identify one particular problem variable.

If all the factor loadings are non-invariant, what can I do next? Fit the latent variable separately for each country group?

I would appreciate any guidance and example code on this issue.

Many thanks
 Linda K. Muthen posted on Thursday, October 23, 2014 - 10:37 am
As a first step in multiple group modeling, you should fit the measurement model in each group separately. You may want to start with an EFA to determine that the same number of factors is appropriate for each group. You can then move to the CFA. If the same CFA does not fit well in each group, going forward to multiple group analysis does not make sense.
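As a rough illustration of the first step described above, the same CFA could be fitted to one country group at a time with USEOBSERVATIONS. The sketch below reuses the indicator names from the question; the grouping variable name cgroup and its coding 1-4 are assumptions, and the DATA command is omitted:

```
VARIABLE:   NAMES = cgroup roompp indexaccom1 books parentoccnew;
            ! cgroup is a hypothetical name for the country-group variable
            USEVARIABLES = roompp indexaccom1 books parentoccnew;
            CATEGORICAL = roompp indexaccom1 books parentoccnew;
            USEOBSERVATIONS = cgroup EQ 1;   ! rerun with EQ 2, EQ 3, EQ 4
MODEL:      childsep BY roompp indexaccom1 books parentoccnew;
```

If the single-factor model fits poorly in one or more of these runs, that would point to a problem with the measurement model itself rather than with the invariance constraints.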
 Bo Zhang posted on Tuesday, March 03, 2015 - 3:19 am
Dear Dr. Muthen,

I am conducting measurement invariance tests for a two factor model. Indicators of the first factor are continuous while indicators of the second factor are dichotomous. I have never seen models like this. What procedure should I follow? The common procedure for continuous indicators or the procedure for categorical indicators? Also, do you know any references that resolve such topics?
 Bengt O. Muthen posted on Tuesday, March 03, 2015 - 12:10 pm
Use a combination of procedures - continuous for continuous items and binary for binary items.
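One way this combination might be set up in a single multiple-group run is sketched below. The variable names (y1-y4 continuous, u1-u4 dichotomous, g for group) are hypothetical. Only the dichotomous items are declared categorical, so the default across-group equalities apply to intercepts for the continuous items and to thresholds for the binary items:

```
VARIABLE:   NAMES = y1-y4 u1-u4 g;       ! hypothetical names
            CATEGORICAL = u1-u4;         ! only the dichotomous items
            GROUPING = g (1 = g1 2 = g2);
ANALYSIS:   ESTIMATOR = WLSMV;
MODEL:      f1 BY y1-y4;                 ! continuous: loadings and intercepts
            f2 BY u1-u4;                 ! binary: loadings and thresholds
```

The invariance steps for each factor would then follow the procedure appropriate to its indicator type.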
 Heather Clark posted on Wednesday, March 25, 2015 - 8:16 pm
Dear Dr. Muthen, I have attempted to specify a model where the factor loadings are restricted, similar to the example in slide 212 of Topic 1. I specify a model where I put the observed variables into [ ] brackets. However, I receive an error in the output that tells me the model statements with the [observed variables] are ignored. Would you be able to tell me where I went wrong?
The specifics are below:

f1 by pwarm pwb pmar;
f2 by acadp acap acaeng;
[f1@0 f2@0];

Model age10: [pwarm pwb pmar acadp acap acaeng];

Model age14: [pwarm pwb pmar acadp acap acaeng];

The following MODEL statements are ignored:
* Statements in Group AGE10:
[ PWB ]
[ PMAR ]
[ ACAP ]
* Statements in Group AGE14:
[ PWB ]
[ PMAR ]
[ ACAP ]
 Bengt O. Muthen posted on Thursday, March 26, 2015 - 1:29 pm
Perhaps you have categorical variables in which case the intercept specification isn't right - use thresholds.

If this doesn't help, send to support.
 Lois Downey posted on Friday, May 22, 2015 - 10:10 am
I would like to test scalar invariance (loadings and thresholds) for three groups on a 5-indicator construct. Each indicator is measured with 4 ordered categories.

However, for one of the groups, one indicator has no cases in the third response category.

In this case, I believe I cannot do the test for scalar invariance via the "model=scalar" option of the ANALYSIS command. However, I'm wondering whether there is a way to do the test by setting it up manually. Or is a test for scalar invariance impossible under these circumstances?

 Bengt O. Muthen posted on Friday, May 22, 2015 - 5:51 pm
I think you can do it manually using the (*) option on page 544 of the UG.
 Lois Downey posted on Saturday, May 23, 2015 - 8:18 am
Thanks very much. And is the (*) option the procedure you would recommend, no matter where the empty category falls?

I've (perhaps inappropriately) used a different method for establishing threshold equivalences in tests of scalar invariance with "linked" data -- e.g., where each record includes responses from a patient and from the patient's family member. I've set these up with two latent variables, one measured with indicators from the patient and one with indicators from the family member, and have manually established the threshold equivalences. If the patient data for an indicator includes responses in all categories, but the family data includes no response in category 2, I've used the following code:

[p3$1 f3$1] (10);
[p3$3 f3$2] (11);
[p3$4 f3$3] (12);
...and so on

That method seemed to me to preserve the "meaning" of the categories over the two respondent groups (although I'm not clear whether one should omit the 1st or the 2nd patient threshold from the equivalences in this example).

Does using the (*) option have the effect of, in the above case, reinterpreting the family response 3 to mean the same thing as patient response 2, and so on down the line? If so, do you think this is preferable to the method I've used with my linked data?

Thank you again for the assistance.
 Bengt O. Muthen posted on Saturday, May 23, 2015 - 10:18 am
Q1. Yes.

Q2. I would think it is a better approach.
 Lois Downey posted on Sunday, May 24, 2015 - 9:58 am
OK. Thank you. Back to my original analysis problem, then, where I'm testing measurement invariance for INDEPENDENT groups: when I use the (*) option on the categorical command, I am informed, "For estimators WLS, WLSM, WLSMV and ULSMV, the possible categories for each categorical variable can only be determined from the data for that variable. Use estimator ML, MLF, or MLR for the special categorical recoding features."

However, the maximum likelihood estimators appear not to be available for multiple group analysis, and -- if I remember correctly -- they wouldn't give me a test of model fit anyway.

So it appears that the (*) option doesn't work in this case. If I recode the offending variable manually with a DEFINE command to remove the empty category, I get the warning that there is a sample correlation of 1.000 between the offending variable and one of the other (unrecoded) indicators, as a result of one or more zero cells in the bivariate table. However, the model estimation terminates normally. Is it OK to use the test of fit from that model, despite the warning message? Or is it preferable to recode ALL of the indicators in the same way as I recoded the one with the empty cell?

Thanks very much!
 Bengt O. Muthen posted on Monday, May 25, 2015 - 3:23 pm
Yes, ML needs to be used with (*). Although you don't have an overall fit measure with ML, you can work with nested models and do chi-square difference testing. You can also work with bivariate fit statistics in TECH10.
 Bina Knöpfli posted on Friday, October 23, 2015 - 6:09 am
Dear Dr. Muthen

I would like to investigate measurement invariance of two groups in a 3-factor model (all items are categorical). The difficulty is that we have a factor with only two highly correlated items (0.83). The problem is the most notable in the configural model, as the unstandardized factor loadings of the two items are around 18 and not significant in both groups. Furthermore, the Modification Indices for both items WITH the factor could not be computed.

When regarding the Metric as well as Scalar models however, the unstandardized factor loadings and the M.I. are acceptable.

Is there a way to deal with the multicollinearity of these two items, so that we can still test the measurement invariance of this factor?

Thank you very much for your time and help!
 Linda K. Muthen posted on Friday, October 23, 2015 - 5:00 pm
Please send the output and your license number to
 Elena Jansen posted on Wednesday, March 09, 2016 - 4:47 pm
Dear Dr. Muthen,
I am trying to investigate measurement invariance across two groups in a 7-factor model (3-6 items per factor, all items are categorical). I am using the MODEL = CONFIGURAL METRIC SCALAR shortcut. While the model works in both groups separately and for the configural and scalar level, I am getting an error message for the metric model (non convergence). I have tried to increase the number of iterations and also switched to THETA parameterization. This doesn’t help either. I have identified one of the factors that is likely problematic. When examined individually it runs using the THETA parameterization, however I also receive the following message “THE CHI-SQUARE COMPUTATION COULD NOT BE COMPLETED BECAUSE OF A SINGULAR MATRIX”.
I am reluctant to immediately discard this factor, given that it shows invariance on the scalar level. Do you have any recommendation about how to manage the computation issue for the metric invariance level?
Thank you very much in advance for your time and help!
 Bengt O. Muthen posted on Wednesday, March 09, 2016 - 5:16 pm
I would ignore the metric model because the scalar model fits. Just report non-convergence. The metric model is a bit quirky for categorical outcomes. It isn't identified for binary variables and for ordinal variables the identifying restrictions are sufficient but all of them aren't always necessary.
 Elena Jansen posted on Wednesday, March 09, 2016 - 5:44 pm
Thank you very much for your response.
 Maksim Rudnev posted on Wednesday, July 05, 2017 - 4:48 am
Dear Bengt/Linda,

I am testing scalar invariance with ordered categorical responses and three factors. For this, I made all thresholds equal across groups and fixed latent means in the first group to zero. However, I want to avoid fixing latent means to zero (the reason is that I have a second-order factor which I also test for scalar invariance, and it seems impossible when some intercepts, which are first-order means, are fixed to zero in one group). In continuous case it would be possible by fixing one intercept per factor to 0.

Is there any way to get model with ordered categorical responses identified without having to fix latent means to zero? For example, can I fix thresholds to some constant to freely estimate latent means in all the groups?
 Bengt O. Muthen posted on Wednesday, July 05, 2017 - 6:12 pm
Yes, you can fix say the first threshold and thereby free a factor mean.
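A sketch of how this identification choice might be written for one factor, assuming hypothetical ordinal items u1-u4 with u1 as the marker item. The first threshold of the marker is fixed at zero in the overall MODEL command and the factor mean is then mentioned as a free parameter. Whether additional group-specific statements are needed is an assumption to check against the parameter specification in TECH1:

```
MODEL:      f1 BY u1-u4;
            [u1$1@0];      ! first threshold of the marker item fixed at zero
            [f1*];         ! factor mean freed instead of fixed at zero
```

With three first-order factors, the same pair of statements would be repeated for each factor's marker item.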
 ma abd posted on Friday, July 28, 2017 - 8:45 pm
Dear Dr. Muthen
I'm trying to conduct a configural, metric, and scalar ME. I have 6 factors with 20 variables and I recoded one of the variables, but I got these messages:
Error messages for the Configural Model:
Parameter 76, Group MALE: MR WITH CI

Error messages for the Metric Model:
Error messages for the Scalar Model:
Parameter 101, Group FEMALE: MR WITH CI

Could you help me to fix these problems?
 Bengt O. Muthen posted on Sunday, July 30, 2017 - 5:11 pm
Send your output to Support along with your license number.
 Lois Downey posted on Monday, September 18, 2017 - 5:59 pm
I want to test for between-group measurement invariance of a construct (Trust) measured with two reflective indicators (Trust3 and Trust5) and two causal indicators (Trust1r and Trust4). All four indicators are measured on a Likert scale ranging from 0 to 4, but to prevent empty cells in the cross-tabulation of the reflective indicators by group, I included a CUT statement that collapsed values of 0 and 1 on these two variables.

I believe the test for measurement invariance requires equal loadings and thresholds across groups for the reflective indicators, and equal regression coefficients for the causal indicators, so I included the following statements:

Trust by Trust3 Trust5;
Trust on Trust1r (1)
Trust4 (2);

MODEL RacialEthnicMinority:
Trust on Trust1r (1)
Trust4 (2);

The resulting model has the expected between-group equality on loadings, thresholds, and regression coefficients. However, the model is inadmissible because of a non-positive-definite residual covariance matrix, with the variable Trust5 having a negative residual variance for the WhiteNonhispanic group.

Is there a modification to the model that might correct this problem, or must I simply report that an improper result was obtained?

 Bengt O. Muthen posted on Tuesday, September 19, 2017 - 6:06 pm
This question is suitable for SEMNET.
 Theres Ackermann posted on Wednesday, May 09, 2018 - 12:56 am
Dear Drs Muthen,

I would like to investigate measurement invariance but in the second of my two groups some categories for 2 variables weren't chosen by the respondents. To solve this problem I used the GROUPING-approach in combination with (*), which did not work so I switched to the KNOWNCLASS-approach, but now I get the error:

*** ERROR in MODEL command
No OVERALL or class label for the following MODEL statement(s): GR BY G_1* G_2 G_3

My syntax is: ...


GR BY G_1* G_2 G_3
G_4 G_5 G_6 G_7 G_8;
C BY G_1* G_3 G_5 G_6;
P BY G_2* G_4 G_7 G_8;
GR@1; C@1; P@1;

I also tried using %OVERALL% in front of the first line in the model command but this gave strange results and I think I did something wrong. Could you help me to understand the problem? Also, is there a way to test metric invariance for bi-factor models?
Thanks a lot!
 Bengt O. Muthen posted on Thursday, May 10, 2018 - 2:58 pm
Send the output with strange results to Support along with your license number.

You can set up a test of any invariance you want - but this particular one can't be done automatically.
 Po-Yi Chen posted on Thursday, May 24, 2018 - 8:44 pm
Dear Dr. Muthen and Dr. Asparouhov:

I have a question about the number of parameters in scalar invariance models estimated by WLSMV and cat-ML.

I conducted an invariance test across gender with WLSMV as:

categorical are v1-v10;! ten 5 point scales, no missing data
grouping = group (0 = a 1 = b);
f1 by v1 v2-v10;
MODEL = configural scalar;

This syntax works fine and it shows the scalar invariance model has “62” parameters (configural has 100, df of ChisqDiff = 38)

However, when I try to refit these models with the cat-ML to address a reviewer's comment as:

categorical are v1-v10;
CLASSES = cg (2);
KNOWNCLASS = cg (group = 0 group = 1);
f1 by v1 v2-v10;
MODEL = configural scalar;

The number of parameters in my scalar invariance model becomes "52" (configural still has 100, and the df of ChisqDiff becomes 48). Given my impression that these two estimators should give the same df for the chi-square difference test, I wonder how I could revise the syntax to make the df of ChisqDiff identical for the two estimators?

Thank you,
Best wishes,
Po-Yi Chen
 Bengt O. Muthen posted on Friday, May 25, 2018 - 12:56 pm
WLSMV is able to use a slightly more general/flexible model than ML. You would get the ML number of parameters if in WLSMV you use Parameterization=Theta and fix the residual variances at 1 in both groups.
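A sketch of the suggested WLSMV setup, reusing the variable and group names from the question and relying on the default multiple-group specification rather than the MODEL = shortcut (an assumption; the DATA command is omitted). Under the Theta parameterization the residual variances are free in the second group by default, so they are fixed at 1 there:

```
VARIABLE:   CATEGORICAL = v1-v10;
            GROUPING = group (0 = a 1 = b);
ANALYSIS:   ESTIMATOR = WLSMV;
            PARAMETERIZATION = THETA;
MODEL:      f1 BY v1-v10;      ! default: equal loadings and thresholds
MODEL b:    v1-v10@1;          ! residual variances fixed at 1 in group b too
```

One way to account for the counts reported above: 40 thresholds, 9 free loadings, 2 factor variances and 1 factor mean give 52 parameters, and the 10 scale factors that are free in the second group under the default Delta scalar model bring the total to 62.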
 Po-Yi Chen posted on Monday, May 28, 2018 - 10:02 pm
Dear Dr.Muthen,

Thank you! Your suggestions perfectly solve my question about the number of parameters. I have two more questions about the loading estimates after changing the parameterization for my invariance models from delta to theta.

Q1. I notice that when I change the parameterization in my configural model (latent factor variance fixed at 1) from delta to theta, the unstandardized loading estimates from WLSMV also become closer to the loadings obtained from cat-ML (with probit link). Would it be correct to say that the model Mplus uses for cat-ML estimation is the same as the WLSMV model with theta, but not with delta?

Q2. I also notice that even though the unstandardized loading estimates obtained from cat-ML and WLSMV-delta are quite different, their standardized (stdyx) estimates are almost identical. Would it be correct to say that in Mplus the loadings obtained from the theta and delta parameterizations are comparable (or on the same scale) after stdyx standardization?

Thank you
 Bengt O. Muthen posted on Tuesday, May 29, 2018 - 5:39 pm
Q1: Yes, except ML uses logit as the default while WLSMV uses probit. You can ask for probit in ML by saying link=probit in the Analysis command.

Q2: Yes.
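For reference, a sketch of where the probit link mentioned in the answer to Q1 would go in the KNOWNCLASS setup from the earlier post. The VARIABLE lines repeat the poster's own syntax; ESTIMATOR = ML and ALGORITHM = INTEGRATION are assumptions added here for completeness:

```
VARIABLE:   CATEGORICAL = v1-v10;
            CLASSES = cg (2);
            KNOWNCLASS = cg (group = 0 group = 1);
ANALYSIS:   TYPE = MIXTURE;
            ESTIMATOR = ML;
            ALGORITHM = INTEGRATION;
            LINK = PROBIT;     ! the ML default is logit
```

With the probit link, the ML loadings should be on a scale comparable to the WLSMV Theta loadings discussed above.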
 Alexia Carrizales  posted on Wednesday, May 30, 2018 - 6:01 am
Dear Professors,
I am testing categorical measurement invariance of an 11-item scale (N = 1630).
Concerning the results: X2 (DF) = 286.605 (102), p < .0001, CFI = .987, RMSEA = .047, WRMR = 1.49. I'm a bit confused. Some chi-square tests were significant; the p-value associated with scalar invariance is significant: chi-square difference = 64.18, p < .001. However, neither the change in CFI (-.001) nor the change in RMSEA (.007) exceeded the thresholds of -.010 or -.015, respectively.
Should I test for partial scalar invariance, or, because these two criteria are met, should I ignore the significant chi-square value?
Thanks for your help
 Bengt O. Muthen posted on Wednesday, May 30, 2018 - 2:47 pm
This general analysis question is suitable for SEMNET.
 Snigdha Dutta posted on Monday, December 03, 2018 - 9:02 am
How does one decide which modification indices to look at in each stage (configural/metric/scalar)?

I understand it's based on theory but that's the second stage of decision making. Which output of mod indices should one focus on?

This is related to categorical longitudinal measurement invariance
 Bengt O. Muthen posted on Monday, December 03, 2018 - 4:06 pm
You may want to ask this general analysis strategy question on SEMNET.
 Weng-Fong Chao posted on Wednesday, January 29, 2020 - 11:17 am
Hi Dr. professors,

I have questions about measurement invariance with ordered-categorical outcomes. I have a two-factor model with 5 ordered-categorical indicators for each factor (10 items in total). Each of these indicators is on a 4-point Likert scale, and I have four groups in total. I have read Millsap & Yun-Tein (2004) and am trying to use their method to test measurement invariance.

1. Should I follow these steps for the baseline model (based on p.485-487)?
(i) use the theta parameterization, (ii) fix the latent mean of one group to zero, (iii) fix all the intercepts to zero (default in Mplus), (iv) fix the loading of the marker indicator to 1 across all groups, (v) constrain one threshold for each indicator to be equal across groups and one additional threshold to be invariant for the marker indicator, (vi) fix all the error variances to one for the reference group but free the error variances for the remaining groups.
 Weng-Fong Chao posted on Wednesday, January 29, 2020 - 11:20 am
Continuing from the previous post:

2. Millsap & Yun-Tein (2004) also mentioned that the case of ordered-categorical measures is not the same as the continuous case, and that the ordered-categorical CFA does not have a direct connection between the CFA model and the observed means and covariance structure. This CFA model also relies on the assumption of a multivariate normal distribution. Does this mean that loading and threshold invariance is not sufficient for latent mean comparison?

3. I have not seen studies that discuss the impact of non-invariance of the unique variances in the case of ordered-categorical measures. How important is invariance of the unique variances? Is it a necessary condition for latent mean comparison in the case of ordered-categorical measures?
 Bengt O. Muthen posted on Thursday, January 30, 2020 - 5:32 pm
A good paper on this is the 2016 article in Psychometrika by Wu & Estabrook. They have a section on Mplus input for ordered categorical outcomes.
 Samuli Helle posted on Tuesday, April 07, 2020 - 11:46 am
I am using Wu & Estabrook's (2016) approach to check for measurement invariance of a 2-factor model with ordinal indicators. The configural model doesn't fit too well, and modification indices in all 4 of my groups suggest strong improvement from adding two residual covariances (which do make sense, since the indicators share conceptual meaning and are closely worded). However, when I add these residual covariances to the model (between the phantom variables denoting y*), Mplus gives me a warning because of correlations greater than 1 between the variables that have the residual covariances. Adding such residual covariances is not a problem when using the default Mplus specification for my model. Any suggestions as to what's wrong and how I might fix it?
 Bengt O. Muthen posted on Tuesday, April 07, 2020 - 3:00 pm
I would check if the configural model has the same number of parameters in the Mplus default setup as compared to that of W-E. If that doesn't help, we would need to see the full outputs - send to Support along with your license number.
 Bengt O. Muthen posted on Sunday, April 12, 2020 - 5:55 pm
Note that the Wu-Estabrook Table 4 shows the same number of parameters for Condition 8 (the Mplus default configural model) and their MYT model.
 Samuli Helle posted on Wednesday, April 15, 2020 - 1:25 pm
Would using theta instead of delta parametrization help to fit those residual covariances?
 Bengt O. Muthen posted on Wednesday, April 15, 2020 - 4:05 pm
Either one is fine. I am looking over the Svetina et al paper to see what you are running into.