
Anonymous posted on Thursday, February 03, 2000  11:58 am



Some people have suggested that latent growth modeling and multigroup comparison of latent means should be based on completely invariant measures of the latent constructs. In other words, factor loadings should be equivalent across time and groups. Variant factor loadings just reflect changes and differences. In situations where the focus of research is to find out relationships of the latent variables but make up unbiased tests, should we bother to seek invariant measures? Dr. Muthen's voice has not been heard since his earlier publication regarding a similar issue!

Leigh Roeger posted on Wednesday, February 09, 2000  4:53 pm



Up until a couple of days ago I would probably also have thought that measurement invariance was more a concern for comparing means than for correlational research. I have just read an excellent paper by Horn and McArdle (1992), Experimental Aging Research, 18(3), 117-144, which among other things raises this issue. They provide a very clear explanation (p. 118) of why this isn't the case. They demonstrate quite convincingly that 'for interpretations of correlations the same attribute must be measured in each group'. It really is a good paper for better understanding the importance of measurement invariance.


Just so I know I am answering the right thing, can Anonymous please clarify the statement: "In situations where the focus of research is to find out relationships of the latent variables but make up unbiased tests, should we bother to seek invariant measures?" Is there a "not" missing in this sentence? 

Anonymous posted on Wednesday, February 23, 2000  10:45 am



Anyway, I would like to know in which situations partially invariant factor loadings, in the sense of weak measurement invariance, suffice to yield reliable parameters for comparisons, if possible. If I were doing a study without the need to refer to previous studies, I would seek invariant measures. When there are a bunch of studies to "discuss", results that are more reliable but inconsistent with previous ones might be hard for peers to accept. It may be interesting to start a "pen fight". I am also having trouble checking the invariance of factor loadings across groups, with all the indicators being categorical. Some variables do not have similar categories across groups. Collapsing some categories to make them consistent across groups seems to alter the results that would be obtained without collapsing. Should I do multiple single-group analyses and bypass the issue of invariance? In addition, I would like to have some guidelines for checking invariance of factor loadings with categorical indicators.


I think once you can't claim total measurement invariance of thresholds and factor loadings, it depends on whether the noninvariance has a substantive reason. Of course, this is always open to discussion. I would hope some more substantive readers of Mplus Discussion would comment. For example, it would be reasonable to assume that you measure the same construct if you have ten items of which 8 are invariant and the 2 that are not invariant have a good substantive interpretation for their noninvariance. But it would be questionable to assume that you measure the same construct if 2 items were invariant and the remaining 8 not. When there are a different number of categories for the same indicator in different groups, it is impossible to hold the thresholds equal because there are not the same number of thresholds. This cannot be done in Mplus. Mplus requires the same number of categories in each group. Collapsing categories should not make a substantial difference in the model unless the model was ill-fitting to begin with. I believe you should do multiple single-group analyses without collapsing categories, then with collapsing categories, then a multiple-group analysis with collapsed categories. The guidelines would be the same as for continuous indicators. First run the model without invariance, then with factor loading invariance, then with both factor loading and threshold invariance. In all cases, use WLS to get chi-square difference tests. The invariance should not significantly worsen the fit. If it does at any one step, look at the derivatives to see where the misfit might be and modify the model.
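The stepwise comparison described above can be sketched numerically. The Python sketch below computes a chi-square difference test between two nested models. The fit values are made-up placeholders, not results from this thread, and the function names are hypothetical. It assumes an estimator, such as WLS as suggested above, for which the plain difference of the two chi-square values is itself chi-square distributed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, computed from the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = a
    while abs(term) > abs(total) * 1e-15:
        n += 1.0
        term *= half_x / n
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower

def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Difference test: the restricted model (more equality constraints)
    is nested in the free model."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

# Hypothetical fit values: loadings held equal vs. loadings free.
d_chi2, d_df, p = chi2_difference(112.4, 58, 98.1, 50)
print(f"chi-square difference = {d_chi2:.1f} on {d_df} df, p = {p:.3f}")
```

A small p-value at a given step would suggest that the added equality constraints worsen fit, which is the point at which one would inspect the derivatives as described above.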


Having thought about the issue of measurement invariance from perhaps a more substantive orientation than a statistical one, can I take up Linda's request and make a couple of observations? The first is that the actual importance of measurement invariance (or lack of it) will quite likely vary depending on how large the initial difference between the groups (say boys and girls) was to begin with. So for example in my MIMIC model girls rated their mums as more caring than boys did. There are several items showing DIF but nothing you did altered this basic, very strong finding. On the other hand, for a factor relating to behavioural freedom the latent mean difference between boys and girls is small, and you can make this difference significant or not by allowing some items to vary between boys and girls. I chose a sig level of 0.05 to decide whether to let factor loadings and thresholds vary, but I wonder whether a sig test is really right for this decision. I ran 100 such tests. Am I really saying 0.051 is not biased (because it's not sig)? Is this test independent of sample size? And how robust is it all when the model fit is by no means good? I think some kind of judgement needs to enter this, but you would have a hard time convincing anyone that you didn't just play until you got the answer that you wanted! The second point is that it's possible in any scale that the DIF might function both ways and in effect cancel the bias out. So in Linda's example, if the two out of ten items are going in opposite directions (one favouring one group and the other the other group) and are of the same magnitude, then letting these vary shouldn't change the latent mean at all. In fact this is what I found: a lot of items showing DIF, but it didn't really cause much bias in the total score because they were pulling in opposite directions willy-nilly without any clear pattern.
So a scale full of biased items doesn't necessarily mean the total test score is biased. Not perfect, that's for sure, but maybe not that bad either for what you're trying to do? This is in no way to argue that one shouldn't worry about DIF, because you won't know what's going on until you run the analysis. I look forward to the other views of researchers who are out there grappling with these issues.
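The cancellation argument can be illustrated with a small Python sketch. It assumes, as a simplification, continuous linear items (expected item score = intercept + loading * factor value) rather than the categorical items discussed in the thread. All parameter values and names are hypothetical.

```python
def expected_sum_score(factor_value, intercepts, loadings):
    """Expected total score at a given factor value under a linear item model."""
    return sum(t + l * factor_value for t, l in zip(intercepts, loadings))

loadings = [0.8] * 10            # ten items, equal loadings in both groups
intercepts_boys = [1.0] * 10
eta = 0.5                        # same factor value in both groups

# Two DIF items whose intercept shifts have opposite signs and equal size.
intercepts_girls = list(intercepts_boys)
intercepts_girls[0] += 0.3
intercepts_girls[1] -= 0.3
gap_cancelling = (expected_sum_score(eta, intercepts_girls, loadings)
                  - expected_sum_score(eta, intercepts_boys, loadings))

# Two DIF items whose shifts point in the same direction.
intercepts_girls_same = list(intercepts_boys)
intercepts_girls_same[0] += 0.3
intercepts_girls_same[1] += 0.3
gap_same_direction = (expected_sum_score(eta, intercepts_girls_same, loadings)
                      - expected_sum_score(eta, intercepts_boys, loadings))

print(f"total-score gap, opposite-direction DIF: {gap_cancelling:.2f}")
print(f"total-score gap, same-direction DIF:     {gap_same_direction:.2f}")
```

In this toy setup the item-level bias is identical in size in both scenarios, yet only the same-direction case shifts the expected total score at a fixed factor value.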


Good points. For an example where a single item's DIF made a big difference, see Gallo, J.J., Anthony, J. & Muthen, B. (1994). Age differences in the symptoms of depression: a latent trait analysis. Journals of Gerontology: Psychological Sciences, 49, 251-264. (#52). Also, items with DIF don't have to be thrown out if the model itself includes parameters that allow for the DIF (direct effects in MIMIC, noninvariant measurement parameters in multiple-group analysis).


Two other articles also provide some hints on how to deal with the problems of the first discussant. Muthen, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54(4), 557-585. Pentz, M. A., & Chou, C.-P. (1994). Measurement invariance in longitudinal clinical research assuming change from development and intervention. Journal of Consulting and Clinical Psychology, 62(3), 450-462.


Can I ask for an opinion about the issue of scalar invariance? In the literature many researchers seem to assume that if a scale shows configural invariance (same pattern of zero and nonzero factor loadings across groups) and also metric invariance (same factor loadings), then this shows that observed mean differences can be meaningfully interpreted. Others come along and say that when latent means are being compared, scalar invariance (equal intercepts) is also required. This makes sense, but my question is whether it is necessary to achieve scalar invariance for the comparison not of latent means but of observed means. If it is (as I think it is), then there is a lot of misunderstanding out there. Any comments from anyone?


I was wondering if it would be possible to post some clarification on the last comment in Bengt's 3/2 posting? Wouldn't noninvariant measurement parameters run counter to Meredith's 'strong factorial invariance' that is necessary to make valid comparisons across groups on latent variable means? I think this point is also relevant to Leigh's recent post. 

bmuthen posted on Wednesday, October 11, 2000  2:28 pm



To answer Leigh and Randy, I think one needs invariant measurement intercepts in addition to invariant slopes to be able to compare observed means. If this invariance is not present, two people with the same observed value have different factor values (and are therefore different). As for latent means, my thinking is that one can compare them also under only partial measurement invariance as long as one allows for meaningful parameters that pick up the noninvariance.
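A minimal Python sketch of the point about observed values: under a linear measurement model with equal slopes but unequal intercepts, the same observed score maps to different factor values in the two groups. The numbers are hypothetical and the residual is ignored, so the result is the factor value implied by the expected relationship only.

```python
def implied_factor_value(observed, intercept, loading):
    """Invert x = intercept + loading * eta (residual ignored) for eta."""
    return (observed - intercept) / loading

x = 3.0            # the same observed score in both groups
loading = 0.8      # invariant slope

eta_group1 = implied_factor_value(x, intercept=1.0, loading=loading)
eta_group2 = implied_factor_value(x, intercept=1.4, loading=loading)

print(f"group 1 implied factor value: {eta_group1:.2f}")
print(f"group 2 implied factor value: {eta_group2:.2f}")
```

With equal intercepts the two implied values would coincide, which is why intercept invariance matters for comparing observed means.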


Background: I'm supposed to examine the influence of cultural background (two nationalities), gender, age (a continuous variable), and their two-way interactions on children's number sense. Number sense is assessed with a test comprising 40 binary variables. Due to the design and limited sample size (330 participants altogether) I intended to use a MIMIC model, where the two categorical variables, one continuous variable, and their interactions predict the latent structure of children's number sense. Since I'm also supposed to compare three alternative latent structures (1-factor, 2-factor, and 7-factor solutions), I have a situation where I first need to test the competing models and their invariance across the background variables. To do this, I ran a set of MIMIC models with alternative factor solutions for each background variable at a time. My assumption was that if the measurements in relation to different background variables are invariant, the model fit should be good, given that the factor solution is appropriate. The presence of possible DIF should be indicated by high derivative values. The fit stats for these sets of models were as follows (WLSMV was used):
Nationality as the covariate: (1-factor model): p=.000, CFI=.952, TLI=.976, RMSEA=.047; (2-factor model): p=.000, CFI=.965, TLI=.983, RMSEA=.040; (7-factor model): p=.0004, CFI=.971, TLI=.985, RMSEA=.037. (Two items seemed to flag for DIF).
Gender as the covariate: (1f): p=.000, CFI=.968, TLI=.985, RMSEA=.040; (2fs): p=.0012, CFI=.976, TLI=.989, RMSEA=.035; (7fs): p=.0080, CFI=.982, TLI=.991, RMSEA=.031. (No DIF).
Age as a covariate: (1f): p=.000, CFI=.932, TLI=.953, RMSEA=.045; (2fs): p=.0001, CFI=.949, TLI=.964, RMSEA=.040; (7fs): p=.0010, CFI=.959, TLI=.971, RMSEA=.036. (One item seemed to flag for DIF).
A main effect model with all predictors: (1f): p=.000, CFI=.933, TLI=.948, RMSEA=.040; (2fs): p=.0005, CFI=.947, TLI=.959, RMSEA=.036; (7fs): p=.0035, CFI=.958, TLI=.967, RMSEA=.032.
A model with all two-way interactions included: (1f): p=.000, CFI=.920, TLI=.935, RMSEA=.039; (2fs): p=.0008, CFI=.938, TLI=.950, RMSEA=.035; (7fs): p=.0047, CFI=.951, TLI=.959, RMSEA=.031.
My questions: (1) Is this procedure valid for (a) testing for invariance across the different groups, and (b) for simultaneously comparing the alternative factor solutions? (2) Given the fit stats, what should be concluded about the alternative models? I cannot use the likelihood ratio test with WLSMV, so I should compare the change in fit indices. However, I only know of one simulation study (Cheung & Rensvold, 2002) that provides some guidelines for this, but that study used ML estimation, so it isn't directly applicable. I would definitely say that sufficient measurement invariance exists, but given that even the 1-factor solution shows moderate fit, do I have grounds to argue for the 7-factor solution? Then again, is the 7-factor solution really any better than the 2-factor solution (especially considering the complexity it adds to the results)? (3) Am I missing something relevant? Sorry for the lengthy message, but I thought it would be better to provide all the necessary information. Thanks!


I would not approach the problem as you have. I would first decide on the best factor model without including covariates. I would then move to a MIMIC model to see which covariates might be related to measurement noninvariance. I would then use these covariates in a multiple group analysis. MIMIC can only see invariance in the intercepts/thresholds. It cannot see invariance in the factor loadings.


Thanks! A few additional questions: Since the Ns in my subsamples are rather low, I was hoping to avoid multigroup comparison altogether. Now, if I followed the procedure you suggested, would you say that, given the level of fit obtained with MIMIC models (cf. my previous message), there is any need to proceed to multigroup comparisons in the first place? Also, can I use derivative values with continuous covariates just as with categorical covariates?


When you say "flagged for dif", what do you mean? You can use derivatives for continuous or categorical covariates. It is the scale of your outcome variable that decides whether modification indices are available or you need to use derivatives, that is, unscaled modification indices. 


By "flagged for dif" I mean that certain derivatives were high (btw, how can one decide what values are 'high' except by testing the influence of adding the direct effect from a covariate to the observed variables?). But with WLSMV there are no MIs available other than derivatives, right?


You can't decide which are high without doing a run. There are no derivatives for WLS, WLSM, or WLSMV. 


Sorry, but what do you mean by "there are no derivatives for WLS, WLSM, or WLSMV"? 


I mean no modification indices. Sorry I had a very long day yesterday. 


I bring this question up here because it is motivated by my recent reading of Prof Muthen's Webnote 4 discussion of the Latent Response Variable Model. That note states that the LRV model presumes a causal relationship, not just conditional probability. This directional causal aspect of the model is my understanding of FA in general. Two papers on measurement equivalence (Raju, Laffitte, and Byrne 2002, Applied Psychology; and Flowers, Raju and Oshima, paper at 2002 NCME) question whether tau differences on an item between two groups resulting from a CFA analysis are necessarily DIF. They point out that the difference could be due to impact, that is, differences in ability. The basis for this suggestion appears to be the standard model for a person's observed score on an item i (subscripts omitted):
x = tau + (lambda)(ksi) + delta
and thus the expected value of x, MUx, is
MUx = tau + (lambda)(MUksi)
It is stated that algebraically
tau = MUx - (lambda)(MUksi)
It is observed that "when the lambdas are equal, the difference in taus (intercepts) will simply reflect the difference in the means of x and ksi across populations." (Flowers et al). Raju et al state that (the tau values in the tau equations for each group) “depend on item and factor means." I can understand that if certain values in the algebraic formula are known then a remaining value is implied by the formula. However, given a causal view of the CFA model I have trouble seeing how tau is caused by or dependent on ksi. If the formula above is taken as describing the causes of tau, then MUx is a cause of tau. But that seems to turn the causal model idea on its head. Speaking of heads, this is probably way over my head. But that is why I am asking. Using this logic couldn't I also solve for lambda and then say that lambda can be caused, in part, by differences in ksi across groups and thus lambdas are also ambiguous? Also, in IRT, DIF is usually viewed as any difference in the ICC's for two groups for an item.
Threshold differences don't seem, I think, to be viewed as ambiguous. In fact, I thought an advantage of IRT was that equatable (if there is no DIF) threshold estimates did not depend on the ability of the group. In the CFA approach both groups are on the same scale so the equating shouldn’t be needed? I’m probably a little confused. 

bmuthen posted on Thursday, May 29, 2003  6:04 pm



I haven't read those papers, but I assume that they might be talking about a situation where if you ignore the group differences in the ability means, you can mistakenly get the impression that the item has different thresholds for 2 groups (bias or DIF). If, however, you allow for ability differences across the groups in your analysis, no DIF will be discovered. When we do DIF analysis using Mplus and the "MIMIC approach", that is, having the grouping variable as a covariate, you are allowing for group differences in ability. Same for the multiple-group approach.


I can see that if you constrain ability factor means across groups to be the same, then things can get confounded. But it doesn’t seem to me that this is what these articles are doing. The Flowers et al paper entitled “ A comparison of measurement equivalence methods based on confirmatory factor analysis and item response theory” uses simulation of two group linear CFA for its analysis. They simulate population groups both with and without ability factor differences. They don’t appear to constrain the factor mean estimates to be the same across groups. The Raju et al article entitled “Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory” also doesn’t appear to constrain factor means. I don’t know, but if a tau difference between groups (given free factor means) isn’t an indication of DIF, then that seems to me to be a significant point. But I haven’t seen that point made anywhere else. I will keep pondering. It just seems to me that if in the population (forget estimation) students, male and female, have the same ability level but have different expected item performance (due to different tau’s for the groups) then this should be called DIF. I know this isn’t strictly an Mplus issue, but Mplus has a unique intersection with IRT. It is interesting to note that the Raju et al article highlights as a major IRT/CFA difference the nonlinear vs linear models (with no reference to Mplus as an option for CFA). 

bmuthen posted on Saturday, May 31, 2003  2:29 pm



Could you send me those 2 papers so I can take a look at this? 


I just sent you an email with the two PDF files of the articles attached. Thank you for your interest and time. I may just be misreading something.

bmuthen posted on Saturday, June 14, 2003  10:39 am



I have now read the Raju et al article that you sent me. In my view, the article's discussion of this issue on page 520 is confused. I am surprised that this mistaken description got past the reviewers. First, equation 18 is formulated as if the intercept is a function of the item mean and the factor mean. This is looking at the situation backwards. Equation 16 is the correct "causal" view: the factor mean and the intercept produce the item mean. The factor mean describes a property of the individuals and the intercept a property of the item. Together they produce the item mean. Second, the Dorans and Holland quote is misunderstood. Say that z is the group dummy variable. In my view, the quote implies that "impact" is the effect of z on x mediated by the factor, while DIF is the direct effect on the item x. I.e., DIF represents a difference on x between the groups when they are considered at the same level of the factor, so a conditional item difference given the factor. In contrast, the article's paragraph below the quote gives a confused view, e.g. talking about a "current uncertainty" about intercepts and therefore focusing on loadings. There is no uncertainty in this area. See, e.g., the IRT-related references to Muthen on the Mplus web site. In conclusion, this aspect of the article should be ignored in my view. I am sorry that it was also confusing the other researchers that you mention.
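The distinction between impact and DIF drawn above can be sketched in Python with a linear MIMIC-style model. This is an illustration under assumed parameter values, not the model from the article: the item is x = tau + lam * eta + beta * z, the factor mean is alpha + gamma * z, and z is the group dummy. All names and numbers are hypothetical.

```python
def expected_factor(z, alpha, gamma):
    """Factor mean in group z (gamma carries the group difference in the factor)."""
    return alpha + gamma * z

def expected_item(z, eta, tau, lam, beta):
    """Expected item score given group z and factor value eta
    (beta is the direct effect of group on the item, i.e. DIF)."""
    return tau + lam * eta + beta * z

tau, lam = 0.2, 0.7        # item properties
alpha, gamma = 0.0, 0.5    # factor mean in group 0 and group difference
beta = 0.3                 # direct effect of the group dummy on the item

# Unconditional group difference in the item mean.
total = (expected_item(1, expected_factor(1, alpha, gamma), tau, lam, beta)
         - expected_item(0, expected_factor(0, alpha, gamma), tau, lam, beta))

# Difference at the same factor value: this is the DIF part.
dif = expected_item(1, 0.0, tau, lam, beta) - expected_item(0, 0.0, tau, lam, beta)

# What remains is mediated by the factor: this is the impact part.
impact = lam * gamma

print(f"total = {total:.2f}, impact = {impact:.2f}, DIF = {dif:.2f}")
```

The item mean difference decomposes into lam * gamma plus beta, so a nonzero difference in item means alone says nothing about DIF until the factor is held constant.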


Thank you. Your help is greatly appreciated. 

Anonymous posted on Friday, November 21, 2003  11:11 am



Could you clarify for me the several types of factorial invariance: configuration, metric and strong? Thanks.

bmuthen posted on Monday, November 24, 2003  11:02 am



I am not familiar with the terms, but they sound like they correspond to
- having the fixed zero loadings in the same places
- having measurement intercepts and slopes invariant
- having measurement intercepts, slopes, and residual variances invariant
More later on this.

Anonymous posted on Thursday, November 27, 2003  9:07 am



Thank you very much. I will be waiting for more. 

Anonymous posted on Tuesday, December 02, 2003  4:38 am



I saw this last message about the several types of factorial invariance. Tell me... slopes and intercepts aren't "features" that only make part of the latent mean structures? 


I'm afraid we don't understand your question. Can you please clarify? 

Anonymous posted on Monday, December 15, 2003  10:27 am



I would like you to clarify what slopes and intercepts are, and whether these "concepts" are part of all structural models or just of latent mean structure models.


In the regression of y on x, y = a + bx + e where a is an intercept, b is a slope, and e is an error term. In SEM, if means are not included in the analysis, no intercepts are estimated. Only a slope is estimated in this situation. I hope this answers your question. If not, please clarify. 
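A minimal Python sketch of this point, with made-up data: the slope can be obtained from variances and covariances alone, whereas the intercept additionally requires the means. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def slope_from_covariance(x, y):
    """Slope b = cov(x, y) / var(x); uses no information beyond
    deviations from the means, i.e. the covariance structure."""
    mx, my = mean(x), mean(y)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    return cov_xy / var_x

def intercept_from_means(x, y, slope):
    """Intercept a = mean(y) - b * mean(x); this needs the means."""
    return mean(y) - slope * mean(x)

# Made-up data generated as y = 2 + 0.5 * x with no error term.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.5, 3.0, 3.5, 4.0, 4.5]

b = slope_from_covariance(x, y)
a = intercept_from_means(x, y, b)
print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
```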


I have a data set of 6028 (boys: 2912, girls: 3116) who completed a 20-item scale (4 categories). The mean score for boys is 10.91 (SD 6.98). The mean score for girls is 13.30 (SD 8.53). Using WLS I estimate a multiple group model assuming full metric and scalar invariance (equal factor loadings and equal thresholds). Boys are group one and girls are group two. The variance of the boy latent is fixed at one (to put things in z score metric). As expected the boy latent mean is zero and the girl latent mean is higher, 0.303. After a series of tests I have concluded that several factor loadings and thresholds are not invariant across gender. I produce a final model allowing several factor loadings and thresholds to vary across gender. I reestimate the latent mean and find that the girl latent mean has dropped to 0.279. This is as expected because the direction of the bias is to increase scores for girls. My question is whether it is appropriate or possible to translate this change back to raw scores. I would like to say that the approximate bias impact on raw scores in the present data is around 10% ((100/279)*303). The raw difference in total scores is 2.50 and so a reduction of 10% to this equates to 0.250 or a quarter of a point. How does this logic sound? Is there a neater way? Thanks for any advice!


What you suggest makes sense, but it would be a rough approximation given that the sum of the factor indicators may not be a good representation of the factor. Also, I'm a little confused about how you will get the sum on the same metric as the factor. 
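The back-of-the-envelope translation discussed in this exchange can be written out in Python using the figures quoted in the question. It is only the rough proportional rescaling described there, with the caveat from the reply that the sum score may not represent the factor well. With the means as quoted, the raw difference works out to 13.30 - 10.91 = 2.39 and the relative drop in the latent mean to roughly 8 percent, so the results differ slightly from the rounded numbers in the post.

```python
# Figures quoted in the question.
latent_mean_full_invariance = 0.303     # girls, full metric and scalar invariance
latent_mean_partial_invariance = 0.279  # girls, some loadings/thresholds freed
mean_boys, mean_girls = 10.91, 13.30    # raw total scores

# Relative drop in the latent mean difference once noninvariance is allowed.
relative_reduction = ((latent_mean_full_invariance - latent_mean_partial_invariance)
                      / latent_mean_full_invariance)

# Rough translation to the raw-score metric (assumes the raw difference
# shrinks in the same proportion as the latent difference).
raw_difference = mean_girls - mean_boys
raw_bias = raw_difference * relative_reduction

print(f"relative reduction: {relative_reduction:.1%}")
print(f"raw difference:     {raw_difference:.2f}")
print(f"approx. raw bias:   {raw_bias:.2f}")
```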


I have a MIMIC model in which variables form indices for several categories that have an effect on two latent constructs. Now I want to compare groups with respect to the paths leading from these categories to the two latent variables. First checking for configural invariance (i.e. the same variables form the same category across groups) does not pose a problem (with AMOS). But how do I check for metric invariance? Most articles I read dealt with that problem for models where variables were reflective indicators. Does someone know an article that deals with my problem? Thanks in advance!


I assume that you are looking for an article that discusses measurement invariance for formative indicators. I am not aware of any article that deals with this topic. Perhaps this is a question for the broader SEMNET group.

Anonymous posted on Saturday, January 08, 2005  11:41 am



Does anyone know of good references describing the nuts and bolts of conducting and interpreting the output of a CFA with categorical indicators? Thanks


Anonymous, try this article: Millsap, R. E., & Yun-Tein, J. (2004). Assessing Factorial Invariance in Ordered-Categorical Measures. Multivariate Behavioral Research, 39(3), 479-515. Scott

EOJ posted on Thursday, December 15, 2005  6:54 am



There hasn't been much action on this thread but it seems like the right subject for my question. I am trying to deal with the question of measurement invariance across multiple groups. I first approached this in a traditional way. I ran multigroup models with difftest comparisons and found metric noninvariance when using the Mplus default of fixing the thresholds for the categorical indicators. Once I allowed the thresholds to be freely estimated, fixed the scales to 1 and the means to be equal across groups, there was no longer a significant chi-square difference between groups. Being uncertain whether this meant that the groups differed in the level of the underlying construct (i.e. mean differences), which would give rise to differences in response patterns across items, or if there was indeed item bias, I turned to the use of the MIMIC model in the Gallo et al. (1994) and Chen & Anthony (2003) papers. I estimated the MIMIC model with paths from group indicators to the latent variable to estimate mean differences in the construct (RMSEA = 0.08). Then I added direct paths from the group variables to each factor indicator, one at a time. Having done this for each pair of group variable and factor indicator, I retained all the significant paths from group variable to the particular factor indicators in a final model. In this final model the mean differences between groups disappeared (RMSEA = 0.07). If this sequence of analyses is correct I am left with trying to interpret whether this means that there are mean differences between these groups in the latent construct, but the measure is essentially invariant across these groups, or that the groups are the same on the latent construct but there is bias in certain indicators. Does this sequence of analyses make sense? If so, is there a reasonable way to decide between these very different conclusions about the use of this measure in these groups, and indeed the nature of the construct for these groups? Thanks for any help you can provide.


The models we recommend for testing measurement invariance of categorical outcomes for the default Delta parameterization are:
1. The Mplus default for multiple groups, where thresholds and factor loadings are constrained to be equal across groups, factor means are zero in the first group and free in the others, and scale factors are fixed to one in the first group and free in the others.
2. A model in which thresholds and factor loadings are free across groups, factor means are zero in all groups, and scale factors are fixed to one in all groups. With categorical outcomes, thresholds and factor loadings need to be freed and constrained in tandem.
3. See Example 5.16 for an example of partial measurement invariance.

EOJ posted on Thursday, December 15, 2005  11:23 am



Thanks for the response. I think that I did the analyses you suggested. This resulted in a significant difference between groups (i.e. constraining loadings and thresholds to be equal across groups was sig. worse than freeing the loadings). However, when run in separate analyses the two groups provide very, very similar loadings. I looked at the thresholds for the two groups and saw that they were different on some items. When I freed the thresholds for the second group, fixed the scale factors to one for the second group, and added an equality constraint for the means between groups, the groups were no longer significantly different. What I was trying to get at by moving to the MIMIC model was to be able to look at both mean differences in the latent variable by group and see if items on the scale or indicators were biased across groups. When I took the Gallo et al. 1994 approach the mean differences between groups disappeared when the sig. direct paths were estimated between the group variables and some of the indicators. This suggested to me there may be 'true' group differences in the underlying construct that are manifest in the threshold differences represented by the significant paths in the MIMIC model and the threshold differences in the standard measurement invariance approach you cited. This is a different interpretation than that the measure works differently in the two groups. Does this make sense or should I try a different tack in explaining what I'm doing? Thanks for your patience.

bmuthen posted on Friday, December 16, 2005  8:11 am



Linda's step 1 works with thresholds and loadings in tandem, which is different from what you state in your second sentence above. More importantly, let me comment on what you say in relation to Gallo et al (1994) and group differences in construct means disappearing when direct effects are included. Perhaps I misunderstand you, but true group differences don't manifest themselves in threshold differences in a MIMIC model with significant direct effects. Also, a multiple-group analysis that finds significant threshold (and loading) noninvariance does not point to true group differences in construct means either. The construct means can be different or not. Measurement invariance is a different issue than construct mean differences. Measurement invariance has to do with a conditional statement: given the same construct value, do different groups have different item probabilities? You may want to look at some of the writing I have done on the MIMIC topic as posted in the reference section of our web site.

Antonio posted on Tuesday, December 27, 2005  2:21 am



I was always thinking of invariance in the context of Rasch measurement. The Rasch ability scores and item difficulty scores in logits should remain equal (within the expected range of error) regardless of which items the examinees take and regardless of the sample which is used to estimate item difficulties, respectively. Can anyone elaborate on this? Why is it important to demonstrate this? What does it prove? And could someone point to some literature on this? Thanks

bmuthen posted on Wednesday, December 28, 2005  5:03 pm



Yes, this is a central topic in the Rasch literature. I am not a Rasch expert myself, but any Rasch-related book should discuss this. For example, Erling B Andersen has a book which covers Rasch modeling (I don't remember the book title). Other Mplus Discussion readers might want to jump in here.


Antonio raises a point that confuses many people. The theoretical "invariance" produced by the Rasch model is based on the fit of the model, and it presumes such fit across the groups being compared. But by most conceptions, violations of invariance are violations of the Rasch assumptions (e.g., unidimensionality), and so the supposed "invariance" given by the Rasch model evaporates. There was some literature on this about 30 years ago, with an exchange between Susan Whitely (now Embretson) and Ben Wright, but I can't recall the journal. Also, Erling Andersen's book was "Discrete statistical models with social science applications." Quite a nice book, but not easy.
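The theoretical invariance property mentioned here can be shown with a short Python sketch. It assumes the Rasch model holds exactly, which is the very condition questioned above. Under that assumption the difference in log-odds between two items equals the difference in their difficulties at every ability level. The difficulty values are hypothetical.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def logit(p):
    return math.log(p / (1.0 - p))

b_easy, b_hard = -0.5, 1.0   # hypothetical item difficulties in logits

# The log-odds gap between the two items, evaluated at several abilities.
logit_gaps = []
for theta in (-2.0, 0.0, 2.0):
    gap = (logit(rasch_probability(theta, b_easy))
           - logit(rasch_probability(theta, b_hard)))
    logit_gaps.append(gap)
    print(f"theta = {theta:+.1f}: log-odds gap = {gap:.3f}")
```

If an item functioned differently in one group, that group's gap would no longer equal b_hard - b_easy, which is one way of seeing why DIF counts as a violation of the model rather than something the model protects against.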

bmuthen posted on Monday, January 02, 2006  2:34 pm



Thanks, Roger. 

EOJ posted on Friday, January 20, 2006  9:01 am



Thanks for your responses on 12/15 and 12/16. I have looked at the example that you mention (5.16) and I see that you keep the loadings and thresholds both freed or both fixed across groups in parallel. I apologize if this is a foolish question, but I wonder why? (This is particularly puzzling since the treatment (i.e., fix/free) of intercepts and loadings can vary in MG models for continuous indicators.) More generally, if groups differ on the degree of an underlying construct (means), those differences will be reflected in the thresholds for individual items (i.e. the proportion of the groups at different response levels of particular items; one group higher than the other), yet the interrelationship between items (expressed as loadings) could be the same. If you force the loadings and thresholds to be equal and unequal across groups in parallel, you seem certain to get group differences comparing models even when it is just that one group’s responses to items are lower than the other's, even when the interrelations between items are the same across groups. Does this make sense or is there a key piece I am misunderstanding? Thanks for any help you can give.


Continuous indicators have means and variances that are not dependent. The means and variances of categorical indicators are not independent. This is the basic reason that you cannot generalize analysis with continuous indicators to that of categorical indicators. One way to look at this is to consider the item curves for the categorical items, P(u=1 | factor). This curve is influenced by both the threshold and the loading, and we are interested in testing whether the whole curve is different. Therefore, the thresholds and loadings should be considered together. 
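Editor's illustration of the item-curve point above: a minimal Python sketch, assuming a probit link with the residual variance fixed at 1, so that P(u=1 | factor) = Phi(loading*factor - threshold). The function names and all parameter values are hypothetical and are not taken from Example 5.16 or from any poster's data.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def item_curve(factor, loading, threshold):
    """Probit item curve P(u=1 | factor), assuming a residual variance of 1."""
    return normal_cdf(loading * factor - threshold)


# Hypothetical parameter values for one binary item (not from the thread).
reference = {"loading": 1.0, "threshold": 0.0}
threshold_differs = {"loading": 1.0, "threshold": 0.5}
loading_differs = {"loading": 1.5, "threshold": 0.0}

# Print the three curves at a few factor values to compare them.
for factor in (-1.0, 0.0, 1.0):
    row = [
        item_curve(factor, **params)
        for params in (reference, threshold_differs, loading_differs)
    ]
    print(factor, [round(p, 3) for p in row])
```

In this sketch either a threshold change or a loading change moves the curve away from the reference curve, which is one way to see why the two parameters are treated together when asking whether the whole curve differs across groups.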

EOJ posted on Friday, January 27, 2006  6:22 am



Thanks Linda, this makes it quite clear. 


I am trying to fit a simple model in Mplus 4. It looks like this: MEAN by x1@1 x2@1; DIFF by x1@0.5 x2@0.5; This gives two latent variables, MEAN and DIFF, which are, respectively, the mean of and the difference between the two measures (x1 and x2). With continuous measures, the variance and mean of x1 and x2 need to be constrained to zero. What I'd really like to do is do this with categorical (ordinal) measures. I have constrained the thresholds to be the same, and used one as an anchor to identify the latent means, but I am struggling to identify the variance of DIFF when I do. Am I missing something and being dim, or is there something more fundamental? (I'm trying this with simulated data, just to see if it can be done in principle.) Thanks, Jeremy 


A variance for a continuous latent variable underlying categorical observed variables can only be identified if you have multiple indicators of the latent variable; here you don't. In the probit framework, the x's in your model are treated as continuous latent response variables with fixed unit variances. So it seems that your latent variable DIFF needs to have its variance fixed at 1. 


I have a question about results of MIMIC modeling to examine item measurement bias. I have run two models: 1) a MIMIC model with dummy variable indicators for different groups in the sample with direct paths to the latent variable in order to estimate mean differences relative to the comparison group; 2) the same model as above with the addition of direct paths from the group indicators to the individual items that measure the latent variable. What I find is that these two models fit the data equally well. Under the second model the mean group differences observed in the first model are substantially reduced and nonsignificant, but there are some significant direct paths from group indicators to measurement items. Since the two models lead to different conclusions, I wanted to be sure that with equal fits, the most parsimonious model (simple mean group difference) is the one to choose even when testing for measurement bias. Is this the case? The examples I've seen in the literature all seem to have both significant mean differences between groups and significant paths to one or more measurement items. Thanks for your help. 


I assume that you have only a few direct effects from the group dummies to the items. If you have a significant direct effect, then this is the model you should choose; the model without such a direct effect is misspecified so that the effect of the group dummies on the factors cannot be trusted. 


Actually there is a fairly high proportion of direct effects from the group dummies to the factor indicators. This is an analysis of the FTND looking at potential differences among three groups relative to a fourth. So there are 6 indicators and 3 group dummy variables. Of the 15 potential direct paths (excluding direct paths to the item chosen to set the scale), 11 show significant differences relative to the reference group. Does this influence the conclusion that the group mean differences model is misspecified? If there were true group differences in the mean level of a characteristic, wouldn't one expect significant differences in the direct paths (that is, those with lower mean ND should report lower cigs smoked, etc.)? Thanks again for your help. 


If you have 11 out of 15 possible direct effects significant, you have a problematic model. Remember that direct effects imply measurement noninvariance (see e.g. my MIMIC writings), so if you try to interpret effects of group dummies on the factors in a model without these direct effects, you will have very distorted results. Re your last question, I assume that by "level of a characteristic" you imply the mean of an item. If so, the answer is no. If I understand you correctly, this question indicates an important misunderstanding, so you should read carefully the MIMIC writings on noninvariance. Mean differences in an item across groups are not the issue; the conditional mean difference given the factor is what direct effects concern. 
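Editor's illustration of the distinction drawn above between an item's mean difference across groups and its conditional mean difference given the factor. This is a minimal Python sketch for a continuous item with one group dummy x, assuming E[y | factor, x] = nu + lam*factor + beta*x and a factor mean of gamma*x. The symbols nu, lam, beta, gamma and all values are hypothetical and are not estimates from the FTND analysis.

```python
def item_mean_given_factor(factor, x, nu, lam, beta):
    """Expected item score at a given factor value for group dummy x (0 or 1)."""
    return nu + lam * factor + beta * x


def item_mean_in_group(x, nu, lam, beta, gamma):
    """Expected item score in group x when the factor mean is gamma * x."""
    return item_mean_given_factor(gamma * x, x, nu, lam, beta)


# Hypothetical values: item intercept, loading, and group effect on the factor.
nu, lam, gamma = 2.0, 0.8, 0.5

for beta in (0.0, 0.3):  # beta is the direct effect of the dummy on the item
    # Difference in item means between the groups (not conditioning on the factor).
    marginal = item_mean_in_group(1, nu, lam, beta, gamma) - item_mean_in_group(
        0, nu, lam, beta, gamma
    )
    # Difference between the groups at the same factor value (here 0.0).
    conditional = item_mean_given_factor(
        0.0, 1, nu, lam, beta
    ) - item_mean_given_factor(0.0, 0, nu, lam, beta)
    print("beta =", beta, "marginal:", round(marginal, 3), "conditional:", round(conditional, 3))
```

With beta = 0 the groups still differ in their item means, purely through the factor, while the difference at a fixed factor value is zero. A nonzero beta is what shows up as a difference between groups at the same factor value.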


Well, I meant mean differences in the latent variable, that is nicotine dependence in this case. If a group has a mean level of ND that is lower than another group you would expect differences in their responses to items that measure dependence. Would those differences in response give rise to significant direct paths for the group dummies to the individual items? I will go back and read your papers, but I don't recall a situation in which the group difference on the latent trait was eliminated once direct path(s) to the items were included. Thanks. 


Group mean differences in the factors do not necessitate the existence of direct effects; these are two different things. On the other hand, including a direct effect can change a group difference in the factor that was seen when the direct effect was incorrectly not included. In our annual course, we have an example of the latter. 


Hi Bengt/Linda, I am working on a measurement invariance analysis using multigroup CFA with continuous outcomes (mean structure & ML estimation). On page 345 of the User's Guide you describe models for successive tests of invariance. In addition to the specifications in the guide, I have also fixed one loading per latent variable to 1 for identification reasons in all steps. I was wondering if I should also fix the intercept of this indicator to zero in any of the tests (e.g. steps 2 or 3). Thanks for your time & for putting on some great workshops. Cheers, Chris 

Boliang Guo posted on Friday, September 22, 2006  4:58 am



Please read the Vandenberg (2000) paper published in Organizational Research Methods. 


Chris, no, the intercept should not be fixed. 


Thanks Bengt and Boliang. At the time of writing my question I only had access to a limited set of papers, and they were rather vague about the constraints employed, so a mild late-night panic set in. Anyway, just to tidy up this discussion, I read in Steenkamp & Baumgartner (1998) that two options for model identification are:

Option 1) In addition to setting the factor loading of one item to 1 in each factor, also fix the intercept of this item to ZERO in each group. This equates the means of the latent variables to the means of the marker variables.

Option 2) Fix the vector of latent means to zero in the reference group and constrain (at least) one intercept per factor to be INVARIANT across groups. The item with the invariant intercept should also have an invariant factor loading. The latent means in other countries are then estimated relative to the latent means in the reference country.

Thanks again for your time. Chris

Steenkamp, J.-B. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78-90. 
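Editor's illustration of the two identification options summarized above: a minimal Python sketch, assuming a single marker item with its loading fixed at 1 in both groups, so the implied marker mean is intercept + latent mean. The marker means and the helper names are hypothetical. The sketch only shows that both options reproduce the same marker means with different latent mean metrics.

```python
# Hypothetical marker-item means in a reference group and a second group.
marker_means = {"reference": 3.0, "other": 3.4}


def option_1(means):
    """Marker intercept fixed at 0 in every group: latent mean equals marker mean."""
    return {group: {"intercept": 0.0, "latent_mean": m} for group, m in means.items()}


def option_2(means, reference="reference"):
    """Latent mean fixed at 0 in the reference group; marker intercept held invariant."""
    intercept = means[reference]
    return {
        group: {"intercept": intercept, "latent_mean": m - intercept}
        for group, m in means.items()
    }


def implied_marker_mean(params):
    """Implied mean of the marker item when its loading is fixed at 1."""
    return params["intercept"] + 1.0 * params["latent_mean"]


for name, solution in (("option 1", option_1(marker_means)), ("option 2", option_2(marker_means))):
    for group, params in solution.items():
        print(name, group, params, "implied mean:", round(implied_marker_mean(params), 3))
```

Under option 1 the latent means sit on the marker item's raw scale, while under option 2 they are differences from the reference group.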


Hi, I have a question about measurement invariance across groups using dichotomous items. What would be the best way to identify the item(s) that contribute to measurement invariance NOT being met? I was thinking about looking at MI; I am not sure whether there is another way. In addition, since equality constraints for loadings and thresholds have to be relaxed in tandem, is there any way to point to the specific source of differences between groups on specific items (i.e., is it the loading or the threshold)? Thanks, Sven 


With multiple-group analysis, MIs are useful. You can also search for measurement noninvariance via factor analysis with covariates, although that doesn't discover loading noninvariance; here you can regress each item on all covariates to find significant direct effects. Once the noninvariant item has been identified, I would use Mplus to plot the item against the factor for each of the 2 groups. It is the difference in these 2 item characteristic curves that describes the noninvariance. The difference due to the intercept vs the slope may therefore not be relevant to disentangle. 


Hi Bengt, just a quick follow-up: I have been trying to identify items that are not invariant across groups based on the MIs provided. None of the values stand out as being really high; all are below 10. My chi-square difference test is still significant though. I guess that my sample size (around 18,000) may just make even small differences significant. Is this a case where you just argue that there are no meaningful differences in loadings/thresholds across groups despite a significant chi-square difference test? Second, if I do relax a loading/threshold for a variable that loads on more than one factor, do I have to relax the equality constraint for all of the factors the item loads on? Thanks, Sven 


Hi Linda or Bengt, I am testing for measurement invariance across groups, and relaxing some of the equality constraints does not lead to the changes in loadings/chi-square that are suggested by the modification indices. Do you have any idea why that could be? Is it because I am relaxing the thresholds and loadings in tandem? Or because of the specifics of the DIFFTEST? Thanks, Sven 


Note that the MIs are in the metric of chi-square so that the 5% critical value is 3.84 (because you have 1 df due to considering a single parameter). Note also that the chi-square difference testing has to be done using the DIFFTEST option. I can imagine that due to the large sample you can get significant differences that are not very large in terms of parameter values. So this is an empirical assessment of the degree of your overpowering. You do not have to relax invariance for loadings on other factors. If you get strange testing results even when taking the above into account, please send to support with the usual info. 
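Editor's illustration of the 3.84 cutoff mentioned above: a minimal Python sketch that converts a 1-df chi-square value (such as a single modification index) into a p-value. It relies on the fact that a 1-df chi-square variable is the square of a standard normal variable, so the upper-tail probability is erfc(sqrt(x/2)). The function name is hypothetical, and the example values 3.84 and 10 are simply the two numbers that come up in this exchange.

```python
import math


def chi2_1df_p_value(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For 1 df, chi-square = Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


for value in (3.84, 10.0):
    print(value, round(chi2_1df_p_value(value), 4))
```

This only addresses statistical significance of a single freed parameter. It says nothing about whether the corresponding parameter difference is large enough to matter, which is the overpowering issue raised for a sample of about 18,000.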


Thanks ... Sven 

Jeff Kennedy posted on Wednesday, September 19, 2007  7:43 pm



Hi, I have 4 samples (from 3 countries) for a 12-item scale. All 12 items load on a single factor, with 6 items also loading on an uncorrelated method (negatively-keyed items) factor. This model fits well in each separate sample. Two samples are large (c. 1000), the other two are smaller (c. 200). My problem is that a 5-point scale was used in 2 samples (a big and a small one) with a 7-point scale used in the other 2 samples. Q.1: Is it meaningful to test ME for a pair of samples with the same scale, but very different sample sizes? Will the large sample dominate the indications of fit in the combined model? Q.2: Does the ordered categorical approach allow for testing equivalence where different numbers of scale points have been used? Millsap & Yun-Tein (2004) comment on p. 481 that "a more general description would permit c [largest possible score] to vary across variables, but this extension introduces needless complications for the purpose at hand". This suggests it's possible; are there examples of such studies (or syntax) available? (Some old posts here note that the same indicator needs the same number of categories across groups.) Millsap, R. E., & Yun-Tein, J. (2004). Assessing factorial invariance in ordered-categorical measures. Multivariate Behavioral Research, 39(3), 479–515. Jeff Kennedy 


It is true that when one sample is very large and the other very small that the large sample can dominate. If the five category wordings are a subset of the seven category wordings, I think you can hold the thresholds of the common categories equal. However, if the item wordings are not the same, I don't think this can be done. 


Hi Linda or Bengt, I am trying to test for invariance of intercepts in a multigroup model. When I check each intercept step by step, some models do not converge. I got a hint to subtract the item means from the raw scores/item values (I do not know the English expression for this procedure...). Now all models converge, but I am not sure if this is an appropriate procedure since now the intercepts are equal to zero in all groups; thus, it is no wonder that all intercepts are invariant. Can you clarify if what I have done is adequate? Thank you, Markus 


When you free the intercepts, you need to fix the factor means to zero in all groups. See the discussion of testing for measurement invariance at the end of the multiple group discussion in Chapter 13 of the Mplus User's Guide. 


Hi Linda, I had fixed the means to zero in all groups when the problem occurred. I was able to circumvent it by doing the above-mentioned procedure ("center" the variable?). Now everything works fine; I just wanted to make sure that I have not done something inappropriate. (Perhaps I should mention that I am using MLM.) Thanks, Markus 


It is impossible to understand exactly what you are doing without seeing more information. Please send relevant input, data, output, and your license number to support@statmodel.com. 

Kathy posted on Monday, April 07, 2008  12:30 pm



Hi everyone. I am a new Mplus user and I was wondering if anyone could expand on the multigroup factor analysis section in the user's guide (chapter 13) that states "For the Delta parameterization of weighted least squares estimation, scale factors can also be considered. For the Theta parameterization of weighted least squares estimation, residual variances can also be considered." Not sure how these parameters fit with respect to the two previous steps. Would this be step three in showing invariance? That is, would holding the scale factors and residual variances equal, for their respective parameterization, be the next nested models? Steps 1 and 2 describe what needs to be freed and what is fixed. What needs to be freed and fixed if you are testing for invariance of scale factors or residual variances? Thanx. 


These parameters are usually not considered in measurement invariance of categorical outcomes just as residual variances are usually not considered for measurement invariance of continuous outcomes. I suggest reading about these parameters a couple of pages earlier. If you do not have a compelling reason to do so, I would use the models we suggest when testing for measurement invariance. 

Kathy posted on Monday, April 07, 2008  7:40 pm



Hey Linda. I have been using Millsap and Yun-Tein's 2004 article "Assessing Factorial Invariance in Ordered-Categorical Measures" as my template for my analysis, and they talk about equality of thresholds, factor loadings, and unique variances to show factorial invariance. I was trying to figure out how to test unique variance. 


To do that, use PARAMETERIZATION=THETA; in the ANALYSIS command and compare the model with all residual variances fixed to one in all groups to the model with residual variances fixed to one in one group and free in the other groups. 

Kathy posted on Tuesday, April 08, 2008  1:28 pm



Sorry Linda, not sure I understand. What happens to the thresholds and factor loadings? That is, the residual variances are already fixed at one in all groups and then fixed at one in one group and free in the others in order to test threshold and factor loading invariance. Do I have to run another nested model where the thresholds and factor loadings are constrained to be equal across groups and the residual variances are fixed to one across groups? 


The three models would be:

1. Thresholds and factor loadings free across groups; residual variances fixed at one in all groups; factor means fixed at zero in all groups

2. Thresholds and factor loadings constrained to be equal across groups; residual variances fixed at one in one group and free in the others; factor means fixed at zero in one group and free in the others (the Mplus default)

3. Thresholds and factor loadings constrained to be equal across groups; residual variances fixed at one in all groups; factor means fixed at zero in one group and free in the others

You compare model 2 to model 3. 

Kathy posted on Monday, April 14, 2008  7:00 am



The analysis worked out great. I did find noninvariance across groups for the thresholds and factor loadings. I wanted to investigate the source of the noninvariance, in terms of which factor(s) are noninvariant. There are three latent variables and the data is categorical. Can you do this in Mplus? Not sure how to proceed, given the constraint requirements in step 2 above for thresholds, factor loading, residuals, and means. 


Request Modification indices and see which items have the largest ones. 

Kathy posted on Monday, April 14, 2008  9:44 am



I was thinking in terms of chi-square difference tests. That is, holding the factor loadings (and thresholds) for the first factor invariant and freeing the other two factors' loadings (and thresholds) across groups and comparing this model with the unrestricted model (in step one from Linda's response above). Next, constraining the first and second factors' loadings (and thresholds) to be invariant across groups, keeping the third factor's loadings (and thresholds) free to be estimated, and comparing this model with the one-factor-constrained model. Can you use this approach in Mplus? If so, can you give an explanation of how to constrain and free the thresholds, factor loadings, residuals, and means (see step 2 from above)? 


You imply in your earlier message that you found noninvariance and want to investigate the source of the noninvariance. To me this would imply that you did difference testing of the models. I am not sure how you decided you found noninvariance otherwise. Yes, you can do difference testing in Mplus. You would need to use the DIFFTEST option to do this with WLSMV. See the user's guide. Example 5.17 shows how to free factor loadings and thresholds. Residual variances are fixed to one in one group and free in the others as the default. Example 5.17 shows how to fix a residual variance. 

Kathy posted on Monday, April 14, 2008  10:38 am



Hi Linda. Yes, I used chi-square difference tests to determine noninvariance across groups for my three-factor model. I am using the WLSM estimator. To be more specific, based on your response of April 08, 2008 above, would the following be correct to show which of the three factors in my model contributed to the noninvariance finding:

1. To test invariance of the first factor, I would constrain the thresholds and factor loadings of the first factor and free the thresholds and factor loadings of the other two factors across the groups; residual variances fixed at one in one group and free in the others for the first factor, and residual variances restricted to one for the other two factors; factor means fixed at zero in one group and free in the others for the first factor, and factor means fixed at zero in all groups for the other two factors.

2. Compare this model to the unrestricted model (step 1 in your response).

3. Repeat this process for each factor. 


Yes. 

Kathy posted on Thursday, April 17, 2008  9:30 am



Linda, one more question. I found noninvariance for one of my three factors. I am interested in determining which item(s) in the noninvariant factor are invariant. I am assuming that I follow the same procedure I used to test invariance of the factors (which is step one in my post above on April 14), correct? But I am not sure what happens to the factor means. Should they stay fixed to 0 when I am testing individual items, or be freed? For example, to test for invariance of item one of the noninvariant factor, would I constrain the thresholds and factor loadings of the first item and free the thresholds and factor loadings of the other 10 items across the groups, with residual variances fixed at one in one group and free in the others for the first item, and residual variances restricted to one for the other 10 items? At this point I'm not sure what I would do with the factor mean. Should it be constrained to 0 or freed for the noninvariant factor? 


Example 5.17 shows partial measurement invariance. 

Kathy posted on Thursday, April 17, 2008  1:46 pm



I don't think the example answers my question. To be more specific, I have a noninvariant factor and I want to test which item(s) in that factor are invariant across groups. For example, if I wanted to see if the first item of the factor is invariant, I would have to constrain the first item to be equal across groups and allow the other items to be free across groups. The problem is that when you free a factor across groups you have to set the factor mean to 0 for that factor. In my case I only have one factor, but I am constraining the first item in that factor to be invariant and allowing the remaining items of that factor to be free. Given that the same factor has both constrained and freed items across groups, should the factor mean be set to 0 because some of the items are freed? 


The example shows how to specify partial measurement invariance. You can look at modification indices to see where measurement invariance might be. If you free something you should not, you will be warned. I suggest just starting and seeing what happens. If you have problems with specific set ups, you can send your input, data, output, and license number to support@statmodel.com. 

Kathy posted on Monday, April 21, 2008  9:18 am



Just wanted to pass along this example to help explain my question better. I think it will make it clearer.

VARIABLE: NAMES ARE sde1-sde3 im1-im3 group;
  CATEGORICAL ARE all;
  GROUPING IS group (1=M 3=F);
ANALYSIS: TYPE IS general;
  ESTIMATOR IS WLSM;
  PARAMETERIZATION=THETA;
MODEL: SDE by sde1-sde3;
  IM by im1-im3;
  sde1-sde3@1;
  im2-im3@1;
  [SDE@0];
  ! this is the term I am wondering about: not all of the im items are free across groups, im1 is fixed. Is it still appropriate to have IM@0?
  [IM@0];
Model F: SDE by sde1-sde3;
  IM by im2-im3;
  ! [sde1$1]; [sde2$1]; [sde3$1];
  ! [im2$1]; [im3$1]; 


If you free too many thresholds, you will receive a message about a problem with model identification. I'm not sure exactly how many you can free before you must fix the factor means to zero. 

Kathy posted on Monday, April 21, 2008  12:34 pm



In reality I have 20 im items, so I guess I could fix one item at a time to be equal across groups and compare these 20 models separately to the unrestricted model in order to determine which of the items are invariant for the noninvariant IM factor. 


You can do this. 


Hi, I am attempting to test the measurement invariance of a second-order factor model. The model has four first-order factors. Five or more items load on each factor. I am using the WLSMV estimator. I am attempting to replicate the analysis presented by Chen, Sousa, & West (2005):

1. Configural invariance
2. Invariance of first-order factor loadings
3. Invariance of second-order factor loadings
4. Invariance of intercepts of measured variables
5. Invariance of intercepts of first-order latent factors

I was able to complete steps 1 and 2 above. When I ran step 3, the model ran with no errors; however, the program seemed to ignore the command to hold the second-order factor invariant. First-order factors were held invariant but the second-order factor was not held invariant. The output was identical to step 2. In step 3, in group 1 I defined all the factors including the second-order factor, fixed the first- and second-order factor means to zero, and fixed the scale factors to 1. In group 2, I fixed the first- and second-order factor loadings to be equal to the first group (by erasing the code), and allowed thresholds to vary between groups. (I allowed the correlations of residuals of two pairs of items to vary across the groups.) So, how do I need to change the code to hold the second-order factor invariant? Yours truly, John Lawrence 


Please send your input, data, output, and license number to support@statmodel.com. I assume you have categorical factor indicators given that you are using WLSMV. Note that the models to compare for measurement invariance differ from those of the continuous case. See the end of the multiple group discussion in Chapter 13 of the Mplus User's Guide. 


Hello, I have a question in reference to Linda's 2nd post from 4/8/2008 in this thread. I am testing residual variance invariance, across gender, in a two-factor ordered categorical measure, and I am unable to generate the chi-square difference test from model 2 to model 3 when I fix the factor means to zero for one group and allow them to be estimated freely in the 2nd group. However, when I fix the factor means to zero in both groups and run models 2 and 3, the chi-square difference test runs successfully. How, if at all, does constraining the factor means (in both models 2 and 3) affect the interpretation of the residual variance invariance (via the chi-square difference test)? Thanks in advance. 


You would need to send the two outputs and your license number to support@statmodel.com for me to comment. 


My apologies, my previous post was confusing. Here is the needed clarification: I cannot generate the standard errors for the model parameters when I allow the factor means to be fixed to zero in group 1 and free in group 2. Thus, I cannot get the chi-square model test to generate for model 2 when the factor means are free in the second group. I believe the problem with identifying the model is due to high multicollinearity in one of the factors. Therefore, I am looking for a proper way to fix the problem. Since the chi-square difference test is estimated when I fix the factor means in both groups (for both models 2 and 3), I wanted to learn, from a conceptual standpoint, how constraining the factor means (in both models 2 and 3) might affect the interpretation of the residual variance invariance. Thanks. 


It is not possible to say without more information. You would need to send the two outputs and your license number to support@statmodel.com for me to comment. 


Hello, when doing scalar invariance testing across groups using WLSMV to test if it is appropriate to compare latent means, how should one specify the latent mean when thresholds are fixed (scalar invariance model)? It seems that in the metric invariance model, one has to fix the latent mean to reach identification. When running the scalar invariance model, the model seems to be too strict, since the chi-square difference test between WLSMV models states that a scalar invariant model with free means is not nested within the metric invariant model with fixed means. How then to compare the metric invariant and scalar invariant models? Furthermore, to identify the metric invariant model one has to fix the scalars. Should the scalars be fixed in both the metric and scalar invariant models? Best regards 


See page 399 of the user's guide where the models for testing measurement invariance for categorical outcomes are described. 


Dear Drs. Muthén, I have two questions concerning the two models on p. 399 (delta para.): 1) Wouldn't it be necessary to hold scale factors equal across groups (and fixing the means to zero) in order to achieve strong measurement invariance? Thus, does model 2 represent "residual variance invariance"? 2) I wonder if you would use the term "configural invariance" for model 1 taking into account that scale factors are functions of residual variances, loadings, and factor variances? Thanks in advance, kind regards, Ulrich 


1. Scale factors are not residual variances. If you want to test residual variances, use the Theta parameterization. 2. This sounds reasonable. 


Dear Dr. Muthén, Thanks for your quick response. A short follow-up question: Am I correct that the Delta parameterization is for estimation purposes only? That is, the Theta parameterization is preferable because results are much easier to interpret (i.e., no dependencies between scale factors and residual variances, loadings, and factor variances). Thanks in advance, kind regards, Ulrich 


Delta is the default. We recommend using it unless Theta must be used. Although Theta may be easier to interpret, it has problems in some cases. See Web Note 4 for more information. 


Dear Dr. Muthén, I am trying to verify the measurement invariance of a scale prior to latent growth modelling. The scale is estimated with 4 Likert items and we have 3 time points. Constraining the factor loadings works fine. However, when constraining the thresholds, the modification indices point towards a major problem with the factors (instead of the thresholds). Though these values decrease when the thresholds are deconstrained (estimated freely), they remain the largest modification indices.

[ F1 ]  46.455  0.195  0.339  0.339
[ F2 ]   0.097  0.008  0.018  0.018
[ F3 ]  35.594  0.157  0.302  0.302

How should I interpret this finding? 


From the modification indices, it looks like you have factor means fixed to zero at all time points. This is too restrictive. The factor means need to be zero at one time point and free at the others. See the Topic 4 course handout under multiple indicator growth to see how to test for measurement invariance across time. 


I had a question regarding scalar invariance. I am interested in comparing 2 clinical groups, which we expect to have different levels of the latent means. My understanding is that in order to compare latent means, one has to establish configural, metric, and scalar invariance. We found configural and metric invariance, but not scalar invariance. My initial interpretation of this is that the items are biased, such that individuals with the same level of the latent variable manifest different observed scores if they are in different groups. However, I just read the Vandenberg (2000) article, which states that if you expect groups to differ in terms of their latent score, it's not appropriate to test for scalar invariance "because differences in item location parameters would be fully expected" (p. 36). I'm confused about how to proceed: if you expect differences between groups, how can you demonstrate that those differences are actually real, if a test of scalar invariance is inappropriate? 


I don't have that article. As you present it, the statement does not make sense because group differences in factor means do not imply differences in measurement intercepts (or thresholds, difficulties). I don't know why the confusing terms metric and scalar (strong) invariance are used instead of the straightforward terms loading and intercept invariance, but translated it is clear that intercept and loading invariance is needed to study factor mean differences. Intercept invariance is often rejected, partly because there is more power to reject it than loading invariance. Note also that you don't need full intercept invariance here; there may be only a few items that show noninvariance and you can free those. You find them via Modindices. 


Thank you so much! In the Vandenberg et al. (2000, Organizational Research Methods) article, they state that if one has a "measure that is a valid operationalization of the construct and the hypothesis regarding group differences is true, then the items underlying that measure should also reflect group differences if mean difference tests were conducted on an item-by-item basis. Hence, a test for intercept invariance is not appropriate because differences in item location parameters would be fully expected. However, these differences are not biases in the sense of being undesirable as in rating source biases, but rather they reflect expected group differences." Am I misinterpreting what they're saying here, by thinking it means that one shouldn't test for intercept invariance if one expects group differences? Or is this perhaps an error in the Vandenberg article? I guess I was wondering if it would be possible to find invariant intercepts if one group has a consistently higher score on a questionnaire than another group. I did look for partial invariance, but even after freeing many intercepts, it still was not achieved. However, I do have full loading invariance. I really appreciate your advice on this! 


It is perfectly fine to find invariant intercepts if one group has consistently higher observed scores than another group. This is because the observed score mean m_g in group g is m_g = nu + lambda*alpha_g, where nu is the invariant intercept, lambda the invariant loading, and alpha_g the group-varying factor mean. So, variation in alpha_g is what causes m_g to vary. I think it is hard to argue that you are measuring the same factor construct if the majority of the items don't have both intercept and loading invariance. 
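
A small numeric sketch of the relation m_g = nu + lambda*alpha_g, written in Python rather than Mplus. The intercepts, loadings, and factor means are made-up illustrative values, not estimates from any analysis in this thread. It shows that every item mean can differ across groups even though nu and lambda are identical in both groups.

```python
# Illustrative values only (assumed for this sketch, not from the thread).
nu = [2.0, 1.5, 3.0]            # invariant item intercepts
lam = [1.0, 0.8, 1.2]           # invariant factor loadings
alpha = {"g1": 0.0, "g2": 0.5}  # group-varying factor means


def item_means(alpha_g):
    """Model-implied observed item means: m_g = nu + lambda * alpha_g."""
    return [n + l * alpha_g for n, l in zip(nu, lam)]


means = {g: item_means(a) for g, a in alpha.items()}

# Group differences in item means come only from the factor mean difference,
# scaled by each item's loading.
diffs = [m2 - m1 for m1, m2 in zip(means["g1"], means["g2"])]
print(means)
print(diffs)
```

Each item difference equals lambda times the factor mean difference, so consistently higher observed scores in one group are compatible with full intercept invariance.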

Kathy posted on Saturday, October 15, 2011  2:02 pm



I received an interesting error/warning message. I ran a MGCFA (WLSMV) and found noninvariance for the loadings/thresholds, i.e., a significant DIFFTEST comparing the measurement invariance and noninvariance models. The MI indicated that the loading/threshold for one item was above the 3.84 level. Therefore, I added a group-specific model command to the measurement invariance model which freed the one loading/threshold across groups (the residual variance for this item was @1 for both groups). I ran a DIFFTEST comparing this model with the noninvariance model and I received this message: THE MODEL ESTIMATION TERMINATED NORMALLY THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL MAY NOT BE NESTED IN THE H1 MODEL. DECREASING THE CONVERGENCE OPTION MAY RESOLVE THIS PROBLEM. To me the models above, i.e., the partial invariance and noninvariance models, should be nested. How do you decrease the convergence option so that I can get a result for my DIFFTEST? 


Please send the two outputs and your license number to support@statmodel.com. 
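
As a side note on the 3.84 level mentioned in the question above: it is the 5% critical value of a chi-square distribution with 1 degree of freedom, which is why it is commonly used as a cutoff for a single modification index. The following Python sketch checks this with the standard library only; the function name is a hypothetical helper introduced here.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    With 1 df, X = Z**2 for standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


p_at_cutoff = chi2_sf_1df(3.84)
print(round(p_at_cutoff, 4))
```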

Kathy posted on Friday, October 21, 2011  12:25 pm



Hopefully I can figure it out myself, but I don't see anywhere in the user's guide how to decrease the convergence. Is there an option I am not seeing? How would you decrease the convergence? 


See the CONVERGENCE option in the user's guide. 


Dear Dr. Muthén, I am doing research on the growth curve of ethnocentrism. Before starting the analysis, I need to know whether my ethnocentrism factor (measured by six indicators at three points in time) measures the same thing at my three time points. I have no intention of going further into the analysis of measurement invariance. The only thing I need to know is whether the factor loadings and intercepts of my indicators are the same across the three time periods. (Restrictions: factor loadings of corresponding items equal across time periods; no covariance between the residuals of an indicator and those of non-corresponding indicators; covariance between residuals of corresponding indicators; variances of factors equal across the three time periods.) Which restrictions do I have to impose? How do I model this in Mplus? Thank you very much! 


Please see the Topic 4 course handout on the website under multiple indicator growth. The inputs for this are shown there. 


Thank you! Why is it not necessary to specify that the items of time 1 can correlate with the items of time 2 (and time 3)? Why do you not compare the factorial invariance model to the configural model? How is the last one specified in Mplus for multiple indicator growth curves? Are there any tricks to make factorial invariance model fit? Thank you! 


Q1. It can be important to explore the need for correlated items over time. Q2. I don't know that the configural model is helpful if your aim is to study change over time; you need metric invariance for that. Q3. A configural model can be specified in a longitudinal factor model setting by not holding factor loadings equal over time. You obviously need more than one factor for configural to be relevant. Q4. Good measurement and good pilot work on earlier samples using EFA. 


For a multigroup measurement invariance analysis with categorical data, I want to use the correlation matrix and intercepts from both groups as input data. How should the correlation matrices and intercepts of both groups be laid out in a .dat file so that Mplus reads them as input data for two separate groups? Many thanks! 


See pages 431-432 of the user's guide. 

Sarah Ryan posted on Wednesday, February 15, 2012  1:00 pm



After establishing measurement invariance, I have been testing structural invariance, including invariance of latent means. Given that I have covariates in the model, I get estimated latent means in the output. The estimated latent means differ, as one might expect, depending on if the model is estimated separately for each group and if the structural invariance baseline model is estimated for both groups simultaneously. When I report the estimated latent means in my results, I'm thinking it would be more appropriate to report the means estimated in the structural invariance baseline model since the results from this model are more akin to comparing "apples and apples." Am I thinking about this correctly? 

Sam Smith posted on Thursday, February 16, 2012  5:48 am



I am testing for partial measurement invariance in a model with 1 factor with 4 dichotomous indicators (u1-u4) in 2 groups (g1, g2). I am using WLSMV. I first estimated the measurement noninvariance model, and the full measurement invariance model. In the first partial measurement invariance model I freed the loading and thresholds for the u4 item. That worked out fine, but if I go on and try to free the loading and thresholds for an additional item u3, Mplus complains that this new model is not nested in the measurement noninvariance model (I am using DIFFTEST to compute the chi-square difference). The 2 models I'm comparing are:

!Model 1
Model: f1 by u1-u4;
  [f1@0];
  {u1-u4@1};
Model g2: f1 by u2-u4;
  [u1$1-u4$1];

!Model 4
Model: f1 by u1-u4;
Model g2: f1 by u3 u4;
  [u3$1 u4$1];
  {u3@1 u4@1};

Model 4 gives me parameter estimates, but if I use DIFFTEST to compare the 2 models, I get the message: THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL IS NOT NESTED IN THE H1 MODEL. Is it not possible, when the indicators are dichotomous, to test a partial measurement invariance model in which only one item, in addition to the marker item, is invariant? Thank you! 


Sarah: Please send the relevant outputs and your license number to support@statmodel.com. Please note that in a model with covariates, an intercept not a mean is estimated for the dependent variable. 


Sam: This should work. Please send the relevant outputs and your license number to support@statmodel.com. Include TECH1 and TECH5 in the OUTPUT command. 

Tyler Hunt posted on Sunday, March 04, 2012  9:11 am



I have a 10-item scale for which I am trying to establish measurement invariance across male and female subjects. The ultimate goal is to compare latent means for differences. First, I looked at form, then factor loadings, then intercepts. Everything worked out and did not produce a significant chi-square or change the other fit indices until I got to equating intercepts. It seems that the latent means are also equated by default. It seems strange to me to equate the very thing I am expecting to be different in order to meet the requirements to compare them. Once I freely estimate the latent means for the second group, the chi-square is not significantly different from that of the equated factor model. Is this a problem because I fixed and freed parameters? I tried to freely estimate the latent means for both groups starting with equal forms, but then the mean structure is underidentified. Having an 8 degree of freedom change between equating loadings and equating intercepts is likely to raise eyebrows. 


Latent variable means are by default zero in one group and free in the others. The test of latent variable means is this model versus a model where they are zero in all groups. See the Topic 1 course handout under multiple group analysis. 

Tyler Hunt posted on Sunday, March 04, 2012  12:02 pm



I was taught that models are nested if you fix or free parameters. Models are not nested if you do both. Is this just not applicable in intercept invariance? When freely estimating indicator intercepts I had to fix the latent means to zero to get it to run. 


When intercepts are free, factor means must be fixed at zero in all groups for model identification. See the Topic 1 course handout and video under multiple group analysis for a thorough discussion of measurement invariance and population heterogeneity. 
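
The degrees-of-freedom bookkeeping behind this step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not output from Mplus: continuous indicators, every item loading on exactly one factor, factor means fixed at zero in all groups when intercepts are free, and factor means free in all but the first group once intercepts are held equal. The function name is hypothetical.

```python
def df_change_loading_to_intercept(n_items, n_factors, n_groups):
    """Change in df when moving from loading invariance to intercept invariance.

    Holding intercepts equal adds n_items constraints per non-reference group;
    freeing the factor means uses up n_factors parameters per non-reference group.
    """
    constraints_added = (n_groups - 1) * n_items
    means_freed = (n_groups - 1) * n_factors
    return constraints_added - means_freed


one_factor = df_change_loading_to_intercept(n_items=10, n_factors=1, n_groups=2)
two_factors = df_change_loading_to_intercept(n_items=10, n_factors=2, n_groups=2)
print(one_factor, two_factors)
```

Under these assumptions, a 10-item scale in two groups gives a change of 9 with one factor and 8 with two factors, so fixing intercepts and freeing means in the same step is simply part of how the df change arises.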

Sarah Ryan posted on Friday, March 09, 2012  3:26 pm



Referring to the comparison of probit coefficients across groups, Williams (2009) states: "So, in logit and probit models, coefficients are inherently standardized...the standardization is accomplished by scaling the variables and residuals so that the residual variances are either one (as in probit) or π²/3 (as in logit). If residual variances differ across groups, the standardization will also differ, making comparisons of coefficients across groups inappropriate." Does this statement apply in the context of SEM? I am using WLSMV (delta) and so my understanding is that it is not possible to constrain res. var. to equality. I would suspect they are not, although I have achieved all other criteria for structural invariance. Thus, I am not sure how to address the appropriateness of comparing parameters across groups in my work (but suspect I will get a reviewer comment on the issue). Any advice? 


With weighted least squares and the Delta parametrization, scale factors can be free in all groups except one where they are fixed at one. With weighted least squares and the Theta parametrization, residual variances can be free in all groups except one where they are fixed at one. This alleviates the problem mentioned above. See Web Note 4 on the website for a discussion of this topic. 
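
The scaling issue raised in the question can be illustrated numerically. In this Python sketch, the slope `b` and the residual standard deviations are made-up values, and `identified_coef` is a hypothetical helper. It shows the logit residual variance π²/3 and how the same underlying slope yields different probit-scaled coefficients when the residual standard deviation differs across groups and is nevertheless treated as one in both.

```python
import math

# Residual variance implied by the logistic distribution (logit models).
logit_resid_var = math.pi ** 2 / 3


def identified_coef(b, resid_sd):
    """Slope on the scale where the residual SD is standardized to 1.

    For a latent response y* = b*x + e, only b / sd(e) is identified when
    the residual variance is fixed at one (probit scaling).
    """
    return b / resid_sd


b = 1.0                                   # same underlying slope in both groups
coef_g1 = identified_coef(b, resid_sd=1.0)
coef_g2 = identified_coef(b, resid_sd=2.0)  # larger residual SD in group 2
print(round(logit_resid_var, 4), coef_g1, coef_g2)
```

Freeing the scale factors or residual variances in all groups but one, as described above, lets the model absorb such differences instead of forcing them into the coefficients.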


In a measurement invariance analysis I found a violation of metric invariance: a nonequivalence in factor loadings between two groups. Now I want to know how strong the nonequivalence in factor loadings is between the two groups. Is there a coefficient or measure for the strength of the metric noninvariance? Many thanks, Marloes 


I think some look at the ad hoc coefficient of congruence. You can Google this. But really this is a matter of substantive interpretation much like practical significance. 
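
For reference, the coefficient of congruence is usually taken to be Tucker's congruence coefficient, which I am assuming is what is meant here. A minimal Python sketch, with made-up loading vectors for the two groups:

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Illustrative loadings only (not from any analysis in this thread).
loadings_g1 = [0.70, 0.65, 0.80, 0.55]
loadings_g2 = [0.72, 0.40, 0.78, 0.60]

phi = congruence(loadings_g1, loadings_g2)
phi_scaled = congruence(loadings_g1, [2 * v for v in loadings_g1])
print(round(phi, 3), round(phi_scaled, 3))
```

Note that the coefficient equals 1 whenever one loading vector is a positive multiple of the other, so it measures similarity in pattern rather than equality of loadings, which is one reason it is only an ad hoc index.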

TingLan Ma posted on Monday, October 01, 2012  12:45 pm



Hi Dr. Muthen, I am trying to compare the factor means for four factors across two groups. I checked the measurement invariance and found partial measurement invariance across my two groups. Out of 27 items across the four factors, there are four items that do not have invariant factor loadings (these four items belong to 2 separate factors), and 11 items do not have invariant intercepts. My question is: what is the best way to compare the mean differences for a partial measurement invariance model? Originally I tried using a MIMIC model; however, the MIMIC model does not allow me to unconstrain the factor loadings and intercepts of certain items. In this case, I may only use the MIMIC model with items showing full metric and scalar invariance (which leaves me only 15 items across 4 factors). However, I would really like to compare the factor means across two groups using all 27 items, with some items' factor loadings and some items' intercepts left unconstrained. Is there a way in Mplus for me to approach this question? (I wasn't considering mean structure analysis because I thought it also requires full measurement invariance for all items.) Thank you so much for your help! 


See the Topic 1 course handout and video on the website under Multiple Group Analysis. It shows a partial measurement invariance model and how to test for means across groups. 

TingLan Ma posted on Monday, October 01, 2012  6:25 pm



Hi Dr. Muthen, I ran analyses according to the Topic 1 course handout. However, when I proceed from the "partial measurement invariance model with invariant factor variances and covariances (model A)" to the "partial measurement invariance model with invariant factor variances, covariances, and means (model B)", the degrees of freedom of model B stay the same as those of model A, which shouldn't be the case. I have four factors, so it should yield a df difference of 4 in this case (because I constrained the four factor means to be equal in model B following the input on p. 223 in the Topic 1 handout). Do you know where I should look for the problem? Thanks! 


Please send the two outputs and your license number to support@statmodel.com. 

TingLan Ma posted on Wednesday, October 03, 2012  1:42 pm



Hi Dr. Muthen, Thank you for identifying the error in my output. I have a further question. My results showed that constraining the factor means to be equal results in worse model fit. How do I then identify which factor means are significantly different in one group versus the other? In the Topic 1 handout the slides stop at the conclusion that constraining the factor means to be equal results in worse fit, but they do not go on to test where the differences in factor means lie. How should I test the significance of the difference in factor means in this case? Thank you very much! 


When you run the model with factor means at zero in all groups, ask for modification indices and that will help you identify mean differences across groups. 

steve posted on Monday, February 11, 2013  2:09 am



Dear Drs. I have the same problem as reported by John Lawrence 06/2008. I am testing measurement invariance of a second-order model following the approach provided by Chen and colleagues (2005). Unfortunately, I can't hold the second-order factor invariant. The program seems to ignore the command and provide the same results as when testing invariance of first-order loadings. I appreciate any help. Below you will find the syntax:

GROUPING IS VERSION (0 = ONL 1 = PP);
ANALYSIS: ESTIMATOR = MLR;
  PROCESSORS = 2;
MODEL: F1 BY x1 x2 x3 x4;
  F2 BY x5 x6 x7 x8;
  F3 BY x9 x10 x11 x12;
  F4 by F1 F2 F3;
  [F1@0 F2@0 F3@0 F4@0];
MODEL PP: [x1-x12];

With kind regards Steve 

steve posted on Monday, February 11, 2013  4:09 am



Oh.. I apologize for posting in the wrong forum. My question concerns continuous factor indicators. 


A second-order factor with three indicators is just-identified. Therefore, model fit is the same for both models. Model fit cannot be assessed for a just-identified model. 
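
The just-identification can be checked by counting. The Python sketch below is an illustration under assumptions that are mine, not the poster's exact setup: it looks only at the covariance structure among the first-order factors in one group, with one second-order loading fixed at 1 to set the metric. The function name is hypothetical.

```python
def second_order_df(n_first_order):
    """Degrees of freedom of the second-order part of the model.

    Information: variances and covariances among the first-order factors.
    Parameters: free second-order loadings (one fixed for scaling),
    first-order factor residual variances, and the second-order factor variance.
    """
    k = n_first_order
    moments = k * (k + 1) // 2
    params = (k - 1) + k + 1
    return moments - params


df_three = second_order_df(3)
df_four = second_order_df(4)
print(df_three, df_four)
```

With three first-order factors there are 6 moments and 6 parameters, so the second-order structure is a re-expression of the three factor variances and covariances and cannot change model fit; with four first-order factors it would have 2 degrees of freedom.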


Hello, I have a question about MPLUS recommendations for invariance testing using theta parameterization and ordinal categorical variables. It is recommended that item uniquenesses be constrained to unity in both groups to identify loadings and thresholds. It is then suggested that uniquenesses be freed when constraining loadings and thresholds. Why wouldn't we keep the item residuals constrained in the test of loadings/thresholds so that we can be sure that the change in the chi-square relates directly to loading/threshold invariance? Is there any problem with keeping the item residuals constrained in a second model and then proceeding with a check on strict invariance by releasing the item residuals? Thanks in advance for your help. 

steve posted on Tuesday, February 12, 2013  6:38 am



Thank you. I placed an equality constraint on the variance of f1 and f2 in order to address this issue (f1 f2 (1);) which, however, did not work. Is there any meaningful way of testing for invariance of first- and second-order loadings with only three first-order indicators? 


Steve: I don't know of any meaningful way to test a just-identified model unless some other constraints make sense. 


Dear Mplus Team,

(1) This question is about partial measurement invariance in models with categorical outcomes and with more than 2 groups (e.g., 3 groups: A, B, C, assuming a full invariance baseline model with reference group A). If the modification indices say that in group C the loading and/or the thresholds of item i seem to be noninvariant, and assuming that's true: would it be correct to free the thresholds and loading for item i only in group C [WLSMV and Delta/Theta: additionally fix the scale factor / the residual variance at one in group C]? Or is the correct way to do this not only in group C but also in group B and reference group A (where the scale factors / residual variances are already set to one), even if the loadings/thresholds for item i are equal in groups A and B? The final goal is to compare the factor means between the groups.

(2) In single-factor models, is it not a problem if categorical outcomes x (dichotomous) and y (polytomous) have different numbers of categories, as long as for each variable the number of categories is the same across groups?

(3) I know there are options in ML/MLR [e.g. (*)] if there are empty (unused) categories in at least one of the groups. But in testing partial measurement invariance models, I can't free a threshold for a group if at least one related category is empty in that group; is that right?

Thank you very much. 


1. Yes, only free in group C. 2. Yes. 3. You will receive a message if this is the case. 


Hi, I am testing for longitudinal invariance with categorical data, using DIFFTEST between unconstrained and constrained (metric and scalar) models. Firstly, for the free model:

Analysis: processors= 4 ; parameterization = theta ; estimator = wlsmv ;
MODEL:
  f1 by t1_1c* t1_2c t1_3c t1_4c t1_8c ; f1@1;
  f3 by t3_1c* ...etc for t3 items; f3@1;
  f2 by t1_2c* t1_5c t1_6c t1_7c ; f2@1;
  f4 by t3_2c* ...etc; f4@1;
savedata: DIFFTEST IS FREE_MOD.dat ;

And then to test for metric invariance:

Analysis: processors= 4; parameterization = theta ; estimator = wlsmv ;
  DIFFTEST is FREE_MOD.dat ;
MODEL:
  f1 by t1_1c* (1) t1_2c (2) t1_3c (3) t1_4c (4) t1_8c (5) ; f1@1;
  f3 by t3_1c* (1) ...etc ; f3@1;
  [f3@0] ; [f1@0] ;
  t1_1c pwith t3_1c ; ...etc;
  f2 by t1_2c* (16) t1_5c (17) t1_6c (18) t1_7c (19) ; f2@1;
  f4 by t3_2c* (16) ...etc; f4@1;
  [f2@0] ; [f4@0] ;
  t1_2c pwith t3_2c ; ...etc;

But: THE CHI-SQUARE DIFFERENCE TEST COULD NOT BE COMPUTED BECAUSE THE H0 MODEL MAY NOT BE NESTED IN THE H1 MODEL. DECREASING THE CONVERGENCE OPTION MAY RESOLVE THIS PROBLEM. Can you suggest any reason for this, please? 


Please send the two outputs and your license number to support@statmodel.com. 


I am trying to run a longitudinal invariance model with four waves of data and 6 indicators per scale. I am now estimating the strong invariance test but am receiving an error. I have looked at the output but cannot figure it out. Can you please help? This is the error I'm receiving: THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING PARAMETER 81. THE CONDITION NUMBER IS 0.261D-16. 


Tech1 shows you which parameter number 81 is. You can then check with the UG pages 681-682, describing the parameterization for growth modeling with measurement invariance. 


I have been running multigroup measurement invariance analyses (using ordinal variables with three thresholds, Weighted Least Squares, and Delta Parameterization) via the convenience commands from version 7.11. My understanding, based on the Mplus User's Guide and Millsap (2011), is that (A) for Metric, the first threshold of each item is held equal across groups, and the second threshold of the item that is used to set the metric of the factor is held equal across groups; and (B) for Scalar, all thresholds are held equal across groups. In my model, the chi-squares and degrees of freedom do not change from Metric to Scalar. When I review the Tech1 output, the Tau matrix is the same for both the Metric and Scalar models. Should all the thresholds be held equal across groups with the Scalar convenience command, or am I misinterpreting which thresholds in particular need to be constrained? One additional point is that I have provided start values for my model, but it appears that the constraints imposed in the configural and metric models are correct. 


Please send the two relevant outputs to Support. 

dvl posted on Friday, January 31, 2014  3:17 am



Dear Professor, If my model is tested for metric and scalar invariance and both assumptions are supported, I work further with the default options in MPLUS assuming that factor loadings and intercepts are equal across groups? Or should I use the specific factor loadings and intercepts for each specific group in the subsequent steps? 


If you have established invariance of intercepts and factor loadings, you should hold them equal in further analyses. These equalities represent measurement invariance. 

Kathleen posted on Wednesday, February 12, 2014  8:55 pm



Is there a best way to examine partial measurement invariance, specifically DIF, with respect to time? I compared an LTA with full invariance to one with full noninvariance and, using the loglikelihood ratio test with scaling correction factors, found that the variant item thresholds fit the data better. How can I examine which items function differently over the 2 time points? Modindices can't be used and I have not found much on this in the web notes, the UG, or on this discussion board. Despite the better fit of the noninvariant model, I may impose invariance for substantive reasons, and proceed with a mover-stayer model, as my observed response patterns fit that model well, but I would like to understand the cause of the noninvariance. Thank you. 


Why can't you get modification indices? 

Kathleen posted on Thursday, February 13, 2014  9:27 pm



Hi and thank you for responding to my post, Dr. Muthen. Since I wanted to compare the threshold invariance over time, I requested MODINDICES in the LTA model, but the error message says this is not an option for mixture models with more than one categorical variable. I'm using type= mixture complex, if that matters. How can I compare thresholds over time if not in the LTA? Thank you again. 


You may have to constrain one item at a time to have equal measurement parameters across time and do a series of LR chi-square tests. 
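
Each of these LR tests needs the scaling correction factors when MLR is used. A minimal Python sketch of the scaled loglikelihood difference computation follows; the function name is hypothetical and the input values are illustrative, not taken from any model in this thread. Here L0, c0, p0 are the loglikelihood, scaling correction factor, and number of free parameters of the nested (constrained) model, and L1, c1, p1 those of the comparison model.

```python
def scaled_lr_difference(L0, c0, p0, L1, c1, p1):
    """Scaled chi-square difference based on loglikelihoods (MLR).

    cd  = (p0*c0 - p1*c1) / (p0 - p1)   difference-test scaling correction
    TRd = -2 * (L0 - L1) / cd           scaled test statistic
    The statistic is referred to a chi-square with p1 - p0 degrees of freedom.
    """
    cd = (p0 * c0 - p1 * c1) / (p0 - p1)
    trd = -2.0 * (L0 - L1) / cd
    return cd, trd, p1 - p0


# Illustrative inputs only.
cd, trd, df = scaled_lr_difference(L0=-2606.0, c0=1.450, p0=39,
                                   L1=-2583.0, c1=1.546, p1=47)
print(round(cd, 3), round(trd, 3), df)
```

Running this once per item, with only that item's measurement parameters held equal across time in the nested model, gives the series of tests described above.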


Dear Mplus team, I have a small problem while trying to check longitudinal invariance between two time points (20 items, 3-point ordered categorical observed variables, WLSMV estimator with theta parameterization). In the metric invariance model, all factor loadings and the first threshold of each item were constrained to be equal across time, except for the reference item, which had both thresholds constrained. Question: if one of the factor loadings is not invariant in the metric invariance model, should the SECOND threshold of this item NOT be constrained equal across time in the scalar model? 


See the Version 7.1 Mplus Language Addendum on the website. Starting at page 8 we describe testing of measurement invariance for ordinal outcomes across groups. The same principles apply across time. 

dvl posted on Wednesday, March 12, 2014  8:53 am



Dear Professor, In Mplus the default setting when performing a multiple group confirmatory factor analysis is that factor loadings and intercepts are equal across groups. However, I only found metric and no scalar invariance in my two-factor model. My question is: should I free the equality constraints (defaults) on the intercepts when moving from the measurement to the structural part of my model in this specific case? I think I should, but I would rather be 100% sure. I'm not completely sure whether the structural part of my model requires metric and scalar invariance... Thanks a lot for your help! 


You can free the intercepts as long as you are not comparing factor means. 

Lydia Brown posted on Sunday, March 23, 2014  10:31 pm



I am doing an ESEM measurement invariance analysis across two groups using WLSMV. I originally followed your instructions based on Chapter 14 of the user's guide to test configural then scalar invariance. However, in your Mplus Version 7.1 addendum, you offer the option of testing metric invariance by fixing some of the thresholds for identification purposes ("The first threshold of each item is held equal across groups. The second threshold of the item that is used to set the metric of the factor is held equal across groups."). I am wondering if this new metric option is appropriate for my model? 


No, it is not. The METRIC setting is not allowed for ordered categorical (ordinal) variables when a factor indicator loads on more than one factor, when the metric of the factors is set by fixing the factor variance to one, and when Exploratory Structural Equation Modeling (ESEM) is used. 

RuoShui posted on Friday, April 04, 2014  11:31 am



Dear Dr. Muthen, I have a question about measurement invariance with ordinal indicators. When testing measurement invariance with multiple ordinal indicators, metric invariance involves constraining some thresholds in addition to factor loadings. Is it correct that, as a result, it is not as easy to achieve metric invariance as in the case with continuous indicators? If partial metric invariance can be achieved by relaxing a couple of thresholds, is it justified to proceed to testing scalar invariance? Thank you very much! 


The threshold constraints used with metric invariance for ordinal variables make the model identified, but do not impose extra restrictions. So I don't think you can relax "a couple of thresholds". For further details, see the Millsap book. 
