
Anonymous posted on Thursday, March 07, 2013  7:47 am



I've been reading your online notes on multiple group analysis with categorical outcomes. After the configural invariance step you suggest going straight to a model where intercepts and slopes etc are constrained across groups and the factor means are freed. This is different from the continuous case where one might examine the factor loadings first and then add in the intercepts in two separate steps, and I was wondering why this would be. 


This difference between the continuous-item case and the categorical-item case is due to having less information with categorical items. With binary items there is an identification issue that prevents testing loading invariance alone (metric invariance), at least when allowing group-varying residual variances. We therefore recommend going straight from configural to scalar invariance. With polytomous items it is possible to identify and analyze the metric model, but even so it is not as straightforward as with continuous items. Roger Millsap has written on the polytomous case; see for instance his book Statistical Approaches to Measurement Invariance. 

Anonymous posted on Friday, March 08, 2013  12:33 am



Thank you, that is extremely helpful. There's a variety of estimators that I can use in Mplus when fitting these models; my outcomes are binary. I recall being told once that ML is associated with the Differential Item Functioning/Item Response Theory approach, and WLSMV is associated with the CFA approach. Is that correct? 


No, that is not correct. IRT and CFA with categorical indicators are the same model. You can use either ML or WLSMV as estimators when you have categorical variables and factors. With ML, each factor requires one dimension of integration and each residual correlation also requires one dimension of integration so ML can become computationally heavy with several factors. In this case, WLSMV is preferred. Both ML and WLSMV are good for IRT as is Bayes. 

Anonymous posted on Monday, March 11, 2013  9:01 am



I'm trying to use the steps you outline for testing measurement invariance across multiple groups with categorical outcomes, within a MACS framework. Having obtained my baseline models for each group, I have fit the configural model for each group as, e.g., f1 by a* b c d e; [f1@0]; f1@1; [a$1 b$1 c$1 d$1 e$1]; {a@1 b@1 c@1 d@1 e@1}; According to your outline, would the next equivalent step be a model in which the loadings and intercepts are constrained to be the same across groups, with the means still fixed at zero in each group, factor variances at 0 in each group, and scale factors fixed at 1 in the first group and freely estimated in the other groups? I can estimate this model (it isn't a great fit), but end up having to examine MI for factor means and slopes/thresholds simultaneously, something I would prefer not to do. I also wondered about using DIFFTEST to compare models in the invariance sequence: when the scale factors are freely estimated, I end up with more parameters in the simpler model (i.e., loadings constrained) than in the more complex model (loadings unconstrained but scale factors fixed). Any pointers you can offer will be much appreciated as always! 


Please see page 485 of the user's guide and the Topic 2 course handout on the website where the inputs are given under multiple group analysis. Factor variance should not be fixed to zero. If you continue to have problems, send the output and your license number to support@statmodel.com. 
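
For orientation, here is a minimal sketch of what a configural input for five binary items could look like under the default Delta parameterization. It is an illustration under assumptions, not the input from the user's guide or the handout: the data file name, the grouping variable grp, and the group labels g1 and g2 are hypothetical; only the item names a to e come from the post above.

```mplus
TITLE:    Configural model, binary items, Delta parameterization (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = a b c d e grp;       ! grp is a hypothetical grouping variable
          USEVARIABLES = a b c d e;
          CATEGORICAL = a b c d e;
          GROUPING = grp (1 = g1 2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f1 BY a b c d e;
          [f1@0];                      ! factor mean fixed at zero in every group
          {a-e@1};                     ! scale factors fixed at one in every group
MODEL g2: f1 BY b c d e;               ! loadings free in g2 (a stays fixed at 1)
          [a$1 b$1 c$1 d$1 e$1];       ! thresholds free in g2
SAVEDATA: DIFFTEST = deriv.dat;        ! derivatives for a later DIFFTEST run
```

For the scalar run, one would remove the [f1@0], {a-e@1}, and MODEL g2 statements so that the multiple-group defaults apply (equal loadings and thresholds, scale factors and factor means free in the second group), and add DIFFTEST = deriv.dat; to the ANALYSIS command. Note that the factor variance is left free here rather than fixed.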


Hi, I was wondering whether it makes sense to test for residual variance invariance (as can be done in CFAs with continuous observed variables) once scalar invariance has been established in a multigroup CFA with categorical data using the Theta parameterization. That is, after scalar invariance has been found (following the procedure described on page 486 of the Mplus manual), would it make sense to estimate a third model in which the residual variances are again fixed at one in all groups, and compare this model to model #2 described at the top of page 486? Looking forward to your answer. 


Yes. 
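
A possible setup for that third model is sketched below. The item names u1 to u6, the grouping variable grp, and the file names are hypothetical, and the sketch assumes the scalar model was run first with SAVEDATA: DIFFTEST = scalar.dat; so that the two models can be compared.

```mplus
TITLE:    Residual variance invariance, Theta parameterization (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = u1-u6 grp;           ! hypothetical variable names
          USEVARIABLES = u1-u6;
          CATEGORICAL = u1-u6;
          GROUPING = grp (1 = g1 2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
          PARAMETERIZATION = THETA;
          DIFFTEST = scalar.dat;       ! derivatives saved from the scalar run
MODEL:    f BY u1-u6;                  ! defaults hold loadings/thresholds equal
          u1-u6@1;                     ! residual variances fixed at one in all groups
```

Because the statement u1-u6@1; sits in the overall MODEL command, it applies to every group, which is what distinguishes this model from the scalar model.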

Alvin posted on Thursday, May 08, 2014  1:09 am



Hi Dr Muthen, can I just clarify: is the configural model one where factor loadings and intercepts are free to vary and no equality constraints are imposed across groups? I realize Mplus by default holds factor loadings and intercepts equal across groups to test measurement invariance. Does this mean that in testing invariance of factor loadings alone, one has to override the default equality constraint on the intercepts as the first step? 


Yes, on configural. Note that the current Mplus allows the Analysis option model = configural metric scalar; where your Model statement simply says f by y1-y10; and the rest is done automatically. 
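
Put together, a complete input using this option might look like the following sketch. The data file name, the grouping variable grp, and the group labels are hypothetical; the MODEL statement is the one quoted in the reply.

```mplus
TITLE:    Automatic invariance testing (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = y1-y10 grp;          ! grp is a hypothetical grouping variable
          USEVARIABLES = y1-y10;
          GROUPING = grp (1 = g1 2 = g2);
ANALYSIS: MODEL = CONFIGURAL METRIC SCALAR;
MODEL:    f BY y1-y10;
```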

Alvin posted on Thursday, May 08, 2014  11:30 pm



Thanks very much Dr Muthen, that's fantastic! I notice you don't get std estimates in the output using this option. Is there a way around this? Also, in testing latent mean differences across groups, I fixed the factor mean at 0 and the factor variance at 1 in the reference group for comparison, while letting the factor mean and variance of the other group be free. This was done with equal intercepts across groups (scalar model), and the model showed a good fit. Does this mean that the latent mean structure differs across groups? 


As the output says, you don't get standardized estimates when you ask for several of configural, metric, scalar, but you get them if you do one at a time. As for your last question, perhaps you are asking if the factor means are different across groups. If so, the z value for the factor mean in the second group will tell you. You should study up on our Topic 1 discussion of invariance issues; the video and handout are on our website. 
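
As an illustration of the one-at-a-time approach, the sketch below requests only the scalar model together with standardized output. The file name, the grouping variable grp, and the group labels are hypothetical.

```mplus
TITLE:    Scalar model only, with standardized estimates (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = y1-y10 grp;          ! grp is a hypothetical grouping variable
          USEVARIABLES = y1-y10;
          GROUPING = grp (1 = g1 2 = g2);
ANALYSIS: MODEL = SCALAR;              ! one invariance model per run
MODEL:    f BY y1-y10;
OUTPUT:   STANDARDIZED;
```

In the results for the second group, the z value mentioned above is the Est./S.E. column on the factor mean line.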

Alvin posted on Thursday, May 15, 2014  11:49 pm



Thanks very much Dr Muthen. I've read your notes on multigroup CFA. I've tested configural, metric, and scalar invariance, and further, invariance of factor variance and residual variance, on each of the five subscales from a measure I developed. I also looked at latent mean differences across groups. As predicted, two of the subscales tested did not pass the scalar test; that is, the intercepts varied across groups. What do you do in this case? 


This means you have partial measurement invariance. Please see the Topic 1 course handout and video where this is discussed under multiple group analysis. 

Alvin posted on Tuesday, August 19, 2014  11:31 pm



Hi Dr Muthen, I was wondering how you test partial invariance (by freeing parameters) using the latest feature of MI testing in Mplus? Say you were to release constraints for some of the items: do you use class-specific syntax? Thanks 

Alvin posted on Tuesday, August 19, 2014  11:59 pm



A follow-up question: in my output, while the chi-square test is significant for each model (1 configural, 2 metric, 3 scalar), the LRT tests for the nested model comparisons (1-3, 2-3, 1-2) are not significant. Does this mean invariance holds at all levels? 


Yes, you can use class-specific syntax. You should get a well-fitting configural model as a first step. Once you do that you should test it against the other models. 
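
A minimal sketch of such group-specific syntax for continuous items follows. It assumes, purely for illustration, that the intercept of y3 is the noninvariant parameter; the file name, the grouping variable grp, and the group labels are hypothetical.

```mplus
TITLE:    Partial scalar invariance, continuous items (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = y1-y10 grp;          ! grp is a hypothetical grouping variable
          USEVARIABLES = y1-y10;
          GROUPING = grp (1 = g1 2 = g2);
MODEL:    f BY y1-y10;                 ! defaults hold loadings and intercepts equal
MODEL g2: [y3];                        ! intercept of y3 free in group g2
```

Mentioning a parameter in a group-specific MODEL command releases its default cross-group equality, so everything not mentioned there stays constrained.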

Xu, Man posted on Friday, December 25, 2015  11:53 am



Merry Christmas! I would like to check something about the number of parameters regarding residual variances under the THETA and WLSMV setting (in Mplus), as I am a bit confused about the degrees of freedom. Take a two-group example of six binary items with one factor. The number of model parameters would be as follows: Conf inv (24 para): 10 loadings, 12 thresholds, 0 residual var, 2 factor variances, 0 factor means. Weak inv (19 para): 5 loadings, 12 thresholds, 0 residual var, 2 factor variances, 0 factor means. Strong inv (26 para): 5 loadings, 6 thresholds, 12 residual var, 2 factor variances, 1 factor mean. It's been suggested that the strong invariance model be compared against the config model directly, so it should be a test with 3 degrees of freedom (26 vs. 21). Then I am a bit confused because there are actually 10 loadings and 12 thresholds involved here but the difference in degrees of freedom is only 5. Is this correct, or have I missed something? 

Xu, Man posted on Friday, December 25, 2015  11:59 am



Sorry, I miscalculated; there should be only 6 residual variances in the strong inv model, but that still leaves 5+6+6+2+1=20 parameters. That is only 4 fewer than the conf inv model. 


Please send the output and your license number to support@statmodel.com. 

Xu, Man posted on Saturday, December 26, 2015  4:49 pm



Thank you. I just got hold of the Millsap & Tein 2004 paper and they seem to say something about binary outcomes being a special case. I will have a look at this first. 


4 parameters difference sounds right. The binary case is different in that the metric (weak) invariance model is not identified with 6 residual variances. 
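
Written out, the count confirmed here is simply a restatement of the corrected numbers in the posts above for six binary items, one factor, and two groups. The terms in each sum are, in order, loadings, thresholds, residual variances, factor variances, and factor means.

```latex
\begin{align*}
\text{configural:} \quad & 10 + 12 + 0 + 2 + 0 = 24 \\
\text{scalar:}     \quad & 5 + 6 + 6 + 2 + 1 = 20 \\
\text{difference:} \quad & 24 - 20 = 4
\end{align*}
```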

Xu, Man posted on Monday, February 29, 2016  8:59 am



Dear Dr. Muthen, I am carrying out an analysis with repeated measures of 6 binary items (two waves). I also wanted to check the mean change of the latent factors. Holding loadings and thresholds equal over time, I noticed that the difference in latent means depended on the constraint on the residual variances of the binary indicators. If I specify the residual variances to be equal, then there is a latent mean difference, but this difference disappears if I specify the residual variances to be different in the two waves. I am a bit puzzled, mostly because I thought latent means are not supposed to depend on residual variances. Could it be that things are different in the case of binary variables? Kate 


Please send the two outputs and your license number to support@statmodel.com. 

Dennis Li posted on Saturday, July 30, 2016  4:32 am



I am trying to run invariance testing on a 6-class LPA with a known class, but I am having trouble with the syntax. I get the error message "Measurement invariance testing...is only available for TYPE=MIXTURE with one categorical latent variable and the KNOWNCLASS option." Is it possible to test my LPA against a known class using this method? My (abridged) syntax is:

VARIABLE:
  Classes = c(6) g(2); ! Here's the source of the error
  Knownclass = g (g=1 g=2);
ANALYSIS:
  Type = mixture;
  Model = configural metric scalar;


I don't think so. You would have to set up the invariance testing restrictions yourself. 


Hi, we have run a two-correlated-factor model for two groups. For scalar invariance, when factor means are fixed at 0 in group 1 and free in group 2, the values are .12 and .14 for group 2, both with p-values < .01. This implies that the two groups differ in their means, right? Fit of the model with fixed loadings and intercepts/thresholds is good, though. What puzzles us is that when comparing factor scores, which we had estimated in a previous step (for both groups combined), the mean factor scores for group 1 and group 2 are more or less the same. Have we misspecified the multigroup model in any way? Could it be due to the fact that our categorical indicators have varying numbers of categories and differing values? Thanks for your help! 


Q1. Right. Means and variances of estimated factor scores don't behave like those of true factors. Q2. Not necessarily. Q3. I don't think so. Although if the number of categories varies across the groups, you should use the * approach discussed in the UG. 


Dear Bengt, thank you very much for your reply. I have checked the output file again and have come across the following output. Under "UNIVARIATE SAMPLE STATISTICS", we receive "DESCRIPTIVE STATISTICS" for both factors for both groups:

Mean (F1, Group 1): 7.989
Mean (F1, Group 2): 8.041
Mean (F2, Group 1): 8.388
Mean (F2, Group 2): 8.399

How does Mplus arrive at these values? The mean scores for both factors F1 and F2 vary between .052 and .005 in both groups. Could this explain the difference? We have the same number of categories in both groups; the number of categories just differs between items. 


We need to see your full output to say. Please send to Support along with your license number. 


Dear Bengt, thanks for your reply, we have solved the above question. We have an additional question: when we ask Mplus to save factor scores in the model in which we test scalar invariance (fixing loadings, thresholds, intercepts), the factor scores we receive for the two groups G1 and G2 differ (this would be in line with the means of both factors F1 and F2 being different between the groups). When we simply specify a common model (no specification for groups), do not fix any parameters (apart from factor variances @1 in order to identify the model), and request factor scores, the two groups don't differ. Is the difference due to fixing the parameters in the scalar model? What does that imply? 


The combined group model is essentially wrong for each of the groups, especially the factor covariance matrix. You should ignore that analysis. See also Section 3 of the paper on our website: http://www.statmodel.com/download/Article_0241.pdf 

M.F. posted on Thursday, August 04, 2016  2:32 am



Dear Mr. Muthen, thanks for your reply. So you would say that we have to use the factor scores saved in the scalar invariance model for further analysis? It is interesting that the latent means differ between the two groups in one model and not in the other, although we actually have scalar measurement invariance for these two groups. 


Q1. Yes. 
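
As a rough sketch, saving factor scores from a two-group, two-factor scalar model with categorical indicators could be set up as below. All names are hypothetical (items u1 to u8, grouping variable grp, file names); the scalar constraints come from the multiple-group defaults.

```mplus
TITLE:    Saving factor scores from the scalar model (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = u1-u8 grp;           ! hypothetical variable names
          USEVARIABLES = u1-u8;
          CATEGORICAL = u1-u8;
          GROUPING = grp (1 = g1 2 = g2);
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f1 BY u1-u4;                 ! defaults hold loadings/thresholds equal
          f2 BY u5-u8;
SAVEDATA: FILE = fscores.dat;          ! hypothetical output file
          SAVE = FSCORES;
```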

Kathy Xiao posted on Monday, January 23, 2017  5:37 pm



Dear Dr. Muthen, I am doing measurement invariance testing with 3 racial groups (Black, Latino, White) on a 13-item construct. MODEL: F by F1-F13; After fitting the baseline model for each racial group separately (using USEOBSERVATIONS to select the group), I got the following model fit. RMSEA (CI): Black: .065 (.690, .782); Latino: .083 (.710, .778); White: .071 (.647, .689). The CFI/TLI are all above 0.95. Given the relatively poor RMSEA, does that mean I cannot continue the invariance testing? Or is there a modification of the model I can make? Many thanks! 


Try using Modindices. 

Kathy Xiao posted on Tuesday, January 24, 2017  3:17 pm



I used MODINDICES(3.84) but it did not show any error correlation. Or shall I use the MODINDICES(all) instead? 


Yes, use All. 

Kathy Xiao posted on Tuesday, January 24, 2017  5:14 pm



I used MODINDICES(all) and found error correlations between S1 and S2 for the Latino and White groups. I then added them in MODEL Latino and MODEL White, and the RMSEA values are now below .06. Does that mean I need to keep "S1 WITH S2" in the later measurement invariance models, from the configural model through the strict model? 


That's reasonable. But you may want to discuss these general analysis strategies on SEMNET. 
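
For concreteness, carrying the residual correlation into a multiple-group run could look like the sketch below. The item names s1 to s13, the grouping variable race and its coding, and the file name are assumptions made for illustration; only "S1 WITH S2" and the group names come from the posts above.

```mplus
TITLE:    Group-specific residual correlation (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = s1-s13 race;         ! hypothetical variable names and coding
          USEVARIABLES = s1-s13;
          GROUPING = race (1 = black 2 = latino 3 = white);
MODEL:    f BY s1-s13;
MODEL latino:
          s1 WITH s2;                  ! residual correlation in this group only
MODEL white:
          s1 WITH s2;
OUTPUT:   MODINDICES(ALL);
```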

Kathy Xiao posted on Tuesday, January 24, 2017  6:20 pm



Thanks for your reply. I will move the discussion to SEMNET later. One question on MI with multiple groups: I further tested measurement invariance and found noninvariance. So I proceeded to test the invariance of the loadings by comparing the metric model to a model with one loading freed at a time. However, I found that every freed loading led to a significantly worse model (p<0.05), and this is the case for all the 2-2 group comparisons. Do you think this suggests noninvariance of this construct across racial groups? Is there anything else that needs to be considered in the analysis? 

Kathy Xiao posted on Tuesday, January 31, 2017  11:26 pm



Dear Dr. Muthen, I am doing measurement invariance testing with 3 groups; the outcome is a 12-item categorical measure with 4 response options. I found metric invariance but noninvariance for the scalar model. I want to proceed to free the thresholds to find the source of noninvariance. But there are 3*12 thresholds. Is there a systematic, recommended strategy for deciding which threshold to start with? Shall I free them one by one? Or two by two? Or something else? Thanks! 


There is no agreed-upon strategy, I think. I would go variable by variable (so all thresholds for a variable). 

Kathy Xiao posted on Wednesday, February 01, 2017  6:13 pm



Why is it all thresholds for a variable? Can I do one threshold at a time? In that case I would know exactly how many thresholds are allowed to be free. 


You can do one threshold at a time but that would be very cumbersome. I would think that it is the variable itself that causes noninvariance, not necessarily specific thresholds. 
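
A sketch of the variable-by-variable approach is given below for one item with four response categories, which has three thresholds. The item names u1 to u12, the choice of u5, the grouping variable grp, the group labels, and the file names are all hypothetical.

```mplus
TITLE:    Freeing all thresholds of one item in one group (sketch)
DATA:     FILE = mydata.dat;           ! hypothetical file name
VARIABLE: NAMES = u1-u12 grp;          ! hypothetical variable names
          USEVARIABLES = u1-u12;
          CATEGORICAL = u1-u12;
          GROUPING = grp (1 = g1 2 = g2 3 = g3);
ANALYSIS: ESTIMATOR = WLSMV;
MODEL:    f BY u1-u12;                 ! defaults hold loadings/thresholds equal
MODEL g2: [u5$1 u5$2 u5$3];            ! all three thresholds of u5 free in g2
SAVEDATA: DIFFTEST = partial.dat;      ! derivatives for comparing nested models
```

Depending on the parameterization, further identification constraints may be needed when an item's parameters are freed in a group (for example on its scale factor or residual variance), so the user's guide section on categorical outcomes is worth checking before relying on a setup like this.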

Kathy Xiao posted on Sunday, February 05, 2017  6:43 am



Thanks for your reply! Following up on the previous questions, I tested one threshold at a time and found that 6 out of 33 thresholds led to significantly worse fit. I thought freeing these 6 thresholds would make the comparison nonsignificant. However, when I freed them together, the chi-square comparison was still significant. I also tried freeing the combination of thresholds that caused the biggest chi-square change; the result was also significant. Do you think there is anything wrong with this strategy? 


You may want to ask this question on a general discussion forum like SEMNET. As stated earlier, there is no agreed-upon strategy. 

Kathy Xiao posted on Sunday, February 05, 2017  6:56 am



Thank you! 
