Group differences in LCA
Mplus Discussion > Latent Variable Mixture Modeling >
 Jennie Jester posted on Monday, February 10, 2003 - 11:56 am
I want to compare latent class analysis for symptoms in girls and boys. Can I do a 2-group analysis in LCA (where I could then force the groups to be equivalent or not and check whether this causes worse fit), or do I do the two groups separately?

Another question: I have been looking at Tech 8 to check on smooth convergence. Occasionally, a "QN" or "FS" pops up in the algorithm column, and this is often accompanied by a big jump in the change in log likelihood. So the change in log likelihood is not smoothly converging to zero (which is what I thought I was hoping to see in the Tech 8 output). Is this a problem? Can you explain a little more what I should be checking for in the Tech 8 output?

Thanks for all the help,

 Linda K. Muthen posted on Monday, February 10, 2003 - 1:37 pm
It is not necessary to use multiple group analysis to compare the symptom items across gender. You can just regress the symptom items on gender and accomplish your goal. The symptom items have no variances or covariances, so you don't need multiple group analysis.

The QN and FS in the algorithm column indicate that the estimation algorithm has changed. QN stands for quasi-Newton and FS stands for Fisher scoring. This is not something to worry about. Following are the things you should be looking at in the TECH8 output:

1. loglikelihood should increase smoothly and reach a stable maximum -- with a change in algorithm, there may be more of a change

2. absolute and relative changes should go to zero

3. class counts should remain stable
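
Checks 1 and 2 above can be sketched in Python, assuming the loglikelihood column of TECH8 has been copied by hand into a list. The function name, the tolerance, and the numbers are hypothetical and are not Mplus output; check 3 (stable class counts) is not covered.

```python
def check_convergence(logliks, tol=1e-6):
    """Summarize an iteration history of (nonzero) loglikelihood values."""
    # Absolute change between consecutive iterations.
    abs_changes = [b - a for a, b in zip(logliks, logliks[1:])]
    # Relative change: absolute change divided by the previous loglikelihood.
    rel_changes = [abs(d / a) for d, a in zip(abs_changes, logliks)]
    return {
        "never_decreases": all(d >= 0 for d in abs_changes),
        "final_abs_change": abs(abs_changes[-1]),
        "final_rel_change": rel_changes[-1],
        "converged": abs(abs_changes[-1]) < tol,  # tol is an assumed cutoff
    }

# Hypothetical history: a large early jump, then changes shrinking toward zero.
history = [-5210.4, -5105.9, -5098.2, -5097.95, -5097.9499999]
report = check_convergence(history)
print(report)
```

A jump at the iteration where the algorithm column switches to QN or FS would show up as one larger entry in the middle of the history; what matters in this sketch is that the values never decrease and the final changes are near zero.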
 Christian Geiser posted on Friday, February 25, 2005 - 2:09 am
Dear Linda,

I want to check whether a latent class model with 12 binary LC indicators and 4 classes is the same for males and females. Therefore, I used the KNOWNCLASS option to do a multigroup analysis. I arrived at constraining the response probabilities to be equal across gender but now I also want to test if the class sizes are equal for both males and females. I tried to specify this with the following model statement:

[csex#1.c#1*0.882] (49);
[csex#1.c#2*0.216] (50);
[csex#1.c#3*0.716] (51);

[csex#2.c#1*0.882] (49);
[csex#2.c#2*0.216] (50);
[csex#2.c#3*0.716] (51);

However, it didn't work. Could you give me a hint about what I must change? Thanks a lot!
 bmuthen posted on Sunday, February 27, 2005 - 11:30 am
I think perhaps the easiest way to get the gender invariance of class probabilities that you want is to instead let c and cg be uncorrelated as they are by default. Note that in ex 8.8 you have

c#1 on cg#1;

which means that the c class probabilities vary as a function of the cg (Knownclass) classes. Leaving out that line makes the c probabilities the same for the classes of cg, which is what you want. Try that.
 Christian Geiser posted on Monday, February 28, 2005 - 7:09 am
Thank you Bengt. This was actually what I did (I left out that line) but it appears from my output that the class sizes are only approximately (but not perfectly) identical:


1 1 247.64496 0.14610
1 2 116.88705 0.06896
1 3 148.71330 0.08774
1 4 221.79412 0.13085
1 5 115.96057 0.06841
2 1 245.60793 0.14490
2 2 115.92559 0.06839
2 3 147.49004 0.08701
2 4 219.96972 0.12978
2 5 115.00672 0.06785

When I do the (restricted) analysis in PANMARK, the class counts in both groups match perfectly (though the parameter estimates appear to be identical to those of Mplus). Do you have an explanation?
 Linda K. Muthen posted on Monday, February 28, 2005 - 8:58 am
Please send your input, output, and data to so we can answer your question.
 bmuthen posted on Thursday, March 03, 2005 - 1:47 pm
Christian - it looks like the difference in estimated class probabilities is merely due to the slightly different group sizes. You have sample size 851 in the first group and 844 in the second (this gives a ratio of 1.008). Since the estimated proportions that you give above are from the joint distribution of the 2 x 5 table (the 10 proportions add to 1), you would only see the same 5 class proportions in the two groups if you account for the sample size difference. For instance, 0.14610/1.0083 is approximately 0.1449.
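
The point about joint versus within-group proportions can be checked with a short Python sketch. It uses the estimated counts posted above; the variable names are only illustrative.

```python
# Estimated counts from the output above: keys are the known groups (cg),
# list entries are the latent classes (c).
counts = {
    1: [247.64496, 116.88705, 148.71330, 221.79412, 115.96057],
    2: [245.60793, 115.92559, 147.49004, 219.96972, 115.00672],
}

total = sum(sum(row) for row in counts.values())

# Joint proportions: each cell divided by the full sample (all ten add to 1).
joint = {g: [n / total for n in row] for g, row in counts.items()}

# Conditional proportions: each cell divided by its own group's size.
conditional = {g: [n / sum(row) for n in row] for g, row in counts.items()}

for g in counts:
    print("group", g)
    print("  joint:      ", [round(p, 4) for p in joint[g]])
    print("  conditional:", [round(p, 4) for p in conditional[g]])
```

The joint proportions differ slightly between the two groups, while the conditional proportions agree, which is what equal class probabilities across groups implies.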
 Christian Geiser posted on Thursday, March 24, 2005 - 1:34 am
Dear Bengt, thank you very much. Later when I was looking at the transition probabilities I realized that the class probabilities were actually identical in both groups. I have another question. I did my multigroup LCA both in Mplus and PANMARK. Now, I found that the Loglikelihood, AIC and BIC values as well as the parameter estimates are identical in both programs. However, df, Pearson X^2 and the LR test statistic were different for multigroup analysis (not for single group LCA!). Do you have an explanation? Thanks again, Christian
 bmuthen posted on Thursday, March 24, 2005 - 7:52 am
There was a glitch in the computations of those statistics when Knownclass was used - this has been fixed in version 3.12. Let us know if you still have discrepancies after trying that.
 Anonymous posted on Thursday, September 01, 2005 - 9:46 am
I have cross-sectional data from a psychological inventory and I want to test the hypotheses that there are different classes (or profiles) for different age groups (2 age groups). What is the best strategy to test this hypothesis, using age as a predictor or using Knownclass, e.g., ex 8.8?
In addition, what paper do you recommend that could help me interpret the data for group differences?
I need to be as descriptive as possible because my audience will not be very sophisticated in terms of stats knowledge.
 bmuthen posted on Thursday, September 01, 2005 - 11:30 am
Using age as a predictor is perhaps most straightforward.

There is a 1985 Clogg & Goodman paper in Soc Meth which discussed group differences in latent class analysis.
 anonymous posted on Thursday, January 12, 2006 - 5:23 am
Is it feasible to regard latent classes generated through LCA as subpopulations? In other words, I would like to use the 4 latent classes yielded by my LCA in a general SEM model, but rather than use the classes as predictors in this model, I would like to carry out multigroup comparisons based on individuals' most likely class membership.
Is this possible, or does this result in estimation error?
Or would it be better to include the actual class probabilities, and not most likely membership, as predictors in a model?
 Linda K. Muthen posted on Thursday, January 12, 2006 - 8:40 am
It is not a good idea to use most likely class membership as a grouping variable. You will be introducing estimation error and your standard errors will not be correct. You could use the class probabilities as predictors, but it would be better to do the analysis simultaneously not in two steps.
 anonymous posted on Thursday, January 12, 2006 - 9:03 am
Thanks for this. Can you point me towards an example of how this is done in a single step, or can you suggest any paper or reference which has applied this? Many thanks.
 Linda K. Muthen posted on Thursday, January 12, 2006 - 9:11 am
The way to have a latent class variable as a covariate, that is, to regress a dependent variable on the latent class variable, is to allow the means of the dependent variable to vary across classes.
 RJM posted on Tuesday, January 24, 2006 - 4:13 pm
I would like to estimate a multiple group (KNOWNCLASS = cg) hidden markov model with two classes at each occasion, testing the equality for the two groups of the probability matrix linking each latent class variable and its indicator. Is this possible?

I tried coding the model as follows but Mplus v3.13 returns an error that the class label is unknown.

[f$1] (111);
[f$1] (121);

[f$1] (211);
[f$1] (221);
 Linda K. Muthen posted on Tuesday, January 24, 2006 - 4:50 pm
Please send your input, data, output, and license number to
 Andy Ross posted on Tuesday, May 09, 2006 - 11:03 am
Dear Prof. Muthen

I am attempting to run a multi-group analysis comparing a latent class solution for two groups

In the first step I ran a four class solution for each group simultaneously using the KNOWNCLASS command, allowing both class and conditional probabilities to vary across groups:

TITLE: slca
DATA: FILE IS c:\slca;
VARIABLE: NAMES ARE pt re kd em hq g;
USEVARIABLES ARE pt re kd em hq;
CLASSES = cg(2) c(4);
KNOWNCLASS = cg (g = 1 g = 2);
CATEGORICAL = pt re kd em hq;
STARTS = 10 5;

MODEL: %overall%
c#1-c#3 on cg#1;

In the next step I wanted to run a restricted model in which the class probabilities are equal across groups (structural homogeneity). However I have not been able to set this up. I tried inputting the start thresholds for the first group, and constraining the conditional probabilities to be equal across groups using the following command syntax:

TITLE: slca
DATA: FILE IS c:\slca;
VARIABLE: NAMES ARE pt re kd em hq g;
USEVARIABLES ARE pt re kd em hq;
CLASSES = cg(2) c(4);
KNOWNCLASS = cg (g = 1 g = 2);
CATEGORICAL = pt re kd em hq;
STARTS = 10 5;

MODEL: %overall%
[pt$1*-4.1 pt$2*-2.7](p1);
[re$1*2.7 re$2*7.8](p2);
[kd$1*-3.5 kd$2*1.5](p3);
[em$1*0.3 em$2*1.5 em$3*3.7](p4);
[hq$1*-2.8 hq$2*0 hq$3*0.8](p5);
[pt$1*-0.7 pt$2*0.2](p6);
[re$1*2.2 re$2*12](p7);
[kd$1*3.1 kd$2*12](p8);
[em$1*2.8 em$2*3.1 em$3*3.2](p9);
[hq$1*-3.3 hq$2*-0.4 hq$3*0.3](p10);
[pt$1*-1.4 pt$2*-0.6](p11);
[re$1*-1.2 re$2*3.6](p12);
[kd$1*-3.2 kd$2*0.5](p13);
[em$1*-0.8 em$2*0.1 em$3*1.8](p14);
[hq$1*-0.5 hq$2*2.3 hq$3*3.2](p15);
[pt$1*4.0 pt$2*5.0](p16);
[re$1*-3.7 re$2*-0.9](p17);
[kd$1*3.8 kd$2*6.8](p18);
[em$1*0.8 em$2*1.0 em$3*1.1](p19);
[hq$1*-1.2 hq$2*0.7 hq$3*1.5](p20);
[pt$1*-4.1 pt$2*-2.7](p21);
[re$1*2.7 re$2*7.8](p22);
[kd$1*-3.5 kd$2*1.5](p23);
[em$1*0.3 em$2*1.5 em$3*3.7](p24);
[hq$1*-2.8 hq$2*0 hq$3*0.8](p25);
[pt$1*-0.7 pt$2*0.2](p26);
[re$1*2.2 re$2*12](p27);
[kd$1*3.1 kd$2*12](p28);
[em$1*2.8 em$2*3.1 em$3*3.2](p29);
[hq$1*-3.3 hq$2*-0.4 hq$3*0.3](p30);
[pt$1*-1.4 pt$2*-0.6](p31);
[re$1*-1.2 re$2*3.6](p32);
[kd$1*-3.2 kd$2*0.5](p33);
[em$1*-0.8 em$2*0.1 em$3*1.8](p34);
[hq$1*-0.5 hq$2*2.3 hq$3*3.2](p35);
[pt$1*4.0 pt$2*5.0](p36);
[re$1*-3.7 re$2*-0.9](p37);
[kd$1*3.8 kd$2*6.8](p38);
[em$1*0.8 em$2*1.0 em$3*1.1](p39);
[hq$1*-1.2 hq$2*0.7 hq$3*1.5](p40);


However this did not work. Could you please tell me how I can set up and run the structural homogeneity model for the above example?

Also, can I check, in order to run the next step in which I also restrict the class probabilities to be equal across groups (complete homogeneity) I simply run the original syntax, except for removing the model command:

MODEL: %overall%
c#1-c#3 on cg#1;

Is this correct?

Many thanks for your support

 Linda K. Muthen posted on Tuesday, May 09, 2006 - 11:36 am
You need to send your input, data, output, and license number to to get help on this.
 Khoun Bok Lee posted on Tuesday, October 02, 2007 - 5:44 am

I want to test the hypothesis that the proportions of the 2 classes in the low-educated group are the same as the proportions of the 2 classes in the high-educated group, using education as a predictor.
However, I could not find the correct command to compare the class proportions between the 2 groups.
Is there a command for this test?
Although I know how to test my hypothesis using a KNOWNCLASS model, the output of that model did not report 'df' (I don't know the reason). Because I want to do a statistical test using the X^2 distribution, the df is needed.
Many thanks.
 Linda K. Muthen posted on Tuesday, October 02, 2007 - 8:59 am
Instead of using the education variable as a grouping variable, use it as a covariate and regress the categorical latent variable on it using the ON option of the MODEL command.
 Lannie Ligthart posted on Thursday, April 10, 2008 - 2:49 am
I would like to test group differences in 3-class LCA profiles across two variables: sex and affection status for a disorder.
I tried to do this using the KNOWNCLASS option, by creating 4 groups: male/unaffected, female/unaffected, male/affected, female/affected, and then equating the response probabilities step by step, starting with group 1 and 3 vs. 2 and 4 etc., and comparing the BICs for these models.

I coded this as follows:

KNOWNCLASS = cg (group=1 group=2 group=3 group=4);
classes = cg(4) c(3) ;

and then:

Model: %OVERALL%
C#1 ON cg#1;
C#2 ON cg#1;
C#1 ON cg#2;
C#2 ON cg#2;
C#1 ON cg#3;
C#2 ON cg#3;

I have two questions:
1) Did I specify this model correctly (I have never seen any scripts using more than 2 groups)?
2) Is it a valid approach to create four groups the way I did, or is there a better way to do this?
 Linda K. Muthen posted on Thursday, April 10, 2008 - 9:15 am
1. It looks correct. In Version 5, you can simply say c ON cg;
2. I would use -2 times the loglikelihood difference not BIC.
 C. Sullivan posted on Thursday, July 24, 2008 - 6:00 am
Hi, I'm trying to conduct a multigroup lca using "knownclass" (adstat) and while I can run a model constrained in terms of conditional item probabilities, I'm having difficulty holding the lc probabilities to be equal. Any advice on how to constrain those lc probabilities would be much appreciated. This is the input that I have so far.

drgcrm#1 on adstat#1;

[coc$1] (2);
[op$1] (3);
[pcp$1] (4);
[mj$1] (5);
[coc$1] (6);
[op$1] (7);
[pcp$1] (8);
[mj$1] (9);

[coc$1] (2);
[op$1] (3);
[pcp$1] (4);
[mj$1] (5);
[coc$1] (6);
[op$1] (7);
[pcp$1] (8);
[mj$1] (9);
 Linda K. Muthen posted on Thursday, July 24, 2008 - 9:11 am
Try removing the statement:

drgcrm#1 on adstat#1;

If you continue to have problems, send your files and license number to
 James Swartz posted on Sunday, March 22, 2009 - 6:14 pm
Sorry for this very simple question, but when comparing a restricted versus unrestricted LCA model with known classes, do you compare the statistics printed as the loglikelihood values or the likelihood ratio chi-squares...

 Linda K. Muthen posted on Sunday, March 22, 2009 - 9:39 pm
To test nested LCA models, the regular loglikelihood values are compared. -2 times the loglikelihood difference is used.
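
As a minimal sketch of that comparison in Python: the statistic is -2 times the difference between the restricted and unrestricted loglikelihoods, referred to a chi-square distribution with degrees of freedom equal to the difference in the number of free parameters. The loglikelihoods and parameter counts below are hypothetical, the chi-square tail probability is computed with a plain series so that no external library is assumed, and the sketch ignores the scaling correction that Mplus documents for difference testing with the MLR estimator.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_test(ll_restricted, ll_unrestricted, k_restricted, k_unrestricted):
    """Likelihood ratio test for two nested models."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    df = k_unrestricted - k_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical values: restricted model (equal across groups) vs. unrestricted.
stat, df, p = lr_test(ll_restricted=-1005.0, ll_unrestricted=-1000.0,
                      k_restricted=10, k_unrestricted=12)
print(stat, df, round(p, 4))
```

A small p-value in this sketch would indicate that the equality restrictions worsen fit.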
 davide morselli posted on Monday, February 08, 2010 - 6:52 am
I'm comparing latent class structures between groups. I've performed the analysis separately on the two samples, and the results are that a 3-class model is good for group 1, while for group 2 a 4-class model is preferable. Since classes 1 to 3 are in effect similar in both groups (i.e., each class is defined in the same way by the same items), can I specify a model that considers at the same time the measurement equivalence of classes 1 to 3 and the fact that group 2 has 1 additional class?
 Linda K. Muthen posted on Monday, February 08, 2010 - 8:18 am
That seems to be a reasonable approach.
 davide morselli posted on Monday, February 08, 2010 - 9:04 am
OK, but how can I specify the model?
If I constrain the means of class 4 of group 1 (cg#1.c#4) to be equal to zero, I have 0 subjects in the model with no other constraints but 114 subjects in the model where I constrain measurement invariance across groups. Do I have to constrain some other parameter?

the syntax is:
CLASSES = cg (2) c(4) ;
KNOWNCLASS = cg (country = 1 country = 2);
STARTS = 500 50;
c ON cg ;

 Linda K. Muthen posted on Monday, February 08, 2010 - 11:01 am
In the overall part of the model, for the class you want to be the empty class, for example, class 1, specify:

c#1 ON cg#1@-15;

For further support, please send your question and license number to
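
A rough Python illustration of why a slope fixed at -15 empties a class: on the multinomial-logit scale, adding -15 to one class's logit drives its probability to essentially zero in the group to which the slope applies. The intercepts below are hypothetical, and the sketch assumes the last class is the reference class with logit 0.

```python
import math

def class_probabilities(logits):
    """Multinomial-logit probabilities; the last class is the reference (logit 0)."""
    expd = [math.exp(v) for v in logits] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]

intercepts = [0.5, -0.2, 0.3]   # hypothetical intercepts for c#1-c#3
slope_c1 = -15.0                # the fixed value from c#1 ON cg#1@-15

# Group cg#1: the fixed slope is added to the logit of class 1 only.
group_with_slope = class_probabilities([intercepts[0] + slope_c1] + intercepts[1:])
# Other group: no slope applies, so class 1 keeps an ordinary probability.
other_group = class_probabilities(intercepts)

print([round(p, 6) for p in group_with_slope])
print([round(p, 6) for p in other_group])
```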
 Maria Guadalupe posted on Thursday, April 15, 2010 - 8:23 am
Hello! I would like to do an LCA with multiple groups. I have 4 groups, and for 3 of the 4 a 2-class solution fits the data best, but for one group a 3-class solution fits best. Is there a way to free the number of classes for that one group with 3 classes, or some other way to handle this situation?

 Bengt O. Muthen posted on Friday, April 16, 2010 - 10:30 am
It is not easy to work with different number of classes in different groups. Instead, you could investigate the 3-class solution in all groups - perhaps in the 3 groups where 2 classes fit best the 3-class solution is just a minor extension of the 2-class theme.
 Christian M. Connell posted on Friday, April 23, 2010 - 1:28 pm
I am working on a model similar to those described above (gender comparison of latent classes based upon 13 binary indicators) using the knownclass approach. We have been able to run models that freely estimate item-response probabilities across class by gender and also to run models in which the item-response probabilities are restricted within class by gender (i.e., males and females in a given class are restricted to have the same item-response probabilities).

Where I am having some difficulty is in restricting the class probabilities (i.e., prevalence of each class) to be the same across gender. I have removed the regression of latent classes on the knownclass from the overall statement (as suggested previously), but the class probabilities still differ by gender.
 Bengt O. Muthen posted on Saturday, April 24, 2010 - 12:56 pm
Please send your input, output, data, and license number to so we can see your exact setup.
 F Lamers posted on Friday, October 22, 2010 - 1:32 pm
I’m trying to run an LCA using the KNOWNCLASS command to evaluate whether profiles are similar between two groups. First, I ran an unrestricted model where class and conditional probabilities are allowed to vary across groups:
VARIABLE: NAMES ARE sampleid d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 group;
CATEGORICAL= d1 d2 d3 d4 d5 d6 d7 d8 d9 d10;
MISSING= all (-1234);
CLASSES= cg (2) c(2);
KNOWNCLASS = cg (group=0 group=1);
STARTS= 400 100 ;
c on cg;

Now, I would like to equalize the response probabilities in the classes between the two groups (and then compare the models with a -2 loglikelihood test). I’ve been trying to write the input for this, but I’m not sure if I’m doing things the right way.

c on cg;

[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14);
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28);

[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (1-14);
[d1$1 d2$1 d3$1 d3$2 d4$1 d4$2 d5$1 d5$2 d6$1 d6$2 d7$1 d8$1 d9$1 d10$1] (15-28);

Is this the right way?
 Linda K. Muthen posted on Sunday, October 24, 2010 - 9:04 am
That looks correct. The best way to check is to run it and see if you get what you expect.
 Jennifer Buckley posted on Wednesday, February 02, 2011 - 2:34 am
I am trying to run a similar model to the one above, where response probabilities are equalized across different groups, and I have the following queries:

1) when the dependent variables are categorical, how many constraints do I need? Is it the number of categories minus 1?

2) Do the groups need to be the same size when doing multiple group analysis with knownclass?
 Linda K. Muthen posted on Wednesday, February 02, 2011 - 6:48 am
1. The number of thresholds for a categorical variable is the number of categories minus one.

2. No.
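
To see why an item with k categories has k - 1 thresholds, here is a small Python sketch that turns thresholds into category probabilities. It assumes the logit form P(u <= category k) = 1 / (1 + exp(-threshold k)); the function name is hypothetical, and the three thresholds are the start values for the four-category item em quoted earlier in the thread.

```python
import math

def category_probabilities(thresholds):
    """Category probabilities implied by increasing logit thresholds.

    Assumes P(u <= k) = 1 / (1 + exp(-tau_k)); k - 1 thresholds give k categories.
    """
    cumulative = [1.0 / (1.0 + math.exp(-t)) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # probability mass between adjacent thresholds
        previous = c
    return probs

# Three thresholds (em$1, em$2, em$3) imply four response categories.
em_probs = category_probabilities([0.3, 1.5, 3.7])
print([round(p, 3) for p in em_probs])
```

Holding an item's response probabilities equal across groups within a class therefore means equating all of its thresholds, one equality per threshold.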
 J.D. Haltigan posted on Friday, May 06, 2011 - 7:22 pm

I have a similar question that has been posed above but I am not sure if I am interpreting my output correctly.

I have a 4 class model (generated from 7 binary latent class indicators) and the substantive question I want to ask is whether membership in these classes is the same for males and females (gender as covariate).

I have regressed the categorical latent variable on gender using:

C#1 on sex;

However, I am not sure how to interpret the estimates. Specifically the overall C#1 on sex estimate and then the 3 intercept estimates (C#1, C#2, C#3). Any insight would be greatly appreciated.

Also, is there a separate plot command to generate the plot for relationship between class probabilities and covariate (as opposed to the item probabilities for each class)?
 Linda K. Muthen posted on Saturday, May 07, 2011 - 10:38 am
You should use the specification c ON x. For this multinomial logistic regression with four classes, you will obtain three regression coefficients and three intercepts. The intercepts are used along with the regression coefficients to compute probabilities. See pages 443-445 of the Version 6 user's guide.

As far as the regression coefficients, the last class is the reference class. Let's say for a continuous covariate x in class 1, the regression coefficient for x is positive. The interpretation is that as x increases, the log odds increases for those in class 1 compared to the reference class.
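
A minimal Python sketch of how the three intercepts and three slopes combine into four class probabilities, with the last class as the reference. The numerical values are hypothetical and are not estimates from any model in this thread.

```python
import math

def class_probs(intercepts, slopes, x):
    """Class probabilities for c ON x with the last class as reference."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    peak = max(logits)                       # subtract the max for numerical stability
    expd = [math.exp(v - peak) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

intercepts = [0.4, -0.3, 0.1]   # hypothetical intercepts for c#1, c#2, c#3
slopes = [0.8, 0.0, -0.5]       # hypothetical slopes of c#1, c#2, c#3 on x

p_at_0 = class_probs(intercepts, slopes, 0.0)
p_at_1 = class_probs(intercepts, slopes, 1.0)

# Change in the log odds of class 1 versus the reference class for a
# one-unit increase in x; this recovers the class 1 slope.
log_odds_change = (math.log(p_at_1[0] / p_at_1[3])
                   - math.log(p_at_0[0] / p_at_0[3]))
print([round(p, 3) for p in p_at_0], [round(p, 3) for p in p_at_1])
print(round(log_odds_change, 3))
```

With a positive slope for class 1, the log odds of class 1 relative to the last class rise as x increases, which is the interpretation given above.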
 J.D. Haltigan posted on Monday, May 09, 2011 - 1:58 pm
Thanks! Is it the case that if one uses algorithm integration (to regress the covariate on the latent classes) that plots for the relationship between class probabilities and the covariate can not be generated? I have seen examples of these plots and would like to be able to see them for my data, but I can not run the {above} model without using algorithm integration.
 Bengt O. Muthen posted on Monday, May 09, 2011 - 4:30 pm
I think that is true. Do you really need algorithm = integration? (Also, c ON x doesn't mean to regress the covariate on the latent classes, but the other way around.)
 J.D. Haltigan posted on Monday, May 09, 2011 - 10:40 pm
Thanks, and thanks for the language clarification. I worked with my syntax and was able to get the plots of probabilities for the classes as a function of the covariate (I was able to run it without algorithm=integration).

One conceptual question: given that the covariate is not assumed independent amongst the classes, what is the usual justification for including a covariate in class generation? In other words, if there is a theoretical basis for including the covariate in the model, but the logistic regression coefficients are not significant does it make sense to NOT include the covariate in subsequent latent class generation?

Relatedly, the approach I have been taking with my data is to use the classes to ascertain or predict differences on various relevant antecedent and sequelae variables (e.g., ANOVA/logistic regression). But does it make more sense to use variables that might explain the classes as covariates in the model (aside from the binary behavioral indicators of the phenomenon)?
 Bengt O. Muthen posted on Tuesday, May 10, 2011 - 10:26 am
Answer to your 1st question: Yes.

Answer to your 2nd question:

If you think of certain variables as antecedents to the latent class variable, I would include them as covariates ("c ON x"). Then you can also see how the covariate means change over the classes. This is different from having these variables as indicators of the latent classes because there is no assumption of conditional independence among covariates.
 J.D. Haltigan posted on Thursday, May 12, 2011 - 12:43 am
Thanks again for the helpful remarks. I am still in the process of fully grasping certain aspects of the conceptual aspects of the LCA method.

One further point of clarification on the above: Is it the case that I can not compare models (same # classes) with and without the covariates included? Although my BIC and adjusted BIC values are more favorable with the covariates included, their estimates are not significant. That said, I am not certain such a comparison is tenable (i.e., one can only compare the same model in terms of adjudicating different class sizes).
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 8:10 am
Because the likelihood is computed for outcomes conditional on covariates, you can compare BIC between models with and without covariates as long as the models have the same outcomes. But if none of the covariates are significant, why include them? Note that you can test joint significance of the covariates by Model Test testing if all their slopes are zero - instead of looking at the z score for each slope separately. I'm not sure, but I think this is what you were asking.
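
For reference, a short Python sketch of the BIC comparison described here, using the standard formula BIC = -2 logL + p ln(N). The loglikelihoods, parameter counts, and sample size are hypothetical; the two models are assumed to have the same observed outcomes.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical 3-class models with 7 binary indicators (23 parameters),
# and the same model plus one covariate (2 extra slopes).
without_covariates = bic(loglik=-612.4, n_params=23, n_obs=177)
with_covariates = bic(loglik=-608.9, n_params=25, n_obs=177)

print(round(without_covariates, 2), round(with_covariates, 2))
```

In this made-up case the gain in loglikelihood does not offset the penalty for the two extra slopes, so the model without the covariate has the smaller BIC.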
 J.D. Haltigan posted on Thursday, May 12, 2011 - 4:57 pm
Just to be sure of myself, when you use outcomes you mean classes correct? Yes, since the covariates are not significant, I likely will not include them in the model. The somewhat odd result I am having trouble grasping is that with the same indicators a 4-class solution is best without the covariates but a 3-class solution seems best with them (indeed, model estimates are more favorable for the 3-class with covariates than without). Substantive theory in the area I am working in would point to either a 3 or 4 class solution, so in that sense it is not a problem.

One thought I had was that, even though the covariates in the 3-group solution are not significant (risk index comes close), the resultant classes (3) may have better predictive yield for the substantive question I am looking to address with the classes (i.e., unique predictive correlates). Is this a reasonable strategy to approach the issue from?
 Bengt O. Muthen posted on Thursday, May 12, 2011 - 6:50 pm
I actually meant the observed outcomes being the same - which I assume you have in your case.

The topic of deciding on the number of classes with or without covariates has been studied by several authors - you may want to email Katherine Masyn at Harvard School of Ed for example. Some covariate may have direct effects on the observed outcomes which makes things more complex; in that case you should include the direct effect. See also my 2004 chapter on GMM in the Kaplan handbook.

If you have theoretical reasons for a set of covariates influencing the latent class variable, I would not be against reporting results from such an analysis even if none of the covariates turns out to be significant.
 J.D. Haltigan posted on Thursday, May 12, 2011 - 7:58 pm
Dr. Muthen thanks again this is very helpful. If I am correct in that observed outcomes = observed latent indicators then yes they were the same 7 for each model.

Given that a 4-class model emerged without the covariates, but a 3-class model emerged with them (albeit they were non-significant) it must still be the case that group membership has changed (4 to 3 classes etc.) as a function of the inclusion of the covariate? Or am I misunderstanding? I have not utilized the three-group solution (classes) yet in analyses but I am anxious to see if they show a different pattern of results (in terms of their relationship to subsequent variables).

One thing that may be limiting the model is that the N (cases) is 177. This seems rather low based on my reading of most LCA literature. However, given the somewhat rarity of the behavioral phenotype of interest, 177 is probably the largest sample on which this type of analyses will likely be conducted--at least at present.
 Bengt O. Muthen posted on Monday, May 16, 2011 - 9:04 am
If your 3-class model with covariates has some significant direct effects from some covariates to some observed latent class indicators, it may be that it has a worse BIC than a 4-class model. So the group membership may not change when the modeling with covariates is done in more depth.

A sample less than 200 can make BIC underestimate the number of classes - see the article on our web site:

Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.
 J.D. Haltigan posted on Wednesday, May 18, 2011 - 12:55 am
Thanks again, I read over the Nylund piece in detail and it was quite helpful.

One thing that a number of colleagues have broached with me is that it may be circular to claim, in my case for example, that a given class shows associations with a covariate (e.g., maternal depression) if that covariate is informing the class membership in the first place (i.e., their position is that it is more plausible to just use the latent class indicators to inform class enumeration and then test for differences on antecedents in a separate step). I have received this critique a number of times and I am curious as to what the advantage is of using covariates in the LCA modeling process (i.e., what is the rebuttal to this claim)?
 Bengt O. Muthen posted on Wednesday, May 18, 2011 - 9:16 am
That's a big topic. I think it all depends on the application situation. It is interesting that this comes up often in LCA but not in factor analysis. A factor model with factors regressed on covariates (a MIMIC model) typically would not be questioned - the covariates contribute information to the formation of the factor scores so you include them. Although there are exceptions - take the MIMIC model used by ETS to score student performance. On the individual level using only the indicators makes sense - you wouldn't want to have say gender influence your factor score, only your performance. But on the group level, covariates improve the scores.

In your case, if you want to show how strongly a covariate predicts class membership, you do that best in a single run including the covariate. Doing it in several steps (without covariates, classify, regress class on covariates) has its own complications. However, I can see an argument for not using the covariates when deciding on the classes. For instance, if covariates are genotypes and the indicators are phenotypes - here you want to define the phenotype without using the genetic information, and then see the strength of relationship.
 J.D. Haltigan posted on Friday, May 20, 2011 - 12:26 am
Thanks, Dr. Muthen. Would one include a direct effect of a covariate on a given indicator if they had a priori theory or evidence that the covariate influences, say, one of the X indicators? This is certainly not the case for me (i.e., I only suspect that the chosen covariates will in some way influence class membership--I have no prior evidence to suggest that they are associated with a given indicator) but I was curious as to when one would include such a direct effect of a covariate on an indicator (in addition to regressing the class on the covariate).
 Bengt O. Muthen posted on Saturday, May 21, 2011 - 9:52 am
You would do it either by theory, or when less is known, simply by regressing one indicator at a time on all covariates in an exploratory way to see which direct effects are significant.
 J.D. Haltigan posted on Friday, December 02, 2011 - 1:18 pm
Returning again to this thread to make sure I am interpreting some things correctly...Just finished reading some of Collins and Lanza (2010) and had a few questions.

If I fit an LCA model with 5 covariates (simultaneously) is it the case that the odds-ratios will tell me the increase in probability of a given class membership for a one-unit increase in a given covariate but that in order to test whether a covariate is significant, I would need to run separate models with and without a given covariate and perform the chi-square test of model fit?

Also, I have not standardized the covariates prior to entry into the model. I realize this will not affect the results other than easing the conceptual part, but I did find out that to obtain standardized parameters for this type of model, numerical integration is required. I was just curious why this (numerical integration for standardized coefficients) was the case.
 Bengt O. Muthen posted on Friday, December 02, 2011 - 2:12 pm
The printed z tests (Est/SE) for the coefficient of each covariate gives the test of whether the covariate has a significant influence.

I don't see how the request for standardized calls for numerical integration - better send to support to see the full picture.
 J.D. Haltigan posted on Friday, December 02, 2011 - 10:52 pm
Thanks Dr. Muthen. This was my original understanding and I got a bit turned around after reading Chapter 6 of Collins and Lanza. They note that in an LCA with covariates the latent class prevalences are expressed as functions of the regression coefficients and individuals' values on the corresponding covariates. I get this. They then mention that hypothesis testing in LCA with covariates is done by means of a likelihood ratio chi-square test. This is where I became a bit confused. By hypothesis testing I am now understanding the authors to mean testing competing models (i.e., one with a covariate and one without). So, if there is not a significant improvement in model fit with the addition of the covariate (even though it may have a significant odds-ratio estimate on a given class relative to the reference class), how does one evaluate the covariate? I guess since in this case I am using a hypothetical example of one covariate, if it has a significant odds-ratio estimate, there would have to be improvement in class prediction relative to a baseline model with no covariate?

To be more clear, in my example of a 5-covariate model, 2 of the 5 covariates had significant odds-ratio estimates depending on the reference class (theory-expected). I can say that these covariates significantly influence class membership correct? There is no need, say, to compare to a baseline model without these covariates?
 Bengt O. Muthen posted on Saturday, December 03, 2011 - 4:49 pm
You can test the significance of a covariate by the z test for its slope that Mplus prints. You can also run 2 models, one with the slope free and one with the slope fixed at zero. 2 times the loglikelihood difference for these runs is a chi-square with 1 degree of freedom, which corresponds to z-squared, so these two tests should agree. To test more than one covariate having zero slopes you can still do the likelihood difference testing - I think that's the testing mentioned in the book.
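The correspondence between the two tests can be checked by hand. The following is a minimal sketch in Python, not Mplus output: the estimate, standard error, and loglikelihood values are hypothetical and were chosen so that the two statistics match exactly. In real data they agree only approximately.

```python
import math

def z_test_p(estimate, se):
    """Two-sided p-value for a Wald z test (Est/SE)."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def lrt_p_df1(loglik_free, loglik_fixed):
    """LR chi-square with 1 df from two nested loglikelihoods, and its p-value."""
    chi2 = 2.0 * (loglik_free - loglik_fixed)
    # For 1 df the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical values for illustration only.
z, p_z = z_test_p(estimate=0.50, se=0.20)
chi2, p_lrt = lrt_p_df1(loglik_free=-1000.000, loglik_fixed=-1003.125)

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, p = {p_z:.4f}")
print(f"LR chi2 = {chi2:.3f}, p = {p_lrt:.4f}")
```

Here the slope-free run has the higher loglikelihood, and the chi-square equals the squared z value.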
 Stata posted on Monday, April 30, 2012 - 10:30 am
Dr. Muthen,

I'd like to confirm with you regarding Nylund et al (2007) mentioned "the commonly used log likelihood difference test cannot be used to test nested latent class models". Does it mean the loglikelihood (Ho in Tech 8) value cannot be used for LCA comparison or does it mean something else (Tech 11)?

Thank you.
 Linda K. Muthen posted on Monday, April 30, 2012 - 2:00 pm
It means that to decide on the number of classes, you should not use loglikelihood difference tests. You should instead use BIC, TECH11, TECH14, etc.
 Tracy Witte posted on Thursday, October 17, 2013 - 4:57 am
I am running a 3-group LCA and have used the 3-step procedure for testing equality of means in auxiliary variables across classes. I would like to provide effect sizes for differences in means on the auxiliary variables across the classes. Would it be appropriate to convert the SE's to SD's and then calculate Cohen's d from the means and standard deviations? Or, is there some other, preferred approach for determining effect sizes?
 Bengt O. Muthen posted on Thursday, October 17, 2013 - 6:26 am
Yes, that would be appropriate.
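One way to carry out that conversion is sketched below in Python. It assumes SD = SE * sqrt(n), with n taken as the estimated class count, and a pooled SD in the denominator. The function name and the example numbers are hypothetical, and because class counts are estimated the result is an approximation.

```python
import math

def cohens_d_from_se(mean1, se1, n1, mean2, se2, n2):
    """Cohen's d from class-specific means, their SEs, and class counts."""
    # Convert each standard error back to a standard deviation.
    sd1 = se1 * math.sqrt(n1)
    sd2 = se2 * math.sqrt(n2)
    # Pool the two standard deviations, weighting by degrees of freedom.
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled

# Hypothetical class means, SEs, and class counts.
d = cohens_d_from_se(mean1=12.0, se1=0.5, n1=100, mean2=10.0, se2=0.5, n2=100)
print(f"d = {d:.2f}")
```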
 Jonathan Larson posted on Wednesday, February 19, 2014 - 11:37 am
We have an LTA that displays measurement non-invariance over time. However, upon visual inspection, one of the classes appears relatively unchanged. Is there a way to compare item probabilities over time to see which ones differ and which don't?

I know there are ways to compare proportions in independent and matched samples, but this class at two time-points doesn't classify as independent or matched because it has some of the same people and some different people.

Thank you!
 Bengt O. Muthen posted on Wednesday, February 19, 2014 - 11:47 am
You can test invariance one item at a time.
 Jonathan Larson posted on Wednesday, February 19, 2014 - 2:38 pm
Do you mean by using the likelihood ratio test to compare models (i.e., two times the difference in log-likelihoods)?

If so, how do you interpret the case where the two models are not different from each other? I imagine that if the constrained model fits better, the item probability does not change over time, and if the free model fits better, the item probability changes over time. However, I wouldn't know how to interpret a non-significant result.

Thank you!
 Bengt O. Muthen posted on Thursday, February 20, 2014 - 2:09 pm
Yes, an LR chi-square test.

If this test does not give significance we can't reject equal parameters, so we take them to be invariant.
 Jonathan Larson posted on Friday, February 21, 2014 - 7:29 am
Thank you!
 Wen, Fur-Hsing posted on Monday, March 24, 2014 - 7:08 pm
I try to run an LCA using the KNOWNCLASS command to evaluate whether profiles are similar between two groups. I have 4 classes based on the observed items. If I want to test some specific classes, e.g. c#2 and c#4 based on result, whether the conditional probabilities are equal between the two groups, how do I specify the scripts?
 Linda K. Muthen posted on Tuesday, March 25, 2014 - 1:33 pm
You can use MODEL CONSTRAINT to test if the thresholds or probabilities are equal across classes, for example,

MODEL:
%OVERALL%
[u$1 u$2];
%c#1%
[u$1] (p1);
%c#2%
[u$1] (p2);

MODEL CONSTRAINT:
NEW (diff);
diff = p1 - p2;

To test the probabilities, you need to create them in MODEL CONSTRAINT using the thresholds.
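For a binary item with the default logit link, the probability of the higher category in a class is 1 / (1 + exp(threshold)). The sketch below, in Python rather than Mplus, shows the arithmetic that would be written in MODEL CONSTRAINT. The threshold values and variable names are hypothetical.

```python
import math

def prob_from_threshold(tau):
    """P(u = 1 | class) for a binary item with logit threshold tau."""
    return 1.0 / (1.0 + math.exp(tau))

# Hypothetical thresholds for the same item in two classes.
tau1 = -1.0
tau2 = 0.5

prob1 = prob_from_threshold(tau1)
prob2 = prob_from_threshold(tau2)
prob_diff = prob1 - prob2

print(f"prob1 = {prob1:.3f}, prob2 = {prob2:.3f}, diff = {prob_diff:.3f}")
```

A lower threshold means a higher endorsement probability, so a threshold difference and a probability difference have opposite signs.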
 Sara Anderson posted on Tuesday, May 27, 2014 - 11:27 am
I am having trouble interpreting my LCA multi-group output. In my first multigroup model (with two groups, 5 classes), I am not finding similar patterns of probabilities for the classes across groups. I had presumed that even though classes can vary across groups, that the patterns for the 1st, 2nd, etc., classes should be roughly similar. Is that not the case? Are classes just put in different order for each group?

Similarly, when I constrain thresholds to be equal within one class, I am not finding that the paths are actually the same for the same class across the two groups (although the Wald test is significant). Why would that be the case? I thought that I should see the thresholds as being identical? Is it because the groups are different sizes?
 Bengt O. Muthen posted on Tuesday, May 27, 2014 - 6:24 pm
I assume that you are using 2 latent class variables and that one of them is specified as Knownclass. If so, the unknown classes can come out in different order for the 2 known classes. But the mean/probability profiles will tell you which class should be comparable to which.

For the question in your second paragraph we would need to see your output - send that to Support with your license number - and point out what is not equal that you expected to be equal.
 Sara Anderson posted on Tuesday, May 27, 2014 - 6:49 pm
Just to clarify - with the knownclass option, the classes could show up in different order across the known groups. So, 1 1 might not be the same class as 2 1 in the output. I need to use the predicted probabilities to match them?

Also, what if I find that the overall predicted N for the classes is different with the Knownclass option as it was for the initial model? Shouldn't they be the same or can they shift once you can disaggregate by groups?

I figured out part two. Thanks-
 Bengt O. Muthen posted on Wednesday, May 28, 2014 - 2:11 pm
Q1. Yes.

Q2. They can shift.
 Seung Bin Cho posted on Friday, September 05, 2014 - 7:53 am
Dear, Dr. Muthen.

I have questions on testing measurement invariance in multiple group LCA.
I have fitted two models with binary indicators: one model without any restriction on thresholds and another model with equality constraints on all thresholds between sex.

1) Can I use difference of -2*H0 Log likelihood between the two models to test thresholds equality between sex?

2) Alternatively, I attempted to test thresholds equality between sex using model test option. I have manually checked the order of classes and selected pairs to test that I think were appropriate. Optseed was used to fix the order. Is this way correct?

3) Test results were different between the two methods above. The test was not significant from 1) (p=.31) but highly significant from 2) (p<.001). I think method 1) is more reasonable mainly because 1. there still is arbitrariness in selecting pairs to test in method 2), 2. I'm not sure if the methods are equivalent, and 3. sample sizes were different between sex.

4) Is there better way to test the measurement invariance?

Thank you for your help!
 Linda K. Muthen posted on Friday, September 05, 2014 - 9:34 am
See the Version 7.1 Language Addendum on the website with the user's guide under convenience features for multiple group analysis. This shows the models to use for testing measurement invariance and convenience features that can help you do this.
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 7:55 am
Thank you for your response, Dr. Muthen.

I think the relevant part of 7.1 language addendum is the last part about Knownclass and Do option.
As far as I understand, what I did using Model test command was equivalent with what's described in the document, and using Do option is a convenience feature that helps user do the same thing more easily.
I'm still wondering whether my first question - using the -2*H0 Log likelihood difference - is correct, because I still think this is a better way of testing threshold invariance for the reasons I described in my third question above.

Thank you for your help!
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 8:10 am
I forgot to mention this part.

Is this page also relevant as I used MLR?

Thank you!
 Linda K. Muthen posted on Wednesday, September 10, 2014 - 8:46 am

Multiple group factor analysis: Convenience features

This has convenience features for testing measurement invariance. I think this is what you want to do.
 Seung Bin Cho posted on Wednesday, September 10, 2014 - 9:07 am
Thank you Dr. Muthen.

I found the section and still wonder if it's relevant with LCA because it says
"It is available for CFA and ESEM models for continuous variables with the maximum likelihood and Bayes estimators..."

I'm running LCA using binary indicators. My main question is whether it's appropriate to use -2*H0 Log likelihood from the output to test threshold invariance. I'm using MLR estimator. In other words, I wonder if the difference of -2*H0 log likelihood between nested models follows chi-squared distribution. I assumed it follows chi-squared distribution based on my previous search on this forum.

I'd also appreciate if you could recommend relevant readings.
Thank you for your help.
 Bengt O. Muthen posted on Wednesday, September 10, 2014 - 12:55 pm
Yes, the LRT is correct to use for threshold invariance. I don't know about references; there may be some in the LTA context.
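The arithmetic of such a test, for any number of constrained thresholds, can be sketched in Python as follows. The loglikelihoods and parameter counts are hypothetical. The sketch computes the plain (unscaled) difference and does not apply the scaling correction factors that Mplus prints with MLR, so treat it as an illustration of the computation only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * density * total

def lrt(loglik_free, loglik_constrained, n_par_free, n_par_constrained):
    """Likelihood-ratio chi-square, its df, and p-value for nested models."""
    chi2 = 2.0 * (loglik_free - loglik_constrained)
    df = n_par_free - n_par_constrained
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical H0 loglikelihoods and numbers of free parameters.
chi2, df, p = lrt(-2450.3, -2456.1, n_par_free=49, n_par_constrained=37)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")
```

A non-significant p-value here would mean that equality of the thresholds across groups cannot be rejected.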
 db40 posted on Monday, April 06, 2015 - 2:33 pm
Hi, I'm just trying new things and I'm wondering if I can use Knownclass to examine a 5 class solution by gender?

Thank you.
 Linda K. Muthen posted on Monday, April 06, 2015 - 3:48 pm
Yes, you can do that.
 devin terhune posted on Friday, April 10, 2015 - 10:06 am
Hi. I have two data sets. I get a 4-class solution separately in each sample as well as when I combine the samples. I'd like to get a better sense of whether the class solutions differ across samples and/or whether the solution is present when controlling for sample. I tried to run the analysis with group included using the KNOWNCLASS option. This is my code:

CLASSES = cg (2) c (2);
KNOWNCLASS = cg (g = 0 g = 1);
c ON CG;

I get the following error (for all categorical vars):

Variances for categorical outcomes can only be specified using PARAMETERIZATION=THETA with estimators WLS, WLSM, or WLSMV. Variance given for: P1

It seems to be related to the fact that 10 of my variables are binary. When I remove the line about categorical variables, the analysis seems to work, but I get a small error (too big to reproduce here). Am I able to include the binary variables here? Also, why can't I run TECH11 and TECH14 here (I will use this to contrast the different models)?

Is this right approach for what I'm trying to do?

 Bengt O. Muthen posted on Friday, April 10, 2015 - 4:01 pm
Your errors stem from the variance statements:


Apart from there being no variance parameters for categorical items, I don't know why you say this. Do you mean to use brackets here: [p1-p20]? If you do, then you are saying that there is not measurement invariance across Knownclasses.
 devin terhune posted on Monday, April 13, 2015 - 3:57 am
Thanks for your quick and helpful response.

I used the variance statements from another example I found but evidently that wasn't correct.

So, to reiterate:
1. I have two samples of data
2. When I run the LPA separately, I get vaguely similar pattern of results.
3. When I run the LPA on the combined data sets, I get a set of results that converge with other research, but the samples are disproportionately represented across the different classes.

I'd like to get a better sense of the extent to which the results are independent of the sample. I don't have any strong predictions regarding how the two samples would differ however.

Is including sample as the knownclass variable worthwhile? Could you provide an example from the manual that I could use as a guide with regard to the covariance constraints? Alternatively, would it be better to include sample as a covariate?

Thanks for your help!
 Bengt O. Muthen posted on Monday, April 13, 2015 - 7:47 am
Look at UG ex 8.8 which uses an approach where you let the outcome means vary across the cross-classification of the 2 latent class variables. That's the most flexible LPA (with categorical outcomes there is no variance modeling to consider). You can then see how the means change over those cross-classifications. In that model you can also test measurement invariance in 2 ways. One, make different kinds of restrictions on the means to test invariance by comparing loglikelihoods, resulting in likelihood-ratio chi-square testing. Or, apply Wald testing using Model Test.
 Alvin  posted on Monday, May 04, 2015 - 4:56 pm
Hi Prof Muthen, I am working on a three-class multigroup LCA. While I couldn't get full invariance in response probabilities, I was able to get partial invariance after letting the response probabilities of class 2 vary across the 2 groups. This was done based on significant differences in probabilities of some of the items within class 2 across groups, while class 1 and class 3 are relatively homogeneous in terms of probabilities of endorsement. Next, I am going to constrain class distribution to be equal across groups. Do you constrain the means to be equal, or something else? Kankaras recommends BIC as a key indicator to compare nested models (rather than LRT which is subject to sample size), is this acceptable? I notice in the case of complete homogeneity, there is no interaction effect between group and class, is this the model of complete invariance?
 Bengt O. Muthen posted on Monday, May 04, 2015 - 5:32 pm
Q1. Yes.

Q2. BIC is good. But since it is a function of logL it is also influenced by sample size.

Q3. I don't know how one looks at an interaction between group and class.
 Will Hernandez posted on Wednesday, May 06, 2015 - 8:49 am

I see above that you stated differing group sizes is not an issue when doing multigroup comparisons using KNOWNCLASS. Could you explain what Mplus does to adjust for the different sample sizes?

Also, I am conducting a multigroup analysis using the KNOWNCLASS option, with 8 groups and wanted to compare coefficients across these groups (for a total of 28 possible comparisons). I have used MODEL TEST to produce a Wald Test. I have just changed the coefficients to be compared and rerun the syntax to produce Wald tests for each of the comparisons. I wanted to know if Mplus makes any corrections for multiple comparisons such as this or if something like a Bonferroni correction is required (This would be less than ideal considering the 28 comparisons being made).

Thank you for any assistance!

 Bengt O. Muthen posted on Wednesday, May 06, 2015 - 3:08 pm
I don't think that multi-group and knownclass are any different with respect to different group sizes.

No, Mplus does not make adjustments for multiple testing. The analyst has to do that.
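A correction applied by the analyst could look like the following Python sketch. It shows the Bonferroni per-test threshold for the 28 pairwise comparisons among 8 groups, plus Holm's step-down procedure as a less conservative alternative. Holm is offered here as one common option, not something recommended in this thread, and the p-values in the example are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def holm_reject(p_values, alpha=0.05):
    """Holm step-down: True where the null is rejected, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value is compared to alpha/m, the next to alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

n_groups = 8
n_pairs = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
per_test = bonferroni_alpha(0.05, n_pairs)
print(f"{n_pairs} comparisons, Bonferroni threshold = {per_test:.5f}")

# Hypothetical Wald-test p-values from three separate runs.
print(holm_reject([0.001, 0.04, 0.03]))
```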
 Chee Wee Koh posted on Thursday, February 25, 2016 - 8:28 pm
Hi there,

Following up on Bengt's response to Devin on Apr 13 above, I tested MI in my model using UG e.g. 8.8 as reference (mine is LPA, not GMM). I'd like to check if I have done it correctly.

DATA: FILE IS wvdat2.dat;
CLASSES = cg (2) c (4);
KNOWNCLASS = cg (g = 0 g = 1);
u1 ON u2;
c ON cg;

[u1] (1);
![u2] (5);
![u1] (2);
![u2] (6);
[u1] (1);
![u2] (5);
![u1] (4);
![u2] (8);

As I progressively fixed means of u1 to be equal across the two known-classes, the LL changed very little; however, when I then proceeded to fix means of u2, the LL increased drastically (almost doubled and much higher than when no mean was fixed).

1. Does this imply there is no-MI?
2. Have I done the MI test correctly?

Thank you.
 Bengt O. Muthen posted on Friday, February 26, 2016 - 2:36 pm
Please send your output to Support along with your license number.
 Chee Wee Koh posted on Thursday, March 10, 2016 - 1:50 pm

I am following up on my post above (the syntax in the post had an error which has since been fixed). I am interested in establishing measurement invariance across male and female data.

First, I conducted LPA on male and female data separately. There were 4 profiles in each group which appeared similar across the 2 groups.

I specified a model where all parameters were freed. Then I specified another model where all indicator means in group 1 were freed and corresponding indicator means in group 2 were equal to those in group 1. I used TECH1 to track parameter estimation.

I computed -2(LL diff) to compare the two models and the chi-square was not significant. It appears, however, that class proportions differed in the two groups.

I would like to verify whether I have grasped the implications of the results:

1. Have the analyses sufficiently established that gender has no direct effect on any of the profile indicators?

2. Do the results imply that class-specific response probabilities do not differ between males and females?

3. Can I pool the male and female data for further analysis and specifying gender as a covariate affecting latent class only?

4. If the answers to the above are 'yes', then why did I not have to constrain factor loading across groups like how it is done to show scalar invariance in CFA?

Thank you!
 Chee Wee Koh posted on Thursday, March 10, 2016 - 2:53 pm
Sorry, for (4), I meant metric equivalence.
 Bengt O. Muthen posted on Thursday, March 10, 2016 - 6:06 pm
1-3: Yes.

4. There are no loadings when the latent variable is categorical.
 Chee Wee Koh posted on Thursday, March 10, 2016 - 10:53 pm
Oh that's right! Thank you!
So, when we specify the cross-classification model, and constrain all measurement parameters in profiles to be equal across groups to test structural equivalence, we have essentially also fixed the class by known-class interaction to zero, and thereby ensuring metric equivalence. Is this interpretation correct?
 Bengt O. Muthen posted on Friday, March 11, 2016 - 5:46 pm
 Ali posted on Tuesday, October 11, 2016 - 5:11 am
I am using the LCA model with four nominal variables for 7 countries. 6 out of 7 countries have 3 classes, but among those 6 countries, two have classes with different interpretations from the other countries. And 1 out of 7 countries has 2 classes. When I put all the countries together, I have a 3-class solution.

My purpose is to obtain a typology of the use of learning strategies for each country, but now I have a different number of latent classes across countries. So, I don't know if it violates measurement invariance and whether the interpretation is reasonable.
 Bengt O. Muthen posted on Tuesday, October 11, 2016 - 12:00 pm
With these country differences I would just analyze each country separately and report the similarities and differences as you have here.
 'Alim Beveridge posted on Friday, November 25, 2016 - 6:37 am
Dear Bengt and Linda,
I am conducting LCA on data about companies. There are 2 groups in my sample (say public and private) and I want to know if the best solution in terms of number of classes is the same for the 2 groups (sub-samples). I plan to do the LCA on each sub-sample separately. Is that the right way or is there a better way?

My understanding of KNOWNCLASS is that it will create the same number of classes within the 2 groups and that its purpose is to see if the means and thresholds are the same or different across the 2 groups (so it's not useful to me at this point). Is that correct?
 Bengt O. Muthen posted on Friday, November 25, 2016 - 11:03 am
Q1. Since no parameters are held equal across the groups there is no benefit to analyzing them together.

Q2. Yes. Knownclass (for cg say) can also have c on cg which means that the class percentages can vary as a function of cg.
 Jordan davis  posted on Saturday, January 21, 2017 - 11:50 am
Hi Dr. Muthen,
I'm wondering if it is possible to test mean differences for distal outcomes (the BCH method described in your webnote) across multiple groups.

We have a 3 class solution for our LCA and wanted to look at Time 2 distal outcomes stratified by Sex (males and females). Is this possible in Mplus yet?

 Bengt O. Muthen posted on Saturday, January 21, 2017 - 2:10 pm
You can let sex be a covariate and use the approach in section 3.2 of web note 21.
 Jordan davis  posted on Sunday, January 29, 2017 - 9:57 am
Thanks Dr. Muthen,
A couple of clarifying questions after reading web note 21.

1. I don't see mention of setting logits for the classes. I thought this was the most up to date practice. example:


2. Example 3.2. is this suggesting I use Female as my X variable to predict my distal outcome BY class?



SHVICT6 on Female;

3. If I do set logits as noted above does this have implications for interpretation?

4. Are results interpreted as a groups analysis? If so, should we test equality across the regression coefficients?

 Bengt O. Muthen posted on Sunday, January 29, 2017 - 3:14 pm
You should read our first paper on this. See Recent papers on our website:

Asparouhov, T. & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 21:3, 329-341. The posted version corrects several typos in the published version. An earlier version of this paper was posted as web note 15. Download appendices with Mplus scripts.
 Ali posted on Tuesday, February 28, 2017 - 5:32 am
I am comparing latent class structure between countries. First, I analyzed the two samples separately, and country 1 had a 3-class model, while country 2 had a 2-class model. When I interpreted the latent classes, both countries only had one similar interpretation. That means class 1 had similar interpretation across country 1 and country 2, but class 2 and class 3 in country 1 had different interpretations with class 2 in country 2. Second, I conducted the LCA on the whole sample, combined two countries data together. The results showed a 3-class model fitted better based on BIC. Finally, I conducted the multiple LCA analysis with 3 classes. When I checked the estimated parameters, I found the 3rd class in country 2 could be a minor extension of the 2nd class in country 2. So, now, I only have one similar interpretation in both countries when I did the multiple group LCA analysis. So, how could I do measurement invariance across two countries with the multi group LCA analysis?
 Bengt O. Muthen posted on Tuesday, February 28, 2017 - 6:17 pm
You could either

Use 3 classes and apply measurement invariance for only the one class

Use 2 classes as an approximation and apply measurement invariance for both classes

You may want to discuss this further on SEMNET.
 Ann Nguyen posted on Saturday, May 20, 2017 - 4:23 am

I would like to run a latent class regression analysis stratified by 3 age groups using the manual 3 step method. I am interested in testing how the relationship between a set of predictors and the latent class variable (2 classes) might vary across age groups (variable name = agegroup). In step 3, I used the KNOWNCLASS option and the following syntax in the Model command:

c on sex educ;
c on agegroup;

Model c:


Model agegroup:
c on sex educ;
c on sex educ;
c on sex educ;

With this syntax, I receive the following message:


Am I receiving this message because there is an error in my syntax or the sample size for agegroup 3, class 1 is too small?
 Bengt O. Muthen posted on Sunday, May 21, 2017 - 12:59 pm
To answer this we need to see your full output. Send to Support along with your license number.
 Eric Finegood posted on Friday, August 11, 2017 - 8:46 am
Dear Drs. Muthén, I am using the new BCH method to conduct an LCA distal outcomes analysis. My primary research interest after enumerating the latent class model is to explore whether the correlation between a distal outcome (z) and a covariate of interest (x1) varies as a function of class membership (controlling for the influence of two other covariates (x2 and x3) on the distal outcome. Syntax from the model statement is below:


z on x1 x2 x3;

z on x1 (b1) ;
z on x2 ;
z on x3 ;

z on x1 (b2) ;
z on x2 ;
z on x3 ;

z on x1 (b3) ;
z on x2 ;
z on x3 ;

0 = b1 - b2 ;
0 = b1 - b3 ;

I have two questions:

1) is the model test statement (omnibus wald test) appropriate here for testing whether the correlation between Z and x1 differs in at least one class?

2) I have missing data on my distal outcome (z) variable. So, when I run the distal outcome analysis I have fewer cases included in the full structural model than in my step 1 measurement model. Is there a way to use FIML on my distal outcome variable so that I can use all cases in the step 3 structural model? As it is, it seems to be listwise deleting cases that are missing my distal outcome.

Thank you very much for your time,
 Bengt O. Muthen posted on Friday, August 11, 2017 - 12:48 pm
Q1: Yes.

Q2: No.
 Eric Finegood posted on Friday, August 11, 2017 - 7:14 pm
Thank you for your response!
 Lisa M. Yarnell posted on Tuesday, December 05, 2017 - 6:10 am
Hello, I have a 2-class LCA solution estimated in a full sample, but I now want to run the model separately for two groups (boys and girls). Can I run this as a knownclass model, and model the 2-class solution in each group, and constrain parameters across the groups (then compare that against a model with the constraints released)?

From the posts above, it seems that these constraints across groups/known classes are not possible in LCA? Should I instead just run the 2-class LCA model separately in each group?

I'd like to put both groups in the same model if possible, but if it's needed to run the model separately in each group, I can see if the overall sample solution holds for each group using MODEL CONSTRAINT. Can you clarify if the former is possible?

Thank you.
 Lisa M. Yarnell posted on Tuesday, December 05, 2017 - 6:17 am
P.S. It might be MODEL TEST (not MODEL CONSTRAINT) if I choose the latter option. However, I am interested if I can do this in 1 model.
 Bengt O. Muthen posted on Tuesday, December 05, 2017 - 12:24 pm
Yes, this can be done in one model/run. See UG ex 7.21.
 Eric Finegood posted on Wednesday, January 03, 2018 - 10:30 am
Dear Drs. Muthen, This is in response to my post above dated August 11, 2017. I have run a BCH distal outcomes analysis using the following syntax to test the extent to which the correlation between Z and x1 varies as a function of class:
z on x1 x2 x3;

z on x1 (b1) ;
z on x2 ;
z on x3 ;

z on x1 (b2) ;
z on x2 ;
z on x3 ;

z on x1 (b3) ;
z on x2 ;
z on x3 ;

0 = b1 - b2 ;
0 = b1 - b3 ;

The Wald test I obtained was statistically significant, indicating that at least one correlation was different from the others. I'm wondering if you could please provide an example of how to check pairwise between the three classes to see where the differences actually are? I can, of course, see visually which correlation looks the largest, but I would like to test the pairwise comparisons if possible. I have not been able to find an example of this in any previous posts.

Thank you very much for your time.
 Bengt O. Muthen posted on Wednesday, January 03, 2018 - 2:33 pm
You have to do 3 runs, each with one test.
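The three single-test runs correspond to the three pairs of labels. As a small bookkeeping sketch in Python (the helper name is hypothetical), the contrast line for each run's MODEL TEST can be listed like this:

```python
from itertools import combinations

def pairwise_contrasts(labels):
    """One MODEL TEST contrast line per pair of parameter labels."""
    return [f"0 = {a} - {b};" for a, b in combinations(labels, 2)]

# Labels as used in the post above; one contrast goes into each separate run.
runs = pairwise_contrasts(["b1", "b2", "b3"])
for i, line in enumerate(runs, start=1):
    print(f"Run {i}: {line}")
```

Because Mplus does not adjust for multiple testing, the three resulting p-values would still need a correction chosen by the analyst.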
 Eric Finegood posted on Thursday, January 04, 2018 - 4:59 am
Thank you!
 Eric Finegood posted on Sunday, March 25, 2018 - 1:45 pm
Dear Drs. Muthén, I am using the BCH method to conduct an LCA distal outcomes analysis. I have enumerated a 4-class model and would like to test whether there are between-class differences in the interaction between X1 and X2 as it predicts a distal outcome, z. It is not clear to me whether labeling the parameters of the interactions (b1, b2, b3, b4) and then conducting an omnibus Wald test (Model Test) is the correct approach here. Would running a Wald test here be correct or should I just be visually looking at the direction and statistical significance of the individual class-specific interaction terms? I would greatly appreciate any advice you can give. Syntax from the model statement is below. Thank you in advance.

center x1 (grandmean);
center x2 (grandmean);
x1x2 = x1 * x2 ;
estimator = mlr;
type = mixture;
starts = 0;

z on x1 x2 x1x2 ;

z on x1;
z on x2;
z on x1x2 (b1);

z on x1;
z on x2;
z on x1x2 (b2);

z on x1;
z on x2;
z on x1x2 (b3);

z on x1;
z on x2;
z on x1x2 (b4);

0 = b1 - b2 ;
0 = b1 - b3 ;
0 = b1 - b4 ;
 Bengt O. Muthen posted on Monday, March 26, 2018 - 8:53 am
This looks ok. Make sure you get only 4 z on x1x2 estimates (not also the Overall which would make them non-identified).
 Eric Finegood posted on Monday, March 26, 2018 - 10:01 am
Hi Dr. Muthén, Thank you so much for your helpful response. I should’ve also mentioned that in the overall model statement and in each of the class-specific statements I allowed the intercept of the distal outcome (i.e., [z]) and the variance of z (i.e., z) to vary.

In your comment, are you suggesting that I remove the x1x2 term from the OVERALL statement? When I do this, the model does not run. As is, when the x1x2 term is included in the overall statement, I get four "z on x1x2" statements in my output. each one of those is a class-specific estimate.

Thank you for clarifying.
 Bengt O. Muthen posted on Monday, March 26, 2018 - 10:17 am
If you get a coefficient for each of the 4 classes, things are ok.
 Eric Finegood posted on Monday, March 26, 2018 - 11:05 am
Thanks !
 Eric Finegood posted on Tuesday, March 27, 2018 - 11:58 am
Hi Drs Muthén,
In the context of a distal outcomes mixture model in which we are trying to test between-class differences in average levels of a continuous distal outcome (i.e. z), it is my understanding that the Wald test of association is conceptually similar to an f-test in a one-way ANOVA. When I conduct a wald test to test between-class differences in my distal outcome, I observe that the p-value for the wald test is much greater than 0.05, suggesting no between-class differences in my distal outcome. When I conduct a classify-analyze approach (i.e. bringing individual’s modal class assignments into spss and conducting a one-way ANOVA f-test to test between-class differences in my outcome), I observe a significant f-statistic indicating at least one between-class difference in my distal outcome. I know that the classify-analyze approach is not appropriate because it does not account for classification error. I only did this so that I could look at the individual data points visually by their modal class assignment.

I’m wondering if you can help me understand why I observed such a discrepancy between results from the Wald test in mplus and results from the ANOVA f-test in SPSS. In each case, the class-specific means of my distal outcome are almost the same, although the SEs may be a little larger in the model-based mplus example. In Mplus, I’m using the mlr estimator. Any guidance you could provide would be very helpful. Thank you very much.
 Bengt O. Muthen posted on Tuesday, March 27, 2018 - 12:06 pm
I assume that you don't have any covariates in your model for the distal. Do you have the same sample size in your two approaches? What is your entropy here?

We may have to see your Mplus output and a pdf of your SPSS analysis.
 Eric Finegood posted on Tuesday, March 27, 2018 - 12:49 pm
Hi Dr. Muthen, thank you as always for the quick reply.

No covariates, same sample size, entropy is 0.83.
 Jasmin Llamas posted on Friday, May 03, 2019 - 2:37 pm
Was this question ever answered? I have a similar issue where my Wald test is not significant and when exported into SPSS the ANOVA is significant. I know this could be a case by case issue, but just wondering if there is a potential overlapping explanation.
 Bengt O. Muthen posted on Friday, May 03, 2019 - 3:45 pm
I don't recall that we received the files needed. One reason for the difference can be the use of the observed Most Likely Class versus allowing the uncertainty of the latent classification that the mixture approach offers.
 Jasmin Llamas posted on Saturday, May 04, 2019 - 12:14 am
Thank you (I was able to figure my issue out so no longer a problem).

I have a new issue similar to Chee Wee Koh's earlier post that I want to make sure I am understanding correctly. I am looking at gender invariance in my LPA.

Here were my steps:
1. Ran LPA on males and females separately and both indicated the same 3 profile solution.
2. I ran the LPA combining both males and females
3. Allowing item means to vary across groups, I used a Wald test to examine differences in item means. The Wald test was not significant. When I look at the data more closely, it seems that females are more represented in one of the profiles.

If I understood what was mentioned above, is it ok to combine the data and use gender as a covariate (since gender doesn't affect the profile indicators)? Or do all subsequent analyses have to use the separate profiles developed in Step 1?
 Eric Finegood posted on Saturday, May 04, 2019 - 6:24 am
I had actually made an error in my syntax. In my class-specific statements (e.g., %C#1%), I had accidentally labeled (e.g., m1, m2, m3) the class-specific variances (e.g., X) instead of the class-specific means (e.g., [X]), so the Wald test (e.g., MODEL TEST: 0=m1 - m2, etc.) was comparing the class-specific variances instead of the class-specific means. This is where the discrepancy with my SPSS results came from (because an ANOVA F-test tests whether the class-specific means differ from one another).
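
For readers hitting the same problem, here is a minimal sketch of the distinction. It assumes a 3-class model with a distal outcome named z; the class variable name c and the labels m1-m3 are hypothetical and not taken from the original input file.

```mplus
MODEL:
  %OVERALL%
  %c#1%
  [z] (m1);    ! brackets: the label attaches to the class-specific MEAN
  z;           ! class-specific variance, deliberately left unlabeled
  %c#2%
  [z] (m2);
  z;
  %c#3%
  [z] (m3);
  z;
  ! Writing "z (m1);" instead of "[z] (m1);" would label the variance,
  ! and MODEL TEST would then compare variances rather than means.
MODEL TEST:
  0 = m1 - m2;
  0 = m2 - m3;
```

With the labels on the bracketed statements, the two MODEL TEST lines jointly test equality of the three class-specific means, which is the hypothesis that is conceptually parallel to the one-way ANOVA F-test.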
 Bengt O. Muthen posted on Saturday, May 04, 2019 - 11:51 am
Answer to Llamas:

Yes, when measurement invariance has not been rejected, it is ok to combine the data and regress c on gender.
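
A minimal sketch of that combined-sample setup follows. It assumes five profile indicators named y1-y5, a 0/1 covariate named gender, and a 3-class solution; all names are hypothetical, and the DATA command is omitted.

```mplus
VARIABLE:
  NAMES = y1-y5 gender;
  USEVARIABLES = y1-y5 gender;
  CLASSES = c(3);
ANALYSIS:
  TYPE = MIXTURE;
MODEL:
  %OVERALL%
  c ON gender;   ! class membership regressed on gender
  ! No direct effects of gender on y1-y5 are included, which
  ! corresponds to measurement invariance across gender.
```

In this sketch, gender differences in profile prevalence (for example, females being more represented in one profile) are captured by the c ON gender coefficients rather than by separate analyses for each group.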
 Trevor Peckham posted on Friday, October 18, 2019 - 4:56 pm

I am an Mplus neophyte trying to use multigroup LCA to examine measurement invariance (MI) of a 6-class model across 5 waves of a cross-sectional survey (i.e., not panel data). Wave is thus my knownclass variable. I had previously been pooling the data across the waves, but want to examine whether the latent structure is equivalent over time. I've run 4 models so far:

1. Heterogeneity model (class probs and response probs vary across wave)
2. Structural homogeneity (class probs vary; response probs constrained to be equal)
3. Homogeneity model (class probs and response probs are equal across wave)
4. Single class LCA (i.e., without using knownclass or accounting for wave)

I'm trying to understand output of #s 3 and 4. The analyses produce the exact same model in terms of parameter estimates, as I would expect, but have fairly different fit statistics, entropy, dfs, etc. Some output:

Model #3
LL: -57000
Parameters: 123
Entropy: .76

Model #4
LL: -47000
Parameters: 119
Entropy: .55

Can you help me understand what is going on here?

 Bengt O. Muthen posted on Saturday, October 19, 2019 - 12:14 pm
Your model 3 probably has 4 parameters for the Knownclass variable - check the output - which would account for the 4 extra parameters. So that's ignorable. That part of the model also has an LL contribution. Also ignorable.
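
As a worked sketch of that LL contribution, assume (hypothetically) that wave g contains n_g of the N respondents and that Model 3 makes the latent class variable independent of the Knownclass variable, with equal response probabilities across waves. Under those assumptions the joint loglikelihood splits into the Model 4 part plus a multinomial part for wave membership:

```latex
\log L_{3} \;=\; \log L_{4} \;+\; \sum_{g=1}^{5} n_{g}\,\log \hat{\pi}_{g},
\qquad \hat{\pi}_{g} = \frac{n_{g}}{N}
```

The second term is negative and depends only on the wave sizes. With 5 waves it involves 5 - 1 = 4 free proportions, which matches the 123 - 119 = 4 extra parameters. With the rounded values posted above, that term would be roughly -57000 - (-47000) = -10000. Because the same term appears in every Knownclass model fitted to these data, it does not affect comparisons among Models 1-3, but it does mean the Model 3 and Model 4 loglikelihoods are not directly comparable.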
 Ads posted on Thursday, May 14, 2020 - 6:33 am
I am doing an LCA that focuses on heterogeneity in clinical patients. I also have some matched control participants.

If I use KNOWNCLASS to differentiate patients from controls during LCA estimation, Mplus by default subdivides the groups in more ways than we are seeking. For example, if I am looking to have 3 classes of clinical patients + 1 control group, the default instead makes for 6 total classes (3 clinical + 3 control).

Is there a way to use KNOWNCLASS so that there will be only 1 class of controls but multiple classes of clinical patients?

Related to this, is there a suggested way to increase power/sample size by incorporating information about controls into model estimation of variance or other parameters (without letting information from controls affect class formation for the clinical patients)?
 Bengt O. Muthen posted on Thursday, May 14, 2020 - 3:48 pm
Q1: Yes, there is a general approach where you can let some classes - in this case those that you don't want to have an influence for controls - have equality in relevant parameters over the classes. When you don't let a parameter (such as an outcome mean) vary over classes, you don't define the classes by that parameter.

Q2: That would be using priors in a Bayesian mixture analysis.