Comparisons between classes
 Bonnie J. Taylor posted on Monday, January 24, 2000 - 8:16 am
Suppose I have 4 latent classes. Whether I'm looking at means (in the SEM part of the model) varying over the different latent classes or the probability of a binary outcome varying over the different latent classes, how do I determine if these means or probabilities are significantly different from one another?
 Bengt O. Muthen posted on Monday, January 24, 2000 - 10:29 am
You can use likelihood-ratio chi-square difference testing to compare two mixture models that have the same number of classes and are nested. So, in your case, you also run the model where these means/probabilities (logits) are held equal, and then compute the chi-square difference and the degrees-of-freedom difference. In mixture models, a chi-square test of model fit is not provided, so the chi-square difference is computed as 2*d, where d is the difference between the loglikelihood values of the two models being compared. The degrees-of-freedom difference is the difference in the number of parameters.
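As a rough illustration of the difference test described above, here is a minimal Python sketch (not Mplus syntax). The loglikelihood values and parameter counts are made up, the function names are hypothetical, and the sketch assumes ordinary ML loglikelihoods with no scaling correction.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal-tail term plus a finite correction series
    tail = math.erfc(math.sqrt(half))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * density * total

def lr_difference_test(loglik_restricted, loglik_full, n_par_restricted, n_par_full):
    """Return (chi-square difference, df difference, p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    df = n_par_full - n_par_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical 4-class runs: means held equal across classes vs. free means.
stat, df, p = lr_difference_test(-2510.4, -2503.1, 17, 20)
print(f"chi-square difference = {stat:.2f}, df = {df}, p = {p:.4f}")
```

The restricted model is the one with the means or logits held equal across classes; a small p-value suggests that the equality constraints worsen fit.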
 Anonymous posted on Wednesday, March 07, 2001 - 4:43 am
I’m just beginning to explore mixture models, so please forgive the naiveté. I have five continuous standardized outcome variables and want to assess how many groups they might represent.
I have two questions.
First, is the syntax below appropriate or have I missed something?
Second, what are reasonable criteria for choosing 1 versus 2 versus 3 or more groups?
TITLE: Continuous outcome mixture model
DATA:
FILE IS "D:\eva\heterogeneity\analysis\nsf.csv";
VARIABLE:
NAMES ARE fem black bw ed alt self trad open nep cons wts active;
USEVARIABLES ARE alt self trad open nep;
CLASSES = envgrp(2);
ANALYSIS:
LOGHIGH = +15;
LOGLOW = -15;
LOGCRITERION = 0.0000001;
CONVERGENCE = 0.000001;
MCONVERGENCE = 0.000001;
MODEL:
%OVERALL%
alt WITH nep self trad open;
self WITH nep trad open;
trad WITH nep open;
open WITH nep;

 Bengt O. Muthen posted on Wednesday, March 07, 2001 - 9:38 am
This setup looks correct. The model is a classic mixture (cluster) analysis, in line with the web site examples given under Classic Mixtures with reference to Everitt's book. Your model assumes a class-invariant covariance matrix with class-varying means. As Everitt explains, this model sometimes gives convergence problems. An alternative is latent profile analysis (LPA), where the variables are uncorrelated within classes. Many authors suggest choosing the model with the lowest BIC value.
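To make the BIC comparison concrete, here is a small Python sketch (not Mplus syntax). The parameter count follows the model described above (class-invariant covariance matrix, class-varying means, five variables). The loglikelihood values and the sample size are invented for illustration, and the function names are hypothetical.

```python
import math

def n_free_parameters(n_classes, n_vars=5):
    """Parameters for class-varying means with a class-invariant covariance matrix."""
    covariance = n_vars * (n_vars + 1) // 2   # shared variances and covariances
    means = n_classes * n_vars                # one mean vector per class
    class_probs = n_classes - 1               # class probabilities sum to one
    return covariance + means + class_probs

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical loglikelihoods for 1-, 2-, and 3-class runs on 500 cases.
logliks = {1: -3500.0, 2: -3400.0, 3: -3390.0}
n_obs = 500
bics = {k: bic(ll, n_free_parameters(k), n_obs) for k, ll in logliks.items()}
best = min(bics, key=bics.get)
for k in sorted(bics):
    print(f"{k} classes: BIC = {bics[k]:.2f}")
print("lowest BIC:", best, "classes")
```

With these made-up numbers the small loglikelihood gain from the third class does not offset the penalty for six extra parameters.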
 Charles Fleming posted on Friday, March 09, 2001 - 1:54 pm
I am examining how reading scores in elementary school predict problem behavior in middle school. I have been running a growth mixture model on reading scores (readgr3 - readgr6) that includes two covariates (lowinc and dstsex). I come up with a two group model that allows psi and theta to vary between groups. Now I want to compare those groups on a latent variable (problem) measured by four indicators (druggr7 school viol notviol), controlling for the covariates (lowinc and dstsex). I am having trouble figuring out the correct syntax for doing this. Here is what I have so far:

TITLE: growth mixture model with 2 groups, covariates
and latent problem behavior construct;
DATA: FILE IS c:\data\analysis\traj2\mixture.dat;

VARIABLE: NAMES ARE lowinc dstsex READGR3-readgr6
prcongr3-prcongr6 druggr7 school viol notviol;
USEVARIABLES ARE lowinc dstsex READGR3-readgr6 druggr7
school viol notviol;
CLASSES = c(2);

ANALYSIS: TYPE = mixture; miterations = 400;

MODEL:
%OVERALL%
problem by druggr7*.7 school*.7 viol@1 notviol*.7;
C#1 on lowinc dstsex;
i BY READGR3-readgr6@1;
s BY readgr3@-3 readgr4@-2 readgr5@-1 readgr6@0;
[READGR3-readgr6@0 i*210 s*6 problem*0];
readgr3*38 readgr4*27 readgr5*21 readgr6*15;
i*75 s*.3;
i with s*-.36;

%c#1%
[i*220 s*4];
readgr3*32 readgr4*22 readgr5*18 readgr6*12;
i*70 s*.3;
i with s*-.3;

%c#2%
[i*200 s*7];
readgr3*40 readgr4*30 readgr5*24 readgr6*20;
i*80 s*.4;
i with s*-.4;

OUTPUT: tech1 tech7 tech8;

So my question is: What do I do to the above syntax in order to estimate the relationship between latent group class and the latent problem behavior construct, controlling for the two covariates?
 Bengt O. Muthen posted on Monday, March 12, 2001 - 8:30 am
You want latent trajectory class membership to predict a dependent factor "problem", controlling for covariates lowinc and dstsex. This is done by stating

problem on lowinc dstsex;

in the Overall part of the input, followed by estimation of class-specific means of the problem factor using

[problem];

in the class-specific parts of the input.

This setting is then analogous to ANCOVA, where the dependent variable is the factor and class membership corresponds to experimental conditions. You have slopes on covariates that are invariant across experimental conditions and can then interpret the intercept differences as being due to experimental conditions. Invariance of the slopes can be relaxed by making class-specific statements.
 Anonymous posted on Tuesday, May 29, 2001 - 1:50 pm
I am running a two-class mixture model and want to be sure I have the correct thresholds for seven variables with only two response options. What do you think?


%c#1%
[imp$1*3 exp$1*2 lee$1*1 INC$1*-1
OMI$1*-2 TOR$1*-3 OOP$1*-4];

%c#2%
[imp$1*2 exp$1*1 lee$1*0 INC$1*-2
OMI$1*-3 TOR$1*-4 OOP$1*-5];
 Linda K. Muthen posted on Wednesday, May 30, 2001 - 9:42 am
I don't see any problems with your starting values. They should reflect the probabilities of the latent class indicators. Following are some general guidelines for selecting starting values:

Selecting Starting Values
· Class probabilities - not required
· Conditional item probabilities - required for thresholds in logit scale
· Translation of probabilities to logits for thresholds - the higher the threshold, the lower the probability
  · Very low probability (+3)
  · Low probability (+1)
  · High probability (-1)
  · Very high probability (-3)
· Let the most prevalent class be last
· For low-probability behaviors, assign the highest logit threshold values to the items for the last class
· For high-probability behaviors, assign the lowest logit threshold values to the items for the last class
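The translation between probabilities and thresholds in the guidelines above can be checked with a short Python sketch (not Mplus syntax). It assumes the usual logit parameterization in which the probability of endorsing a binary item is 1/(1+exp(threshold)); the function names are hypothetical.

```python
import math

def threshold_to_prob(threshold):
    """Probability of endorsing a binary item given its logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))

def prob_to_threshold(prob):
    """Logit threshold to use as a starting value for a target probability."""
    return math.log((1.0 - prob) / prob)

# The four anchor values from the guidelines.
for threshold in (3, 1, -1, -3):
    print(f"threshold {threshold:+d} -> probability {threshold_to_prob(threshold):.3f}")

# Starting value for an item expected to be endorsed by about 20% of a class.
print(f"probability 0.20 -> threshold {prob_to_threshold(0.20):.2f}")
```

A threshold of +3 corresponds to a probability of about 0.05 and a threshold of -3 to about 0.95, which matches the "very low" and "very high" labels.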
 Patrick Malone posted on Thursday, October 11, 2001 - 8:15 am
I'm trying to run a two-group latent class analysis -- that is, I have two groups of observations and I want to run a latent class analysis on both of them simultaneously, but where each latent class is restricted to one of the manifest groups. I've tried doing this like so:

[fight1$1*-3 . . . disbrk5$1*-5];
[fight1$1*-3 . . . disbrk5$1*-5];

But I'm getting interesting error messages (model estimation did not terminate normally due to a NPD Fisher information matrix; model estimation has reached a saddle point, not a proper solution). Is this because of how I've set up the INTERV variable? Or is it likely just bad starting values in the rest of the model (I know the starting values are OK for one of the two groups, and I just duplicated them for the second set of latent classes)? Is there a better way to do what I'm trying to do?

 bmuthen posted on Thursday, October 11, 2001 - 10:27 am
I assume that the interv variable is the latent class indicator that determines the observed group membership; then the interv statements are correct. Alternatively, training data can be used to handle the two groups. But, you should use different starting values in the two classes for your fight-disbrk latent class indicators. Note also that your model assumes uncorrelated indicators for each of the two groups, which may be a strong assumption. You can generalize your setup to say 2 classes for each observed group.
 Patrick Malone posted on Friday, October 12, 2001 - 8:34 am
Yes, interv is the dichotomous indicator of the observed groups; I should have specified. I shifted the starting values for the second set of latent classes (there are actually six classes per group, with 21 indicators -- this is a big problem), but still get the same errors. I tried specifying the groups with training data instead, but it made no difference. I note that in the partial output I get, many of the threshold estimates have an absolute value over 15, but I'm using the default settings for that, so I'm not sure what's going on. Should I take this to the tech support line? Or just keep fiddling with starting values?

 bmuthen posted on Friday, October 12, 2001 - 5:59 pm
It is ok that you get the message about several of the threshold estimates having values greater than 15. This is actually an advantage in defining the classes because it means that these indicators are either on or off for a given class with probability one. But the large values can cause a singular information matrix, as you indicate, and a good approach is to go back and rerun the model with these values fixed at ±15 (or ±10) in the input (this does not change anything else in the model). This may well solve your problem.
 Krishna Rao posted on Sunday, October 28, 2001 - 1:48 pm
I am trying to estimate an unrestricted LCA model. I assume, therefore, that I do not need to specify any threshold values or class probabilities. However, the results I get do not make sense (the conditional probabilities of class membership are all 0.5) and differ from those I get with other LCA programs such as LEM. I suspect that my syntax (given below) is incorrect. I would be grateful for your help.

TITLE: rahhhlem

FILE IS "d:\temp\krishna\rajhhlem2.dat";

NAMES ARE hv206 hv207 hv208 hv209 hv210 hv211 hv212 hv221
sh36 sh38a sh38b sh38c sh38d sh38e sh38f sh38g sh38h sh38i sh42 sh43
sh46 sh47a sh47b sh47c sh47d sh47e sh47f sh47g sh47j sh47q sh47r
sh47s sh47t hv2011 hv2012 hv2013 hv2014 hv2015 hv2016 hv2017 hv2018
hv20113 hv20114 hv20115 surwat hv2051 hv2052 hv2053 hv2054 hv2055
hv2056 hv2057 hv2058 sh341 sh342 sh343 sh344 sh345 sh491 sh492 sh493;

CATEGORICAL ARE hv206-sh493;

 bmuthen posted on Sunday, October 28, 2001 - 5:13 pm
The Mplus philosophy is that the user should give starting values for the item probabilities in each class (in logit scale for the thresholds) to avoid local solutions that are not in line with theory. See Example 25.9 in the User's Guide to find input specifications for LCA. By not mentioning anything class specific in your input, I think your setup generates zero logit values (0.5 probability) for all classes, which is not what you want.
 krishna rao posted on Sunday, October 28, 2001 - 6:07 pm
Thanks for the reply. Unfortunately, here at Johns Hopkins, where I am a student, we do not have the Mplus User's Guide, even though the program is available for students to use. But I shall see what I can do with the information you have kindly provided.
 Linda K. Muthen posted on Tuesday, October 30, 2001 - 9:34 am
Following is the input for Example 25.9. I hope this helps. The statements in brackets are giving starting values for the thresholds.

TITLE: this is an example of a latent class analysis with binary latent class indicators
DATA: FILE IS lca.dat;
CLASSES = c(2);

[u1$1*2 u2$1*1 u3$1*0 u4$1*-1];

[u1$1*1 u2$1*0 u3$1*-1 u4$1*-2];
 krishna rao posted on Tuesday, October 30, 2001 - 10:45 am
Dear Linda,
Thanks so much for this - it really helped me out. I have one more problem to bother you with: how do I get Mplus to give me output where each observation (i.e., individual) is assigned to one of the classes? Thanks in advance...

 Linda K. Muthen posted on Tuesday, October 30, 2001 - 1:17 pm
I think you mean that you want to save the posterior probabilities for each person and also save the highest probability class for each person. To do so, say


You can print a summary of the Mplus commands from under DEMO.
 dagmar posted on Tuesday, December 11, 2001 - 2:05 pm
Hello, I would like to find out the estimated intercept and slope for each individual in the sample so that I could look at the range of the intercepts and slopes for individuals in each class. Can Mplus produce these? How do I ask for them? Thanks very much for your help. Dagmar
 bmuthen posted on Tuesday, December 11, 2001 - 5:05 pm
It sounds like you are doing a growth mixture analysis. Assuming that this is a correct impression, the intercept and slope for each individual can be estimated as factor scores. However, with growth mixture models individuals' factor scores are not printed for each latent class. Instead, you have two other outputs. One, you get the factor scores for the individual's most likely class. Two, you get the "mixed" factor scores, that is the weighted sum over the latent classes. This is further described in the technical appendix of the User's Guide.
 Dagmar posted on Wednesday, December 12, 2001 - 9:58 am
I am doing a growth mixture analysis. My User's Guide has numerous appendices, but none that I looked at (latent variable mixture modeling, estimation of factor scores) appears to contain syntax for requesting factor scores for the individual's most likely class and the "mixed" factor scores. The following is syntax for one of the models I'm running. Perhaps you could suggest how to modify this syntax so that it would produce the output files? Thanks! Dagmar

CLASSES = c(3);

ANALYSIS:
LOGHIGH = +15;
LOGLOW = -15;
LOGCRITERION = 0.0000001;
CONVERGENCE = 0.000001;
MCONVERGENCE = 0.000001;

SAVEDATA:
FILE IS allrawdata3grrvar;
FILE (RESULTS) IS allresults3gr;
FILE (TECH3) IS alltech33gr;
FILE (TECH4) IS alltech43gr;

MODEL:
%OVERALL%
i BY Y1-Y7@1;
s BY Y1@0 Y2@1 Y3@2 Y4@3 Y5@4 Y6@5 Y7@6;
[Y1-Y7@0 i s];
i WITH s@0;

%c#1%
[i*.4 s*.1];

%c#2%
[i*.5 s*.2];

%c#3%
[i*.1 s*0];
 Linda K. Muthen posted on Wednesday, December 12, 2001 - 10:04 am
The statistics are described in the technical appendix. Syntax is described in the regular chapters. If you add SAVE = FSCORES; to your SAVEDATA command, you will get factor scores in the file allrawdata3grrvar. This is described on page 93 of the Mplus User's Guide.
 Anonymous posted on Friday, February 01, 2002 - 8:59 am
Assistance interpreting output for c#1 on x1 . . . xn, etc.

Hello, I have simplified my output to two classes and one exogenous predictor.

Suppose I have an identified two-class model with a single exogenous predictor of class. My output provides me with this information:

1. An intercept of C1: for example -5
2. An effect of X1 on C1: for example +7

In your own words, would you mind explaining
1. The interpretation of each of these statistics.
2. The equation for converting them into odds ratios.
3. The equation for converting them into probabilities of class membership.

I apologize if this seems overly simplistic, but it is easy to get confused. Many thanks.
 bmuthen posted on Saturday, February 02, 2002 - 8:36 am
The coefficients for c on x are in logit scale, where

Prob = 1/(1+exp(-L))

with exp being e-to-the-power-of and L denoting the logit. Take as an example a case with binary class variable and a single x, giving the equation

logit = intercept + slope*x.

To interpret the intercept you can consider x=0, so that the logit = intercept. Compute the probability. This is the probability of being in class 1 at x=0. To interpret the slope, you can consider the influence of x on (1) the probability (via the logit), (2) the odds, or (3) the odds ratio. For (1), simply insert different x values in the logit equation and compute the probabilities. For (2), the odds of being in class 1 versus class 2 is simply P/(1-P). For (3), the odds ratio is the ratio of the odds at two different x values. Typically this is computed with a binary x. There is a simple shortcut to get this odds ratio,

odds ratio = exp (slope)

So, to interpret slopes one can use phrases like "comparing class 1 to class 2, the odds ratio for male versus female is...". Or, with a continuous x, "the odds of being in class 1 versus class 2 is significantly increased with increasing x value".
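The conversions above can be worked through numerically. The following Python sketch (not Mplus syntax) uses the intercept of -5 and slope of +7 from the question and assumes a binary x coded 0/1; the function names are hypothetical.

```python
import math

def class1_prob(intercept, slope, x):
    """Probability of class 1 (versus class 2) at a given x, from the logit."""
    logit = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-logit))

def odds(prob):
    """Odds of class 1 versus class 2."""
    return prob / (1.0 - prob)

intercept, slope = -5.0, 7.0          # values taken from the question
p_at_0 = class1_prob(intercept, slope, 0)
p_at_1 = class1_prob(intercept, slope, 1)
odds_ratio = odds(p_at_1) / odds(p_at_0)

print(f"P(class 1 | x=0) = {p_at_0:.4f}")
print(f"P(class 1 | x=1) = {p_at_1:.4f}")
print(f"odds ratio = {odds_ratio:.2f}, exp(slope) = {math.exp(slope):.2f}")
```

The ratio of the two odds agrees with exp(slope), which is the shortcut given in the reply.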
 David B. Rein posted on Sunday, February 24, 2002 - 10:09 am
In creating models with increasing numbers of classes, I find that beyond a certain number of classes the model cannot be estimated without a great deal of effort.

In this case, I have proceeded through four classes, with each model better than the previous one by smaller degrees (based on the BIC statistic).

Ideally, I would compute a 5 class model with a smaller BIC than my four class model, reject the 5 class model and move on. Functionally, I get a terrific model with four classes, and am not able to estimate a five class model.

Is there anything interesting that I can say about my data creating a situation that does not allow further estimation of classes? McLachlan states that ultimately the data may be modeled until c=n, the number of observations. In lieu of that, what information is gained from a model when an estimating "wall" is encountered?
 Linda K. Muthen posted on Monday, February 25, 2002 - 9:52 am
In working with simulated data, we have seen that non-convergence is often an indication that the correct number of classes has been exceeded. Also, a drop in BIC should not be used as the only criterion for selecting the number of classes. Other criteria such as interpretability of the classes and class size should be considered.
 David Rein posted on Thursday, April 11, 2002 - 12:43 pm
I just want to confirm that in mixture modeling the number of possible classes is limited by sample size and not by the number of observed indicators. McLachlan and Peel seem to state this, but I want to double check.

For example, in a mixture model, I can have a latent variable with 5, 6, 7, 8, etc. values, even if I just have four observed indicators. My theoretical limit for the number of class values would be where c=n.

Is this correct?
 bmuthen posted on Thursday, April 11, 2002 - 6:33 pm
It depends on the model. For example, with a latent class analysis with four binary outcomes already a 3-class model has one indeterminacy as shown in Goodman (1974; see our Reference section). There seem to be few general rules, however.
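A simple parameter count shows why the case mentioned above is notable. The Python sketch below (function names hypothetical) compares the number of free parameters in an unrestricted LCA with binary items against the number of independent response-pattern probabilities. Passing this count is only a necessary condition: the 3-class model with four binary items passes it, yet according to the reply it still has one indeterminacy.

```python
def lca_free_parameters(n_items, n_classes):
    """Item probabilities per class plus the free class probabilities."""
    return n_classes * n_items + (n_classes - 1)

def independent_pattern_probs(n_items):
    """Number of response patterns for binary items, minus one (they sum to 1)."""
    return 2 ** n_items - 1

n_items = 4
available = independent_pattern_probs(n_items)
for n_classes in (2, 3, 4):
    needed = lca_free_parameters(n_items, n_classes)
    verdict = "passes" if needed <= available else "fails"
    print(f"{n_classes} classes: {needed} parameters vs {available} "
          f"pattern probabilities ({verdict} the counting rule)")
```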
 David Rein posted on Friday, April 12, 2002 - 7:05 am
Thank you,

How are estimates from a simple FMM such as:

"Normal mixtures, Everitt & Hand (1981), Fisher's Iris data, UNequal covariance matrices"

used to forecast class membership in unfitted data?
 bmuthen posted on Friday, April 12, 2002 - 8:31 am
Interesting question. You do an Mplus run where you use the estimates to fix every parameter of the model, and enter as your sample the group of unfitted subjects. Although no parameter is estimated, the run produces the estimated posterior probabilities for each subject, which is what you want.
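The idea of scoring new subjects with all parameters held fixed can be sketched outside Mplus. The Python example below is a deliberate simplification: it uses a univariate two-class normal mixture rather than the multivariate Iris model with unequal covariance matrices, and the "estimates" and new observations are invented. Function names are hypothetical.

```python
import math

def normal_pdf(y, mean, variance):
    """Density of a univariate normal distribution."""
    return math.exp(-(y - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def posterior_probs(y, weights, means, variances):
    """Posterior class probabilities for one observation, parameters held fixed."""
    joint = [w * normal_pdf(y, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical estimates from a previously fitted two-class model.
weights = [0.4, 0.6]
means = [0.0, 3.0]
variances = [1.0, 1.0]

# Hypothetical new (unfitted) observations.
new_data = [-0.5, 1.5, 4.0]
posteriors = [posterior_probs(y, weights, means, variances) for y in new_data]
for y, post in zip(new_data, posteriors):
    most_likely = post.index(max(post)) + 1
    print(f"y = {y:+.1f}: posteriors = {[round(p, 4) for p in post]}, class {most_likely}")
```

Nothing is re-estimated here; each new case is simply weighed against the fixed class-specific densities and class proportions.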
 Patrick Malone posted on Wednesday, July 31, 2002 - 6:38 am
I've got a situation where I want a latent class to mediate the effect of a continuous predictor on a continuous outcome. I can set up the class membership logits to depend on the predictor -- that part's fine. However, the only way I've seen to model the effects of class membership on the outcome is as Bengt noted on his 3/12/01 message -- by letting the mean of the outcome vary by class.

My problem is that it seems that this would redefine the classes -- that is, they'd be optimized with the outcome variable acting like a latent class indicator. I'd prefer the latent class definition to be based only on the other indicators.

One solution that's come to mind is to run the latent class model with no other variables, and use the estimated values as fixed starting values in the second run that includes the other relations. Is this sensible (I know it will mess up my degrees of freedom)? Is there a better way? Or am I making too much of the reoptimization concern?

 bmuthen posted on Wednesday, July 31, 2002 - 9:35 am
Yes, when you include the outcome, your class membership changes. This may be desirable or not.

On the one hand, one can argue that this is as it should be because all observed variables are informative about class membership. If the model is correct, adding the outcome should not change membership very much, but merely improve the classification (better entropy).

On the other hand, you may want to enforce the class membership that you have before entering the outcome. This means that you must hold individual class probabilities fixed, not only the estimated parameters. If only estimated parameters are held fixed, the outcome still changes the class probabilities and the individual estimated posterior probabilities. I think a solution is to use the training data facility in its probability version (TTYPE=PROBABILITIES), using the individual estimated posterior probabilities from the run without the outcome. Note, however, that this strongly underestimates s.e.'s because you are acting as if class membership is known.
 Patrick Malone posted on Wednesday, July 31, 2002 - 9:47 am
Thanks. It's not so much class membership that I'm worried about changing as class definition (this may be because I haven't thought about it enough yet). I want to be able to say that X increases the likelihood of membership in class C, which in turn is associated with higher levels of outcome Y, but not have the definition of class C depend on what Y is in the model. Does that make sense?
 bmuthen posted on Thursday, August 01, 2002 - 9:13 am
Yes, I understand what you are saying here. I would not worry about this. Think about the analogy with SEM and all continuous variables, so that c is a continuous factor, say eta. Say that you even have a set of indicators z of eta in addition to the predictors x. If you add the outcome y to the model, to some extent this changes all parameters (including the loadings for z's on eta) - and if you were to estimate the factor scores for eta, they would also be different as compared to not having y in the model. Now, one could argue that to have the measurement of eta settled before adding y, one should fix the loadings of the z's on eta. One can do that, but I don't think it is customarily done in SEM.
 Patrick Malone posted on Thursday, August 01, 2002 - 9:21 am
Thanks. That helps.
 Jeannie posted on Sunday, October 27, 2002 - 1:07 pm

how exactly are the standard errors for pihats calculated? Do you use the observed second derivative matrices or bootstrap estimates, or something else?

 Linda K. Muthen posted on Sunday, October 27, 2002 - 2:21 pm
Can you please tell me what you mean by pihats?
 Jeannie posted on Monday, October 28, 2002 - 7:28 am
Oops, sorry, I meant estimates for conditional probabilities.
 bmuthen posted on Monday, October 28, 2002 - 7:56 am
- do you mean the conditional probabilities for items given class in LCA, or the conditional probabilities (posterior probabilities) for each individual?
 Jeannie posted on Monday, October 28, 2002 - 9:03 am
so sorry for the back and forth, I mean conditional probability for the item given the class in LCA.

 bmuthen posted on Monday, October 28, 2002 - 9:37 am
- No problem. The answer is given on Mplus Discussion under the heading

LCA s.e.'s for estimates in probability scale 10/16/2002 05:59am
 jeannie posted on Monday, October 28, 2002 - 12:07 pm

I see that I'm still not asking my question properly, I apologize! OK, I understand the method for determining the s.e. for a function of an estimate, what I was asking though, is then, how did you estimate the variance of the logit estimate?
 bmuthen posted on Monday, October 28, 2002 - 4:28 pm
We are converging. For mixtures, the s.e.'s are computed using observed second-order derivatives - and with the default MLR approach, doing the robust (sandwich-type) version that also involves first-order derivatives. See Mplus User's Guide, Technical Appendix 8, page 370.
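For readers following the earlier part of this exchange, one common way to carry a standard error from the logit (threshold) scale to the probability scale is the first-order delta method. The Python sketch below illustrates that method only; whether the referenced thread uses exactly this formula is an assumption, and the threshold and its standard error are invented. It takes the threshold standard error as given (in Mplus it would come from the derivative-based computation described above).

```python
import math

def prob_and_se_from_threshold(threshold, se_threshold):
    """Item probability and its delta-method standard error.

    Assumes P = 1 / (1 + exp(threshold)), so |dP/dthreshold| = P * (1 - P).
    """
    prob = 1.0 / (1.0 + math.exp(threshold))
    se_prob = prob * (1.0 - prob) * se_threshold
    return prob, se_prob

# Hypothetical threshold estimate and standard error.
prob, se_prob = prob_and_se_from_threshold(1.0, 0.25)
print(f"probability = {prob:.3f}, approximate s.e. = {se_prob:.3f}")
```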
 Howard posted on Thursday, November 07, 2002 - 2:29 pm

I have a question about applying the advice in Linda Muthen's post from 5/30/2001 regarding starting values in an LCA model with discrete indicators. I have 33 indicators, and am trying to fit a 2-class, then a 3-class solution. The class counts vary with the starting values, so I want to be sure that I am following 'best practice' in this area.

One approach seems to be to set all the starting values in c#1 to 0, then c#2 to -1, and so on. For example:


But the advice above implies that for prevalence indicators, the starting values should increase. So, if the indicators u1-u4 are in order of decreasing prevalence, I should set the starting values as follows:

      c1    c2
u1    -3    -4
u2    -1    -2
u3     1     2
u4     3     4

Which approach is recommended? It seems like the first leads to a lower BIC, which would imply that it is preferred.

An alternative that occurs to me is to use the logit of the observed mean (proportion) for the starting value. Following this thought, should I use the 2-class probabilities as starting values for the 3-class solution?

Thank you in advance.
 bmuthen posted on Thursday, November 07, 2002 - 5:50 pm
I think you should try all these approaches, and if they give different loglikelihood values you should probably try further variations. LCA is known to have multiple solutions, particularly with smaller samples and models that are less clearly determined by/suited to the data, and you want to make sure that you find the one with the best (highest) loglikelihood value.
 Anonymous posted on Monday, January 27, 2003 - 12:11 pm
I'm trying to develop a latent class measurement model in which I allow the residuals for two or more indicator variables to be correlated.

Does Mplus not allow this option? I've tried:

indic1 with indic2;

as well as:

indic1 with indic2;


 Anonymous posted on Monday, January 27, 2003 - 1:57 pm
I have some additional questions regarding LV mixture modeling in Mplus v2.12:

(1) In constructing a SEM with a latent class measurement model (LCMM) as an outcome variable, I've noticed that Mplus provides: (a) Chi-Square tests of model fit for the LCMM, and (b) Chi-Square tests of model fit for the entire SEM. Are these calculated in stages -- i.e., does Mplus fit the LCMM model first, obtain start values and model fit, and then generate overall model fit statistics?

(2) Mplus provides two types of class counts / proportions in the output for a SEM with a LCMM: estimates based on posterior probabilities and "estimates based on most likely class membership". Is it the case that the latter series of probabilities is based on the LCMM "in isolation" (i.e., no covariates), while the former series is based on LC membership after the addition of further information (i.e., covariates) to the model? If these values don't change much, does this imply that the addition of covariates does not change the distribution of individuals into classes (and thus provide evidence that the LCMM is "stable")?

(3) Finally, Mplus provides a space for R-Square statistics for each of the classes in my model but whenever I run a SEM with a LCMM, this portion of the output is blank -- i.e., Mplus isn't producing R-Square values for the latent class membership portion of the model, in spite of providing the information referenced in question (2) above.

(4) Is it possible to construct SEMs where a LCMM is used as an intervening variable between a set of covariates and a categorical outcome variable (dichotomous or polychotomous)?
 bmuthen posted on Monday, January 27, 2003 - 2:49 pm
Regarding correlating latent class indicators beyond what the classes explain, this can easily be done when the indicators are continuous but not categorical. For continuous, you do it in Overall when they are class-invariant. With categorical indicators you have to use a trick, such as making one of the indicators be an x variable with a direct effect from this new x to the other indicator.
 bmuthen posted on Monday, January 27, 2003 - 3:00 pm
Here are some answers to the 4 questions.

(1) the latent class and the SEM parts of the model are analyzed jointly - i.e. ML estimates are determined simultaneously for the 2 parts.

(2) No on the first question - it does not have anything to do with covariates. The posterior probabilities reflect the "exact" values of a person's class membership - allowing for this individual to be partially a member of all classes. This is the counterpart of "factor scores" for continuous factors. The numbers based on most likely class membership will only approximate numbers based on post probs if the classification is clear, i.e. mostly contains 0 or 1 post prob values.

(3) I think you are referring to the latent class membership regression part of the model ("c on x"). If so, this is multinomial logistic regression for which I don't know that there is a well-established R-square measure (correct me if I am wrong).

(4) You can do "c on x" and then let u be c indicators - that means that the latent class variable c is an intervening variable between x and u. But you cannot have a path model leading to c - that is for Version 3.
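The distinction in answer (2) between the two kinds of class counts can be illustrated with a few made-up posterior probabilities. In this Python sketch (function name hypothetical), counts based on posterior probabilities sum each person's partial memberships, while counts based on most likely class membership assign each person wholly to the class with the highest posterior probability.

```python
def class_counts(posteriors):
    """Return (counts from posterior probabilities, counts from modal assignment)."""
    n_classes = len(posteriors[0])
    by_posterior = [sum(row[k] for row in posteriors) for k in range(n_classes)]
    by_modal = [0] * n_classes
    for row in posteriors:
        by_modal[row.index(max(row))] += 1
    return by_posterior, by_modal

# Hypothetical posterior probabilities for four individuals and two classes.
posteriors = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]
by_posterior, by_modal = class_counts(posteriors)
print("based on posterior probabilities:", [round(c, 2) for c in by_posterior])
print("based on most likely class:      ", by_modal)
```

The two sets of counts differ here because two of the four cases are classified with little certainty; with posterior probabilities near 0 or 1 they would nearly coincide.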
 Anonymous posted on Tuesday, January 28, 2003 - 11:47 am
First: regarding your answer to my original question (4): i.e., “…constructing SEMs where a LCMM is used as an intervening variable between a set of covariates and a categorical outcome variable …”. The recommendation you make – is this what is diagrammed on page 262 of the Mplus User's Guide (version 2)?

Second: regarding your answer to my original question (3): I was referring to the fact that there is a section on the Mplus output titled: “R-Square / Class 1 / Class 2…” ( “/” denotes a line break in the actual Mplus output) that is always left blank when I run LCMMs or SEMs including LCMMs. Is this feature for some reason not operable in v.2.12?

As to including direct effects between indicators in a LVMM: I’ve attempted to do as you suggested above. Is this what you had in mind?

For dichotomous indicator variables A, B, C, and D, with a T=4 LCMM and including indicator A in the Mplus script as an “X” variable (thus I write A as “AX” below)…

%OVERALL%
B on AX;

c#1 on AX;
c#2 on AX;
c#3 on AX;

%c#1%
[B$1*-1 C$1*-.5 D$1*-2];

%c#2%
[B$1*1 C$1*.5 D$1*2];

%c#3%
[B$1*-1 C$1*.5 D$1*2];

%c#4%
[B$1*1 C$1*-.5 D$1*-2];

My concerns with this approach are:

(1) The method doesn’t provide loadings for variable AX, but rather the influence of AX on LCMM c# for class c#1 vs. c#4, c#2 vs. c#4, etc. I.e., I don’t see how to recover the LCMM probability loadings from these effects. Furthermore, in running the model above, Mplus doesn’t provide the usual Chi-Sq and L-R Chi-Sq tests / probability loadings output when I specify the above model. Am I specifying the model correctly?

(2) Related to point (1): if I understand the Mplus language / operationalization correctly, doesn’t the LC variable “cause” indicators A, B, C, and D, in which case shouldn’t the 3rd through 5th lines of code be: “AX on c#1”, etc.?

(3) When I compare the Mplus results from those obtained from another software application that does explicitly allow the incorporation of direct effects between indicators, the loadings for variable B are off in 2 of the T=4 classes by 20 percentage points. Is this likely due to differences in ML algorithms vs. the way the model is specified between the two packages?

Thanks again for your feedback and suggestions. I look forward to v3.0.
 bmuthen posted on Tuesday, January 28, 2003 - 12:13 pm
On "First", yes.

On "Second", please send output to Mplus support.

Regarding the "residual correlation", or direct effect, consider first a test. Moving A to be an x should give the same B, C, D estimates as in the original model - here you should not include "B on AX". If I see this correctly, this should work out the same because both models assume conditional independence of the 4 indicators given the latent classes.

Regarding your questions,

(1) - you estimate the same joint probability distribution for c and x, but you have to derive the conditional x | c prob's from it.

(2) it's ok for cond indep reasons given above.

(3) Which other software are you using, and are you sure you are working with the same number of parameters? The residual correlation has one single parameter, right? Is the correlation specified as a random effect (continuous latent variable) or in some other way?

You may prefer another Mplus approach. Specify a second latent class variable which influences A and B. How to handle 2 latent class variables and the resulting equality constraints is shown on p. 11 in paper #86 (see Mplus home page).
 Anonymous posted on Tuesday, March 11, 2003 - 7:57 am
Is it possible to change the comparison class in a class analysis? For instance, in my LTA of smoking, the low or no smoker trajectory is my comparison class. Is it possible to have one of the other trajectories set as the comparison?
 Linda K. Muthen posted on Tuesday, March 11, 2003 - 8:49 am
Yes, you can do this by adjusting the starting values so that the class you want for the reference class is the last class.
 Nan S. Park posted on Wednesday, April 02, 2003 - 7:53 am
I'm working on latent class and latent profile analyses. One of my dissertation committee members asked whether there are any measures of "reliability" (of class membership) in mixture modeling procedures. Can someone help me out? Thanks.
 Linda K. Muthen posted on Wednesday, April 02, 2003 - 8:27 am
Classification quality can be assessed by looking at entropy and the classification table that is printed in the output. One can also use auxiliary variables to investigate the interpretability and usefulness of the classes.
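As a rough illustration of the entropy summary mentioned here, the sketch below computes a relative entropy from posterior class probabilities. It assumes the usual definition, 1 - sum(-p*ln p) / (n*ln K), which is taken here to correspond to the entropy value printed in the output; the function name and the example probabilities are hypothetical.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate well-separated classes.

    posteriors: list of rows, one per individual, each row holding the
    posterior probabilities of the K classes (each row sums to 1).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    # Sum of -p * ln(p) over individuals and classes (0 * ln 0 is taken as 0).
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four individuals and two classes.
example = [[0.98, 0.02], [0.95, 0.05], [0.10, 0.90], [0.50, 0.50]]
print(round(relative_entropy(example), 3))
```

An individual with posteriors of 0.50/0.50 pulls the value down, while individuals classified with near certainty push it toward 1.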
 Nan S. Park posted on Wednesday, April 02, 2003 - 12:12 pm
Thanks for your reply. I have an additional question. When different starting values are used, the sizes of class and of entropy fluctuate. I wonder if entropy can be used to evaluate the extent to which classification is true and consistent with the data--something analogous to "reliability" measures in classical test theory.
 Linda K. Muthen posted on Wednesday, April 02, 2003 - 1:10 pm
If you get different classes with different starting values, you should select the solution with the largest loglikelihood value.
 Patrick Malone posted on Thursday, April 03, 2003 - 6:09 am
In my experience, the LCAs do seem to be sensitive to starting values. I've written a SAS job to generate random starting values within a range. My usual procedure is to generate 10 or 20 sets and run Mplus from a batch file. I then look for convergence on the best solution. If I get 10 or 20 different solutions, I conclude the model is underidentified. Does this seem like a reasonable approach?
 Linda K. Muthen posted on Thursday, April 03, 2003 - 6:45 am
 Wim Beyers posted on Wednesday, April 16, 2003 - 11:30 pm
Classes with growth functions of a different order, is that possible in Growth Mixture Modeling?
I mean, suppose I have 4 repeated measures of let's say delinquency in a sample of adolescents, and my theory says that I have to distinguish at least three trajectory classes: Class 1 (1st order model of growth) having a low intercept only, so no growth at all (steady low); Class 2 (2nd order growth function) having an intercept and some slope, indicating linear growth in delinquency with age; and finally Class 3 (3rd order growth function) having an intercept, a linear slope and a quadratic component too, indicating accelerated growth at the end of the study (for instance)...
Can I test these hypotheses using Growth Mixture Modeling in Mplus? It's possible in the semi-parametric approach of Nagin (1999).
Thanks for your help.
 bmuthen posted on Thursday, April 17, 2003 - 6:19 am
Yes, this is possible. For example, you specify a quadratic function in %Overall% and then you fix the parts you don't need in a given class. So the low, flat, intercept only class would have the linear and quadratic means and variances fixed at zero (and corresponding covariances).
 Wim Beyers posted on Thursday, April 17, 2003 - 7:43 am
Thanks a lot! I guessed this was the case... I just taught a short session on advanced LGC modeling, including Growth Mixture Modeling and Nagin's approach, and this was one of the seeming differences between the two (and a question from one of the participants). However, I told the participants that fixing the parameters you don't want to 0 was indeed the trick to do it in Mplus.
 bmuthen posted on Thursday, April 17, 2003 - 8:18 am
There are important differences between Nagin's approach and that in Mplus. For example, Nagin's approach does not allow within-class variation, i.e. random effect variation of growth factors. This advantage of Mplus is important because you may arrive at the wrong classes if you postulate zero within-class variation. Also, Mplus allows this variation to differ across classes which is also critical. If you want me to elaborate on this and other advantages of Mplus over Nagin's approach, please let me know.
 Anonymous posted on Monday, November 10, 2003 - 9:33 am
In a growth mixture model, I looked at continuous predictors of class membership. I computed odds ratios for two different continuous predictors, in separate models. Is there a way to standardize the odds ratios so that I could compare the effects of the two different predictors?
 Linda K. Muthen posted on Monday, November 10, 2003 - 4:31 pm
You could standardize the raw coefficients by multiplying them by their standard deviations and then compute the odds ratios.
 Anonymous posted on Monday, November 17, 2003 - 11:32 am
I understand that the estimates given in the output for these predictors (C#1 ON PREDICTOR)
are given in terms of estimated value and standard error - do I get standard deviation by using the number of observations as n and the regular formula sd=SEM * square root(n) ???
 bmuthen posted on Monday, November 17, 2003 - 11:45 am
The standard error (SE) is the term commonly used for the standard deviation (SD) of a parameter estimate. Perhaps you are thinking of the relationship between SE and SD for a sample mean,

SE = SD/sqrt(n) ?

That thinking should not be used here. For the parameter estimates SE=SD.
 Frank Lawrence posted on Monday, November 17, 2003 - 1:20 pm
I am running a GGMM Monte Carlo experiment. I want to compare model fit assessments averaged over the repetitions. I noticed that tech11 and tech13 output fit information for each repetition. Is there a way to have Mplus summarize tech11? tech13?
 Linda K. Muthen posted on Monday, November 17, 2003 - 2:23 pm
No, not currently.
 Anonymous posted on Tuesday, November 18, 2003 - 11:25 am
Back to standardizing odds ratios:
Sorry to be so dense. I was looking somewhere else to understand standardizing these odds ratios and it seems that what I want is to standardize on the predictor. In that case, can I multiply the coefficient (printed in Mplus output following C#1 on PREDVAR) by the standard deviation of PREDVAR, which I get from SAS? Then I would think the interpretation would be that for every one standard deviation change in PREDVAR, the change in class membership is equal to the calculated standardized odds ratio.
 Linda K. Muthen posted on Tuesday, November 18, 2003 - 12:21 pm
The standardization sounds correct. The change in odds of being in a certain class relative to the last class is calculated by the odds using the standardized logit slope calculated by multiplying the raw logit slope by the s.d. of predvar.
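The calculation described here can be sketched in a few lines: multiply the raw logit slope by the standard deviation of the predictor, then exponentiate. The function name and the numeric values below are hypothetical and only illustrate the arithmetic.

```python
import math

def standardized_odds_ratio(logit_slope, predictor_sd):
    """Odds ratio for a one-standard-deviation increase in the predictor.

    logit_slope:  raw coefficient from the "C#1 ON PREDVAR" line of the output
    predictor_sd: sample standard deviation of PREDVAR (computed elsewhere)
    """
    return math.exp(logit_slope * predictor_sd)

# Hypothetical values: raw slope 0.40 and predictor SD 2.5.
print(standardized_odds_ratio(0.40, 2.5))
```

The result is the multiplicative change in the odds of being in the given class relative to the last class for a one-SD increase in the predictor, which puts predictors measured on different scales on a comparable footing.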
 Anonymous posted on Thursday, March 04, 2004 - 8:57 am
Back to the classification table and entropy discussion. Forgive me if this sounds a bit redundant but I wanted to make sure that I was correctly interpreting this discussion. When you refer to “classification table” above, do you mean the portion of the output entitled “Classification of Individuals Based on Their Most Likely Class Membership,” the output entitled “Final Class Counts and Proportions of Total Sample Size,” or both? I understand that the more dissimilar the values are between these two, the poorer classification quality is and that entropy will be lower. If this is correct, is it necessary to interpret the classification table if I interpret entropy? If I do need to interpret the classification table, how much difference is cause for concern? Finally, can you clarify what you mean when you suggest that auxiliary variables be used to determine the interpretability and usefulness of the classes (starting with what they are)? I understand that there may be no hard and fast rules for some of these questions, but your insights would be appreciated.
 bmuthen posted on Thursday, March 04, 2004 - 6:16 pm
The classification table refers to the Classification of Individuals Based On Their Most Likely Class Membership. Entropy is a single number that summarizes the posterior probability information of that table. The table gives additional information about which classes are less distinguishable. How high a posterior probability average in an off-diagonal cell is a cause for concern is subjective. 0.20 seems too high, while 0.05 might be ok. Also, the too high values may refer to classes that are not important to distinguish between. The value of auxiliary variables is discussed in the Muthen 2003 article in Psych Methods given on the Mplus home page.
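To make the structure of such a table concrete, here is a minimal sketch that averages posterior probabilities within groups defined by most likely class. It assumes the posterior probabilities are available as rows of numbers (for example from saved class probabilities); the function name and data are hypothetical, and the 0.20 flag simply mirrors the rough guideline in the post above.

```python
def classification_table(posteriors):
    """Average posterior probabilities, grouped by most likely class.

    Row m of the result holds the mean posterior probability of each class
    among individuals whose most likely class is m. A class to which nobody
    is assigned gets None instead of a row.
    """
    k = len(posteriors[0])
    sums = [[0.0] * k for _ in range(k)]
    counts = [0] * k
    for row in posteriors:
        modal = max(range(k), key=lambda j: row[j])  # most likely class
        counts[modal] += 1
        for j in range(k):
            sums[modal][j] += row[j]
    return [[s / counts[m] for s in sums[m]] if counts[m] else None
            for m in range(k)]

# Hypothetical posteriors for five individuals and two classes.
table = classification_table(
    [[0.90, 0.10], [0.70, 0.30], [0.20, 0.80], [0.05, 0.95], [0.60, 0.40]]
)
for m, row in enumerate(table):
    if row is None:
        continue
    # Flag off-diagonal averages at or above the rough 0.20 guideline.
    flags = [j for j, p in enumerate(row) if j != m and p >= 0.20]
    print(m, [round(p, 3) for p in row], "check classes:", flags)
```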
 Anonymous posted on Tuesday, August 16, 2005 - 8:19 am
My question concerns the loglikelihood H0 value in the Mplus output file. I used latent mixture modeling and ran it for several data samples. Shouldn't the loglikelihood H0 value always increase (smaller negative value) with an increasing number of classes? I am asking because in some instances the loglikelihood value is smaller (larger negative value) for a 4 class solution compared to the three class solution. Do you know of any such cases or does this indicate that something is wrong? How would I typically interpret this?
Thanks for your input!
 George Deitz posted on Tuesday, August 16, 2005 - 8:40 am
I have a finite mixture regression model with 2 DVs. For example's sake, let's say that the three class solution is the best. My Model command set up looks something like:

Y1 Y2 = X1 X2 X3 X4 X5 C1 C2
c#1 ON C1 C2;
c#2 ON C1 C2;

Y1 Y2 ON X1 X2 X3 X4 X5;

Y1 Y2 on X1 X2 X3 X4 X5;


If I want to test the difference in the slopes between classes using a chi-square difference test, what should I do? I've tried a couple different approaches that I thought made sense but keep getting different log-likelihood values, so I'm afraid I might be doing something wrong.

I've tried to follow the example in the book, but having the second DV makes it a little confusing. If I split the regression command lines out and do something like:

Y1 ON X1 x2 x3 x4 x5;
Y2 ON X1(1) X2 X3 X4 X5;

Y1 ON X1 X2 X3 X4 X5;
Y2 ON X1(1) X2 X3 X4 X5;

Is this constraining the slopes for y2 on x1 to be equal for classes 2 and 3?

I've also tried removing X1 from the %C#2% regression equation ...

Y1 ON x2 x3 x4 x5;
Y2 on x1 x2 x3 x4 x5;
y1 with y2;

Isn't this model now constraining the Y2->X1 slope for classes 1 and 2 to be equal? Isn't this effectively the same thing?

Are the differences I'm seeing simply a matter of different starting values for each run or am I entering the wrong syntax?
 Linda K. Muthen posted on Tuesday, August 16, 2005 - 10:57 am
In the following construction, I believe that everything after (1) is ignored.

Y1 ON X1 x2 x3 x4 x5;
Y2 ON X1(1) X2 X3 X4 X5;
Y1 ON X1 X2 X3 X4 X5;
Y2 ON X1(1) X2 X3 X4 X5;

It should be:

Y1 ON X1 x2 x3 x4 x5;
Y2 ON X1(1)
X2 X3 X4 X5;
Y1 ON X1 X2 X3 X4 X5;
Y2 ON X1(1)
X2 X3 X4 X5;

If this does not solve your problem, send the outputs and your license number to Mplus support. Include TECH1 in these outputs.
 Thomas Olino posted on Tuesday, October 18, 2005 - 1:01 pm
I am conducting growth curve analyses and the predictors of the slopes and intercepts are all dichotomous variables. I want to examine interactions between the predictors, however, when I attempt to do this using a multigroup analysis, the covariance coverage is poor for one of the groups.

My understanding is that for interactions with one or more dichotomous variables, the preferred approach is to use multigroup analysis. I think that I could use the difference between the -2LL in nested mixture models to examine interactions. Is there a detriment to using this approach?
 bmuthen posted on Tuesday, October 18, 2005 - 6:47 pm
You can simplify your model by using MIMIC - CFA with covariates - instead of multi group modeling. To capture the interaction, just multiply the x variables using DEFINE. In the MIMIC the low coverage doesn't hurt as much.
 Thomas Olino posted on Wednesday, October 19, 2005 - 4:50 am
If I were to use MIMIC-CFA with covariates and include the interaction in the model, how would I base the inclusion of the interaction? Would it be based on the significance of the association between the interaction and the slope and/or intercept? Or, would it be based on improvement of model fit? Although, they should be closely related.
 Linda K. Muthen posted on Wednesday, October 19, 2005 - 9:01 am
I think the most straightforward thing to look at is the z value for the interaction parameter, the ratio of the parameter estimate to its standard error.
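The z value mentioned here is just the estimate divided by its standard error. A minimal sketch, assuming a two-sided test against the standard normal distribution (the function name and numbers are hypothetical):

```python
import math

def wald_z(estimate, standard_error):
    """Return the z value and its two-sided normal p-value."""
    z = estimate / standard_error
    # P(|Z| > |z|) for a standard normal Z, via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical interaction estimate and standard error.
z, p = wald_z(0.84, 0.35)
print(round(z, 3), round(p, 4))
```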
 Kat posted on Thursday, January 12, 2006 - 5:07 am
I have carried out LCA with covariates which yielded 4 classes, but now wish to specify starting values so as to change the second class to be the reference class, i.e. the last class, which I can then compare the other 3 classes to.
However, I am somewhat unsure how to obtain and specify these starting values from the output. I would really appreciate any advice on this.
 Linda K. Muthen posted on Thursday, January 12, 2006 - 6:57 am
You use the ending values for class 4 of the first run as starting values for class 2 of the second run. Example 7.8 shows an LCA with user-specified starting values.
 Jennifer M. Jester posted on Friday, May 19, 2006 - 11:09 am
I am using a growth mixture model, based on number of attention problems in children at 4 time points. My model is that the latent trajectory class membership predicts a continuous outcome, number of alcohol problems (measured concurrently with the last time point in the growth model). I would also like to test whether the relationship of class membership and alcohol problems is mediated by another variable. Following a March 12, 2001 posting on this website, I thought that this would be the correct syntax :

alcprb on ssrt; in the Overall part of the model
[alcprb]; in each class-specific part of the model.

I tested for significant differences of alcprb intercept by comparing nested models where I allowed [alcprb] to be different in each class and a model in which they are forced equal in each class.

Does this seem right?

 Bengt O. Muthen posted on Friday, May 19, 2006 - 3:24 pm
 Jennifer M. Jester posted on Wednesday, May 24, 2006 - 10:35 am
Checking my understanding of the mediation modeling. If I have [alcprb] in each class and run the model without "alcprb on ssrt", I look at the means of alcprb in each class and test for differences to test the direct effect of class membership on alcprb.
When I put "alcprb on SSRT" in OVERALL, and [alcprb] in each class, now I have intercepts instead of means and if there are still significant differences between classes, that tells me that there isn't full mediation of the effect of class membership on alcprb by SSRT. Is there a way to test for partial mediation? I would like to compare alcprb in each class in the two models, but in one case this is a mean and in the other case it is an intercept, so I'm not sure how to compare these.
Is there an example of using this in a paper?


 Bengt O. Muthen posted on Thursday, May 25, 2006 - 9:55 am
In the model with [alcprb] in each class and alcprb on ssrt, the [alcprb] part is a "direct effect" from the latent class variable c to the distal (as you say it is also an "intercept", not a mean). I assume here that c also influences ssrt. I see full mediation as taking place if there is no direct effect and that is tested by class-invariance of [alcprb] in the model I stated. So that test seems like all you need.

There are also possibilities in this model to test for class-invariant means, expressing these as functions of model parameters, using Model constraint.
 Irene Biza posted on Friday, June 02, 2006 - 3:45 am
I am a PhD student and my research is in Education of Mathematics. I have a questionnaire of about 25 items labeled with 0 and 1 (as false and correct) that has been administered to 182 students. I made the following LCA with this data:
classes = c(4);
categorical= q32 q33 q34 q35 q36 q37 q38 q39 q310 q311 q312 q313 q314
q44 q45 q47 q48 q49 q410 q411 q412 q413 q415 q52 q53;
TYPE IS mixture;

The lowest BIC is at 3 classes, and the lowest AIC and ABIC are at 8 classes. The lowest Lo-Mendell-Rubin adjusted LRT p-value (= 0.0002) is at 3 classes (versus 2) and the highest (= 0.83) is at 8 classes (versus 7). What is the acceptable value for this p-value?
The problem is that it is not clear to me which indices I can trust to test whether my model fits and, of course, whether I can do LCA with this sample size and this number of items.
I also did a confirmatory LCA that gave a very good classification (according to my theory), but I do not know how to support the claim that this model is better than others:
USEVARIABLES ARE q32 q33 q36 q310 q311 q38 q314;
classes = c2C(4) c4(2);
categorical= q32 q33 q36 q310 q311 q38 q314;
c4#1 with c2C#1;
[q33$1*-15 q32$1*-15 q314$1*-3];
[q33$1*-15 q32$1*-15 q314$1];
[q33$1*15 q32$*1.2 q38$*2 q311$1];
[q33$1*15 q32$*1.2 q38$*-3 q311$1];
[q310$1 q36$1];
[q310$1 q36$1];
TYPE IS mixture;
Please if you think that this kind of analysis it is not adequate for my data I am open to any other ideas concerning data analysis.
Thank you in advance!
 Linda K. Muthen posted on Friday, June 02, 2006 - 8:52 am
The issue with a small sample size is power. You should not have computational problems in your situation.

The p-value to look for with Lo-Mendell-Rubin is a value greater than .05.

See the following paper for suggestions about how to decide on the number of classes:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

You can download this paper from the website. See Recent Papers.
 Daniel Rodriguez posted on Monday, June 05, 2006 - 7:24 am
For Tech 14, I got the message:


How do I do this? I already have 100 10 starts.
 Irene Biza posted on Monday, June 05, 2006 - 9:24 am
Thank you for the help, but I really didn’t manage to understand which classification is the best in my analysis.
I will be more exact regarding my results to help you to help me!
My LCA (25 items and 182 individuals) gives the following results:
2 classes: AIC=4076, BIC=4239, ABIC=4078
3 classes: AIC=3763, BIC=4010, ABIC=3766, LRT Test p-value=0,0002
4 classes: AIC=3683, BIC=4013, ABIC=3687, LRT Test p-value=0,1640
5 classes: AIC=3601, BIC=4014, ABIC=3606, LRT Test p-value=0,0648
6 classes: AIC=3572, BIC=4069, ABIC=3578, LRT Test p-value=0,2585
7 classes: AIC=3549, BIC=4129, ABIC=3556, LRT Test p-value=0,30
8 classes: AIC=3545, BIC=4209, ABIC=3553, LRT Test p-value=0,83
9 classes: AIC=3589, BIC=4335, ABIC=3597, LRT Test p-value=0,39
The classification quality (entropy>0.97) is very good in all cases. According to the indices and the LRT Test p-value which do you think is the best classification?
 Linda K. Muthen posted on Monday, June 05, 2006 - 9:25 am
We have added some guidelines for using TECH14 and some new options to use with TECH14. These are described in the Version 4.1 Mplus User's Guide which is on the website. Check the index for TECH14 to find the guidelines and see also the LRTSTARTS and LRTBOOTSTRAP options.
 Linda K. Muthen posted on Monday, June 05, 2006 - 11:07 am
For Irene: You need to make the decision on the number of classes yourself. You should also consider which classes make sense theoretically and substantively.
 Irene Biza posted on Tuesday, June 06, 2006 - 2:28 am
That means that any number of classes is acceptable if it makes sense theoretically?
For example, the 3 classes are great for me but the LRT Test p-value is too low. Also the 4 classes are interpretable but have neither the lowest AIC nor the lowest BIC. Is it valid to accept one or the other classification just by saying that it makes sense, and what would my arguments be from the statistical methodology point of view?
 Linda K. Muthen posted on Tuesday, June 06, 2006 - 6:12 am
No, there are guidelines that are discussed in the paper I referred you to and that I mentioned earlier. For example, a low BIC, a p-value greater than .05 for the LRT test, a high loglikelihood, etc. I think you are interpreting the LRT p-value incorrectly. For 4 classes, the p-value is greater than .05. This points to the 3-class solution. The interpretation of the p-value for the LRT is described in the user's guide. In addition, the classes should make sense substantively.
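The reading of the fit statistics described here can be traced on the numbers Irene posted. The sketch below copies her values into a small table (decimal commas rewritten as points) and applies two rules: the lowest value of an information criterion, and the first k-class versus (k-1)-class LRT with a p-value above .05, which points to k-1 classes. The helper names are hypothetical.

```python
# Fit statistics as posted above for 2 to 9 classes.
# LRT_p is the p-value of the k-class versus (k-1)-class test (not given for k = 2).
fit = {
    2: {"AIC": 4076, "BIC": 4239, "ABIC": 4078, "LRT_p": None},
    3: {"AIC": 3763, "BIC": 4010, "ABIC": 3766, "LRT_p": 0.0002},
    4: {"AIC": 3683, "BIC": 4013, "ABIC": 3687, "LRT_p": 0.1640},
    5: {"AIC": 3601, "BIC": 4014, "ABIC": 3606, "LRT_p": 0.0648},
    6: {"AIC": 3572, "BIC": 4069, "ABIC": 3578, "LRT_p": 0.2585},
    7: {"AIC": 3549, "BIC": 4129, "ABIC": 3556, "LRT_p": 0.30},
    8: {"AIC": 3545, "BIC": 4209, "ABIC": 3553, "LRT_p": 0.83},
    9: {"AIC": 3589, "BIC": 4335, "ABIC": 3597, "LRT_p": 0.39},
}

def best_by(criterion):
    """Number of classes with the lowest value of the given criterion."""
    return min(fit, key=lambda k: fit[k][criterion])

def classes_supported_by_lrt(alpha=0.05):
    """First k whose LRT p-value exceeds alpha points to k - 1 classes."""
    for k in sorted(fit):
        p = fit[k]["LRT_p"]
        if p is not None and p > alpha:
            return k - 1
    return None

print("BIC:", best_by("BIC"), "AIC:", best_by("AIC"), "ABIC:", best_by("ABIC"))
print("LRT points to:", classes_supported_by_lrt())
```

Note that the BIC values for 3, 4 and 5 classes (4010, 4013, 4014) are nearly tied, which is one reason substantive interpretability still matters alongside these rules.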
 Irene Biza posted on Friday, June 09, 2006 - 4:31 am
Thank you, I think that I have started to understand.
And one more question concerning Confirmatory Latent Class Analysis. I have decided on the number of classes for each of my factors and I want to improve my model while keeping the number of classes constant. What should I look at to judge whether my model is getting better (apart, of course, from the fact that the classes should make sense)? Is a larger loglikelihood an adequate criterion, or is there something more?
 Linda K. Muthen posted on Friday, June 09, 2006 - 8:31 am
If you are deciding on the number of classes, the same criteria are used: high loglikelihood, low BIC, TECH11, TECH14, etc. If you have settled on the number of classes and are testing nested models, you can do difference testing using the loglikelihood. See the section in Chapter 13 on testing measurement invariance. It also describes using the loglikelihood to test nested models.
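For nested models with the same number of classes, the difference test amounts to twice the loglikelihood difference, referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters. The sketch below assumes plain maximum-likelihood loglikelihoods; with the MLR estimator a scaling correction would be needed, which is not shown. The function names and the example values are hypothetical, and the chi-square tail probability is computed with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from a closed form of the regularized upper incomplete gamma Q(a, h):
    # Q(1, h) = exp(-h) for even df, Q(1/2, h) = erfc(sqrt(h)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def loglik_difference_test(ll_restricted, ll_general, n_par_restricted, n_par_general):
    """Likelihood-ratio difference test for two nested models."""
    stat = 2.0 * (ll_general - ll_restricted)
    df = n_par_general - n_par_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical loglikelihoods and parameter counts for two nested models.
stat, df, p = loglik_difference_test(-2010.4, -2005.1, 18, 21)
print(round(stat, 2), df, round(p, 4))
```

Both runs should be checked for having replicated their best loglikelihood from several starts before the difference is interpreted, since a local maximum in either model distorts the test.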
 Xiaojing Yan posted on Tuesday, August 22, 2006 - 7:42 am
I am now doing multigroup analysis using mixture modelling. Enjoyed the workshop in the Netherlands this June very much!

May I have the reference for this please? Many thanks.

"Bengt O. Muthen posted on Monday, January 24, 2000 - 10:29 am

In mixture models, a chi-square test is not provided and therefore the chi-square difference test is computed as 2*d where d is the difference between the loglikelihood values from the two models being compared. The degrees of freedom difference is computed as the difference in the number of parameters."
 Bengt O. Muthen posted on Tuesday, August 22, 2006 - 7:58 am
You can use the book by McLachlan & Peel, listed in the reference section of your June Handout.
 Xiaojing Yan posted on Tuesday, August 22, 2006 - 8:25 am
Fantastic! Many thanks!
 George Deitz posted on Monday, May 07, 2007 - 12:58 pm
I am conducting a CFA mixture model using customer data from two hotel chains. In hotel A, allowing means and variances to differ across classes, a four class solution emerged as optimal, with excellent interpretability.

In trying to run the same model for the hotel B sample, I am having problems getting a 5-class model to converge, even after increasing the number of starts (1000+) and stiterations (20). I've also tried specifying starting values with little success (although I admit I'm not 100% confident I am doing this correctly).

Given this lack of convergence for the 5-class model in the hotel B sample, can I safely draw any inferences regarding the superiority of the four-class solution? If so, is there a published citation for this?

Best Regards,

George Deitz
 Linda K. Muthen posted on Monday, May 07, 2007 - 2:44 pm
It sounds like a five-class solution with class-varying means and variances might be too much for the hotel B sample. I would try holding the variances equal as a first step. If you are unsure about starting values, I would get rid of them as bad starting values can contribute to convergence problems.
 George Deitz posted on Monday, May 07, 2007 - 3:16 pm
The hotel B model does run for 5 classes with only the means freed to vary, if I use a large number of starts.

However, that begs the question what I should do with the Hotel A sample. Is a model with means and variances freed to vary across classes "better" than one in which only means are allowed to vary? The structure of the classes and the # of members in each class changes substantially, however the four-class solution is optimal in either case.
 Linda K. Muthen posted on Monday, May 07, 2007 - 3:25 pm
Why do you think that the same model should fit the data from the two different hotels?
 Matthew Cole posted on Sunday, December 30, 2007 - 5:46 am
Hi Linda,


If the bootstrapped LRT is < .05 but the VLMR-LRT is > .05, which one should we consider for the GFI?

I would assume that we need to consider this discrepancy in the context of the other GFIs (i.e., AIC, BIC, entropy, etc).

Maybe I'm not running the LRTBOOTSTRAP correctly because I always end up with p values that are < .05 for the parametric bootstrapped LRT even when the VLMR-LRT is as high as .85.
 Linda K. Muthen posted on Sunday, December 30, 2007 - 10:37 am
If you are not using Version 5, please download it from Support - Mplus Updates. There have been some changes. I would consider all fit information. TECH11 and TECH14 should not be that different. See the Nylund dissertation on the website. She found TECH14 performed better than TECH11 in some cases.
 Jennifer Wareham posted on Thursday, February 21, 2008 - 8:22 am
I have conducted a CFA mixture model with 4 latent classes. I used the AUXILIARY command to test mean differences across the classes for covariates not included in the analysis. My understanding is that this command allows for equality tests using the posterior probability imputations. Is there a way to test the mean differences for these variables between pairs of the 4 classes? I want to know which classes are significantly different from one another for each variable (e.g., age, substance use, offending).
 Linda K. Muthen posted on Thursday, February 21, 2008 - 9:14 am
You can't currently do this in Mplus but it will be added in the next update.
 Jennifer Wareham posted on Thursday, February 21, 2008 - 10:15 am
I don't mean to be a pest, but do you have an estimated date when this update will be available?
 Linda K. Muthen posted on Thursday, February 21, 2008 - 10:18 am
Sometime in April barring any unforeseen events.
 Richard Dembo posted on Thursday, February 21, 2008 - 11:08 am
Thank you, Linda, for your promising words regarding the update you referenced in your 2/21/08, 10:18AM reply to Jennifer Wareham. We plan to hold up completion of our paper until early April-barring any unforeseen events affecting this Mplus update.

As ever, best wishes to you and Bengt!!

Richard Dembo
 Julie Mount posted on Thursday, April 17, 2008 - 3:19 am
I have a 4 class LCA model and I'm interested in exploring the association between membership of each of the 4 classes and a large number of covariates. I'd like to use a stepwise method but don't think this is currently available for multinomial logistic regression within Mplus. I'd thought of using the probabilities of class membership for regression modelling outside of the LCA model but I'm not sure that this is valid (as would assume the same latent class structure across all levels of all covariates?).
If it is valid to do this, I'm not keen to assign cases to the class for which they have highest probability for stepwise logistic regression analyses as would lose the uncertainty around class membership. Would it be reasonable to perform stepwise multiple linear regressions with probability of membership in class c as an outcome? Perhaps after transformation of the probabilities to adjust for non-normal distribution? My sense is no but not entirely sure why. . .
 Bengt O. Muthen posted on Thursday, April 17, 2008 - 9:55 am
No, stepwise regression is not currently included in Mplus.

You can use the fractional class membership in a regression analysis with a nominal observed dependent variable using a setup like

names are u1-u10 x cprob1 cprob2 d;
usevar = x cprob1 cprob2;
classes = c(2);
training = cprob1 cprob2 (probabilities);
Analysis: type = mixture;
c on x;

With high entropy, this wouldn't be too bad in terms of point estimates for c on x. But the SEs will be a bit off (too low) given that c is treated as an observed rather than inferred variable.

Version 5.1 will have a way to investigate potential covariates using c on x regression via "pseudo-class draws".
 Julie Mount posted on Friday, May 09, 2008 - 1:42 pm
Thank you for the helpful response. I'm planning to use the extension for covariate analysis in version 5.1 but had a question about this. In version 5, when I examine the model structure by looking at the estimated probabilities of endorsing items, the model structure seems to change when I include certain covariates. This could be because I drop cases with missing data but I've been advised that it could also be a more fundamental problem related to marginal homogeneity and the assumption that the measurement structure is similar across all levels of a covariate. Will the new method of investigating covariates be robust to these types of issues in that the covariate is not actually playing a part in the model-fitting process? Thanks for any advice you can give!
 Linda K. Muthen posted on Friday, May 09, 2008 - 3:50 pm
If things change when you add covariates, it probably indicates the need for direct effects from the covariates to the latent class indicators which is related to measurement invariance. You won't see these issues when using the new feature to decide on a set of covariates but once you include them in the model you will. The new feature is meant to help determine which covariates to include.
 Harald Gerber posted on Thursday, May 15, 2008 - 6:16 am
I have several questions regarding covariates in a mixture model (LCGA).

First, does the auxiliary r option available in mplus 5.1 produce the same outcome as using the save=cprobabilities option and doing a multinomial logistic regression with my covariates and these saved class probabilites outside mplus (e. g. SPSS)?

Second, in my model the structure looks pretty much the same with covariates, but nevertheless the classes differ with regard to the number of people belonging to them. I had a hard time solving this problem. Starting values from the unconditional model didn't help. Then I fixed the means derived from the unconditional model in my conditional model and class sizes looked nearly the same compared to the unconditional model. But I've also heard that you don't recommend such a procedure, though I can't find it here in the forum. Could you please state again why such an approach is not recommended?
 Harald Gerber posted on Thursday, May 15, 2008 - 7:07 am
In addition: the only solution I see at the moment is to build up the best fitting class model with the covariates where I have no missing values (treatment and gender) instead of using an unconditional model, then to save class probabilities and compute multinomial logistic regressions in SPSS with imputed data, i.e. with the rest of my covariates, since Mplus isn't able to use auxiliary r in conjunction with imputation.
Would that be o.k. in your opinion?
 Linda K. Muthen posted on Thursday, May 15, 2008 - 5:59 pm
Auxiliary r is not the same as using most likely class membership.

The change in structure when covariates are added is most likely due to the need for direct effects from the covariates to the outcomes.

Or you could just include the covariates in the model which is what you will ultimately do and add direct effects as needed.
 Harald Gerber posted on Friday, May 16, 2008 - 3:16 am
Thanks. And you wouldn't recommend using most likely class membership in further logistic regressions?

There are so many possible direct effects in my model. How to decide on them? Is there any procedure?

Using starting values from the unconditional model (means and thresholds) also stabilizes class memberships in the conditional model. Would that be o.k.?
 Harald Gerber posted on Friday, May 16, 2008 - 3:29 am
Sorry, I forgot this: why should one not fix the thresholds and means of the classes in the conditional model at the values derived from the unconditional model? Class membership stabilizes even better than when using only starting values from the unconditional model.
A short answer would be great!
 Linda K. Muthen posted on Friday, May 16, 2008 - 7:49 am
I would not use most likely class membership.

I would look at modification indices (MODINDICES option of the OUTPUT command). To obtain the correct modification indices if you have four outcomes, y1-y4, and three covariates, x1-x3, add the following to the MODEL command:

y1-y4 ON x1-x3@0;

I do not think starting values will alleviate the problem.

Fixing parameters to specified values does not allow the proper solution to be found.
 Harald Gerber posted on Friday, May 16, 2008 - 8:53 am
I also thought of doing it that way, but since I'm using numerical integration and imputation, modification indices are not available.

What would you do under those conditions?
 Linda K. Muthen posted on Friday, May 16, 2008 - 9:30 am
Then do one dependent variable at a time, for example, y ON x1-x5;
 Scott posted on Wednesday, May 21, 2008 - 5:11 am
Within growth mixture modeling with a binary mixture indicator (e.g., sex-male vs. female), how does one interpret the "LATENT CLASS ODDS RATIO RESULTS" in the output? For example, the following:

Latent Class 1 Compared to Latent Class 2
Category > 1 22.450 16.058 1.398

Does one interpret this as the odds of males (assuming females='1' & males='2') being in LC 1 is 22 times more likely than males being in LC2? If so, how does one acquire the odds ratio for females?
 Bengt O. Muthen posted on Wednesday, May 21, 2008 - 8:52 pm
The odds ratio

22 = [ P(male)/P(female)|c1 ] / [P(male)/P(female)|c2 ]

so the male to female odds is 22 times higher in c1 than in c2.

If you want the odds ratio for females, you can either rescore gender as males=1, females=2, or you can use your output and compute by the formulas:

P(u=1|c) = P(male|c) = 1/(1+exp(threshold))

and get P(female|c) = 1-P(male|c).

Then insert into the odds ratio formula above where male and female have been switched.
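
As a worked sketch of the formulas above (this is not Mplus output): the two thresholds below are made-up values, chosen only so that the male-to-female odds ratio comes out near 22, and the function names are hypothetical.

```python
import math

def prob_u1(threshold):
    """P(u=1 | c) for a binary indicator with a logit threshold."""
    return 1.0 / (1.0 + math.exp(threshold))

def odds_ratio(threshold_c1, threshold_c2):
    """Odds of u=1 versus u=0 in class 1, divided by the same odds in class 2."""
    p1 = prob_u1(threshold_c1)
    p2 = prob_u1(threshold_c2)
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))

# Hypothetical thresholds of the gender indicator in classes 1 and 2.
t_c1, t_c2 = -2.0, 1.111

or_male = odds_ratio(t_c1, t_c2)   # male-to-female odds, c1 relative to c2
or_female = 1.0 / or_male          # female-to-male odds, c1 relative to c2

print(round(or_male, 2), round(or_female, 4))
```

Switching male and female in the odds ratio formula simply inverts it, so the female odds ratio is the reciprocal of the male one.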
 anonymous posted on Wednesday, July 02, 2008 - 4:36 pm
We’ve conducted LCA using survey data from high school and middle school students. In order to get meaningful classes, we stratified by sex and school level. We have different numbers of classes in each stratum, but there are several classes that are parallel across the strata. We have been looking at emotional outcomes within strata and now we would like to compare the parallel classes between the different strata (e.g. high school boys vs middle school boys, high school girls vs high school boys) by constraining sex or school level. How can we do this when we have 4 classes in one stratum and 6 classes in another?

 Bengt O. Muthen posted on Thursday, July 03, 2008 - 4:41 pm
If you have 2 groups with different number of latent classes but you want to check if some of the classes are the same, you can let group be represented by a new latent class variable using the Knownclass option. So then you have 2 latent class variables - a grouping latent class variable and your substantive latent class variable. Your model can then constrain the thresholds for the similar classes of the substantive latent class variable across the 2 categories of the grouping latent class variable. The UG has examples of 2 latent class variables.
 F Lamers posted on Wednesday, August 06, 2008 - 7:42 am
I have some questions regarding LCA. I have a three-class model with 6 binary and 4 nominal indicators. I am interested in the discriminative power of each indicator. Is it possible to get the explained variance or something similar for each indicator? Or should I just use the odds ratios?
My second question is regarding the odds ratios. In the output, the odds ratios are followed by standard errors. I assume that these are not the logit SE’s for the log odds? Is there a way to get the logit SE for the log odds in the output?
 Linda K. Muthen posted on Wednesday, August 06, 2008 - 10:19 am
I would look at odds ratios.

The standard errors following the odds ratios are the standard errors of the odds ratios. The standard errors of the log odds are found in the regular results where the log odds are found.
 linda beck posted on Thursday, August 07, 2008 - 4:42 am
Finally I've found two classes (covariates included) in two-part mixture growth modeling.
Unfortunately the entropy (in the output) does not look so good: .68, but the classification matrix seems o.k.:

.88 .12
.09 .91

Would this be a sufficient classification quality from your point of view? Since entropy is often not so high, when having few groups.
In other variations of the model with two groups I found a similar classification matrix (slightly better values) but a much higher entropy at .80. What could be a reason?

regards linda
 F Lamers posted on Thursday, August 07, 2008 - 7:59 am
Linda, thanks for your quick response. I will use the odds ratios then.
I have one more question. For the nominal indicators no odds ratios are given. I have calculated probabilities from the logits for the nominal variables. I noticed that the probabilities I calculated are exactly the same as in an LCA in which I insert the nominal indicators as categorical. Also, classification of cases, BIC values, etc. are exactly the same. Can I assume that the nominal indicators can be considered to be ordinal variables? And can I use the odds ratios and 95% confidence intervals from the LCA with the indicators as categorical? It would save me a lot of time in calculating odds ratios...
 Linda K. Muthen posted on Thursday, August 07, 2008 - 9:47 am
For Linda:

The entropy is not that important if you are not planning to classify people using their most likely class membership. If you are working with the model and regressing c on x for example, it should not be a problem.

Less well-fitting models can have better entropy because they may allow no within class variance.
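
To make the distinction concrete, here is a minimal sketch of how relative entropy can be computed from individual posterior class probabilities. It assumes the usual formula E = 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K); the posterior values are invented and the function name is hypothetical. Entropy reflects how sharp each person's posterior probabilities are, which an averaged classification table only partly shows.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means every case is classified with certainty.

    posteriors: list of rows, one per case, each a list of class probabilities.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:            # treat 0 * ln(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Invented posterior probabilities for two different 2-class solutions.
sharp = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.80, 0.20], [0.25, 0.75], [0.70, 0.30], [0.15, 0.85]]

print(relative_entropy(sharp), relative_entropy(fuzzy))
```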
 Bengt O. Muthen posted on Friday, August 08, 2008 - 8:41 am
For Lamers:

Please send your two outputs specifying the indicators as (1) nominal, and (2) categorical to, as well as your license number.
 Seana Golder posted on Tuesday, December 16, 2008 - 7:39 am
I am running a LCA and would like to create an SPSS file that contains the class membership and case id number for each case in my file. I have two questions related to this:

1) I have used the following command : Savedata:

However, when I open the file in SPSS two of my cases are unassigned to a class (which doesn't appear to be the case in MPLUS). What is going on?

2) I would also like to save the id number for each of the cases so that I can match files in SPSS. Is there some way to specify that I also want to save the case ID number in MPLUS?

Thank you in advance for the assistance.
 Linda K. Muthen posted on Tuesday, December 16, 2008 - 7:48 am
To save the id number for each case, use the IDVARIABLE option of the VARIABLE command.

You would need to contact SPSS to see why the data are not being read correctly in that program. The file that is saved is not an SPSS save file. It is an ASCII file.
 Allison Tracy posted on Tuesday, January 20, 2009 - 7:45 am
Happy Inauguration Day!

I am trying to do a split sample cross-validation of a mixture model solution. How do I determine if the model replicated successfully in the validation sample (other than an eyeball test of the closeness of the point estimates)? I considered a multiple group model in which the split halves are the groups, holding the parameters equal across groups in one run and freely estimating them in the second run then calculating the 2*d test of significance. Is there another way? The original run had a *very* high estimation load due to covariates and Poisson and categorical distributions in the latent class indicator variables. I am afraid that adding the complexity of multiple groups on top of that will be too much.

Thanks muchly!
 Linda K. Muthen posted on Wednesday, January 21, 2009 - 9:20 am
I would not do the multiple group comparison. I would see if the results look the same as far as class profiles, relationships to covariates, and interpretation.
 devin terhune posted on Wednesday, February 04, 2009 - 9:07 am
Hi. I am running an LPA. With some of my models, the largest class ends up being first and I know this is a problem because that is the one that is excluded for k vs. k-1 class solutions.

What function can I include in my model to ensure that my largest class is not the first class?
 Linda K. Muthen posted on Wednesday, February 04, 2009 - 9:32 am
You can use the ending values of the largest class from the analysis as starting values for the last class in a subsequent analysis.
 devin terhune posted on Tuesday, February 10, 2009 - 1:52 am
Greetings. Thanks for your response. Is it necessary to include the K-1STARTS function in my script in order to do this?

Also, forgive my ignorance, but I don't think the ending values are provided in my output. Do I have to include a particular TECH to get them?

Finally, I am using LRTSTARTS = 10 5 500 200; where do I specify the ending values?

Many thanks for your help!
 Linda K. Muthen posted on Tuesday, February 10, 2009 - 5:51 am
No, K-1STARTS is not required.

The ending values are under Model Results. You specify the ending values in the MODEL command. If you have further questions on this topic, please send them along with your license number to
 lisa ibanez posted on Friday, February 27, 2009 - 11:32 am

I am currently running a CFA and latent path analysis mixture model with two classes. I am confused by the fact that in the mixture output both classes have all the same loading values. I have pasted my syntax below. What should I do to figure out what the unique values of each class are?
Missing are all(999);


Output: TECH8 TECH11

Missing are all(999);
Cpath ON Ppath;
[Cpath*1 Ppath];
Cpath ON Ppath;
Output: TECH8 TECH11
 Linda K. Muthen posted on Friday, February 27, 2009 - 2:55 pm
The default is to hold the loadings equal across classes. Mention them in the class-specific part of the MODEL command to relax this constraint.
 devin terhune posted on Monday, March 16, 2009 - 1:14 pm
The Mplus output does not provide G2 (likelihood ratio chi-square). What option can be used to request it? Thanks in advance!
 Bengt O. Muthen posted on Friday, March 20, 2009 - 1:15 pm
Mplus does give this chi-square where it is possible, namely for models where the means, variances and covariances are the sufficient statistics. So not for mixtures, categorical outcomes with ML, or multilevel random slopes models.
 Allison Tracy posted on Thursday, April 16, 2009 - 11:40 am
I would like to compare the estimated conditional probability values for a given response on a multinomial categorical variable across latent classes. I tried doing this by comparing the confidence intervals for the threshold values across classes and found that the thresholds are significantly different from each other. However, when I convert the thresholds to probability scale, I get the same probability across classes. How do I calculate correct standard errors for the estimates in probability scale?

This problem has the added wrinkle of having a covariate effect on the categorical class indicator. I intend to select meaningful values of the covariate in the calculation of the probability of response.
 Linda K. Muthen posted on Friday, April 17, 2009 - 10:41 am
You can use MODEL CONSTRAINT to specify new parameters that represent the probabilities and use MODEL TEST to test if they are the same.
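
For intuition about what such a new parameter involves, here is a sketch for the simplest case: a binary indicator with a single threshold and no covariate. The standard error of the probability is obtained by the delta method. The estimates below are hypothetical, the function name is made up, and the assumption that the two class-specific estimates are uncorrelated is for illustration only; with a covariate effect the covariance between the threshold and the slope would also be needed.

```python
import math

def prob_and_se(threshold, se_threshold):
    """Probability of the higher category and its delta-method standard error."""
    p = 1.0 / (1.0 + math.exp(threshold))
    # dp/dthreshold = -p * (1 - p); the sign drops out of the standard error.
    se_p = p * (1.0 - p) * se_threshold
    return p, se_p

# Hypothetical threshold estimates and standard errors in two classes.
p1, se1 = prob_and_se(-0.40, 0.15)
p2, se2 = prob_and_se(0.85, 0.20)

# Wald z for the difference in probabilities, assuming (for illustration
# only) that the two estimates are uncorrelated.
z = (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)
print(round(p1, 3), round(se1, 3), round(p2, 3), round(se2, 3), round(z, 2))
```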
 Anjali Gupta posted on Monday, October 26, 2009 - 2:46 pm

Concerning the auxiliary (r) option - would you suggest using it for a wide array of covariates, then including, in a LCA, only those that meet a certain stat. significance?

My current large set of covariates is making it hard to obtain distinct classes with replicated log liks.

Thank you
 Bengt O. Muthen posted on Monday, October 26, 2009 - 6:35 pm
I would take 3 steps. Using a large set of covariates based on substantive theory, I would in step 1 start with aux(e) to check that a candidate covariate has interesting mean differences across classes. In step 2 I would use the set of covariates found of interest in step 1 and use aux(r) to reduce that set. In step 3 I would include the further reduced set of covariates in the model.
 Sean F posted on Friday, November 06, 2009 - 10:36 am
I apologize in advance if this question has already been answered and I missed it. I have conducted an LCGA that has yielded a 5-class solution. I have also included a grouping variable predicting class. In addition to the multinomial logistic regression output that Mplus provides (e.g., group predicting membership in Class 1 versus Class 2), is it possible to compare the overall effect of group assignment on a particular class versus all other classes combined (e.g., group predicting membership in Class 1 versus membership in any other class)?

Thank you in advance for your help.
 Bengt O. Muthen posted on Friday, November 06, 2009 - 6:26 pm
You can use Model Constraint to express the probability of any of the latent class categories, which means that you can look at the estimated odds of being in one class relative to being in any of the other classes.
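
The arithmetic behind this can be sketched as follows, assuming the usual multinomial logit parameterization with the last class as the reference class. All estimates are hypothetical and the function names are made up for illustration.

```python
import math

def class_probabilities(intercepts, slopes, x):
    """Class probabilities from a multinomial logit with the last class as reference.

    intercepts, slopes: one value per non-reference class (c#1 ... c#K-1).
    x: value of the single covariate (here a 0/1 group variable).
    """
    logits = [a + b * x for a, b in zip(intercepts, slopes)] + [0.0]
    denom = sum(math.exp(v) for v in logits)
    return [math.exp(v) / denom for v in logits]

def odds_one_vs_rest(probs, k=0):
    """Odds of being in class k versus being in any other class."""
    return probs[k] / (1.0 - probs[k])

# Hypothetical estimates for a 5-class model: intercepts and slopes of c#1-c#4.
a = [0.2, -0.5, 0.1, -1.0]
b = [0.8, 0.3, -0.4, 0.0]

odds_g0 = odds_one_vs_rest(class_probabilities(a, b, 0))
odds_g1 = odds_one_vs_rest(class_probabilities(a, b, 1))
print(round(odds_g1 / odds_g0, 3))  # odds ratio: class 1 vs. all other classes
```

Unlike the class-versus-reference coefficients, this one-versus-rest odds ratio is a function of several estimates at once, which is why it has to be defined as a new parameter.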
 Thomas Olino posted on Monday, April 05, 2010 - 6:05 am
I had a similar inquiry, however, extended to a model with a knownclass and latent class variable. I wanted to examine the association between a predictor variable and the latent class variable for each level of the knownclass variable.

i s | y1@0 y2@1 y3@2 y4@3 y5@4 y6@5;
c ON KC;
c#1 c#2 c#3 ON x1-x4;
c#1 c#2 c#3 ON x1-x4;
c#1 c#2 c#3 ON x1-x4;

I thought this would produce some gender specific output, however, I didn't see it.

Any thoughts would be greatly appreciated!
 Linda K. Muthen posted on Monday, April 05, 2010 - 8:34 am
Please send the full output and your license number to
 Anonymous posted on Friday, April 23, 2010 - 10:50 am
I recently came across an article referring to latent growth mixture models using a zero-inflated Poisson. If you have count data with several zeros do you need to use ZIP? I was under the impression these individuals would be collapsed into 1 class so this would not be necessary, am I wrong?
 Linda K. Muthen posted on Saturday, April 24, 2010 - 11:56 am
You don't necessarily need inflation. You can run the model both ways and see if including inflation improves model fit by looking at BIC, for example.
 Tamara Arenovich posted on Monday, April 26, 2010 - 11:34 am
I am performing an LCA with 7 continuous indicators representing unique (but potentially correlated) personality domains. I also have a binary covariate (sample). The BIC suggests a 2-class solution is best. When examining this solution, I noticed a few strange things in the output that I cannot explain:

1. The output provides an odds ratio examining the effect of 'sample' on class membership. In my solution, class 1 consists of subjects from both samples, but class 2 contains subjects from sample 2 only. Why did it provide me with an OR instead of a warning/error message? What does it mean in this case?

2. I want to know whether the classes differ in their mean scores across the 7 indicators. I used the MODEL TEST command to fix the mean personality score of class 1 to the mean of class 2 for each of the 7 domains one at a time. This worked for 6 of them, but provided no results whatsoever for the 7th one. I'm not sure why. Of the 6 that did work, the results were highly significant (p<0.0001 in each case). However plots suggest that the groups are quite similar across 3 of the 7 dimensions. I then tried MODEL CONSTRAINT instead. I compared these results to the unrestricted model using LRTs and these findings made more sense. I am not sure why these two sets of findings differ so dramatically and I do not know which set of results I should trust. Any help you can provide is greatly appreciated!
 Linda K. Muthen posted on Monday, April 26, 2010 - 12:12 pm
Please send the outputs and your license number to It is not clear what you are seeing. Note that MODEL TEST is not used to constrain parameters. It computes a Wald test. MODEL CONSTRAINT is.
 Laurie Corna posted on Friday, April 30, 2010 - 8:06 am
I have recently read that an index of dissimilarity (D) provides an additional indicator of model fit for comparing various class solutions in LCA in addition to other model selection criteria. Is it possible to request this in Mplus? Or calculate it from the output?

Many thanks.
 Linda K. Muthen posted on Saturday, May 01, 2010 - 4:13 pm
I am not familiar with the index. It is not available in Mplus.
 Chester Kam posted on Thursday, July 08, 2010 - 2:53 pm
When I read the Mplus manual (p.654), it says that TECH14 (Bootstrapping) "compares the estimated model to a model with one less class than the estimated model... The model with one less class is obtained by deleting the FIRST(capitalization added) class in the estimated model. Because of this... it is recommended when using starting values that they be chosen so that the last class is the largest class."

Let's say I am comparing a 5-class solution with a 6-class solution. I wonder if I need to specify the first class as the new class found in the 6-class solution, so that TECH14 can compare whether the addition of that particular new class is necessary?

Thanks so much!

 Chester Kam posted on Thursday, July 08, 2010 - 3:02 pm
Just to supplement my previous post:

Do I need to do anything extra, besides putting "TECH14" in the output option, to ask Mplus to conduct bootstrapping model comparison?

Thanks again!

 Linda K. Muthen posted on Friday, July 09, 2010 - 10:06 am
No, just put TECH14 in the OUTPUT command.
 Miles G Taylor posted on Wednesday, December 15, 2010 - 11:47 am
Hi Linda and Bengt,
I'm modeling an LCA of continuous-ish health indicators and want to use cluster to predict a hazard of death. I'm also controlling on age: cluster membership and hazard. Age is assumed to be constant in its effects on death across classes. My code looks like this:

USEVARIABLES ARE disg82 disg84 disg89 disg94 disg99 mortw2 mortw3 mortw4 mortw5 age82;
CATEGORICAL ARE mortw2 mortw3 mortw4 mortw5;
CLASSES = c (6);

STARTS = 200 5;

f BY mortw2-mortw5@1;
f ON age82 (1);
c#1 c#2 c#3 c#4 c#5 ON age82;
f ON c#1 c#2 c#3 c#4 c#5;

My first question is (1) whether my code is correct, in particular regarding the clusters predicting the hazard. I do not get effects of f on c#1, etc. in the output but rather an intercept of f for each cluster except the referent (c#6). Do I treat these intercepts as effects of each c on f? When I exponentiate them they produce viable values for hazard odds ratios, but I want to make sure I haven't interpreted or coded them incorrectly.
 Miles G Taylor posted on Wednesday, December 15, 2010 - 11:59 am
My second question is clarification on how Mplus treats the clusters as outcomes. In my output I get effects listed as "c#1 ON age82", etc. Am I correct that this is a logistic regression coefficient that refers to age's effect on c#1, etc. with the last cluster (c#6) as the reference group? I'm used to working with another software and I want to make sure I'm correct.

 Linda K. Muthen posted on Wednesday, December 15, 2010 - 5:10 pm
1. You are correct.
2. You are correct.

Just remove

f ON c#1 c#2 c#3 c#4 c#5;

from the MODEL command.
 Miles G Taylor posted on Thursday, December 16, 2010 - 8:12 am
Thanks Linda!
 Lena Herich posted on Sunday, March 20, 2011 - 7:04 am

I am trying to estimate a latent class analysis with one continuous covariate. Now I want to investigate whether there is differential item functioning. Can I do this using Mplus? The model is:

Names=u1 u2 u3 u4 age;
Categorical=u1 u2 u3 u4;

Classes= c(2);

Analysis:Type=Mixture missing;
Starts = 50 10;
c#1 on age;
 Linda K. Muthen posted on Sunday, March 20, 2011 - 9:53 am
Yes, differential item functioning can be seen by regressing a latent class indicator on a covariate.
 Lena Herich posted on Monday, March 21, 2011 - 8:56 am
Thank you very much for your fast answer!

I have another question regarding the same model, now with 4 classes. I added another couple of covariates and I am investigating if they have an influence on the definition of the latent classes. The result is that some covariates are only significant predictors for one class, some for more than one and some for none.
How do I determine which covariates should stay in the model? Can I remove only those which do not have an influence in any class? Should I use backward selection?
 Bengt O. Muthen posted on Monday, March 21, 2011 - 5:06 pm
I would keep all of them - at least all that are significant for some class - and simply report the results, some significant and some not.
 J.D. Haltigan posted on Wednesday, April 06, 2011 - 10:22 pm

I am performing an LCA with 7 binary indicators. I also have a related continuous variable that I have the option of including as a latent class predictor. I realize this may be a very basic question but is there a way to determine whether the addition of the continuous indicator improves a given group solution over only the 7 binary indicator model? Would this simply involve a comparison of the BIC values for each model?

Thank you.
 Linda K. Muthen posted on Thursday, April 07, 2011 - 7:43 am
You cannot compare BIC when you have a different set of variables. You can see if classification quality improves by looking at the classification table and entropy.
 Renee McDonald posted on Monday, August 08, 2011 - 10:29 pm
I am doing a factor mixture analysis and have a question. If I simulate the data internally and specify 2 classes, for example, how does Mplus separate or differentiate those two classes? Is it by factor means? Thank you.
 Linda K. Muthen posted on Tuesday, August 09, 2011 - 9:27 am
Any parameter that differs between the classes will be involved. The major parameters are the means of the categorical latent variable which are logits related to class probabilities.
 Daniel Rodriguez posted on Thursday, December 15, 2011 - 12:57 pm
Good afternoon,
when doing a mixture model, like a growth mixture model, one can have many comparisons among the groups. Does Mplus automatically adjust the p-value for multiple comparisons to maintain some overall p-value like .05?
 Linda K. Muthen posted on Thursday, December 15, 2011 - 2:23 pm
No. You would need to be conservative in your interpretation of the results to take this into account.
 Daniel Rodriguez posted on Friday, December 16, 2011 - 6:47 am
Thank you
 Zachary Hamilton posted on Wednesday, February 15, 2012 - 4:19 pm
Sorry for such a simplistic question. During the short course videos and in several discussion posts it is mentioned that, when using the LMR or Bootstrap LMR, one should put the largest class last. Specifically it is mentioned that you can reorder classes by using the ending values as starting values. So if you want Class 2 to be Class 1, use the ending values of Class 2 as starting values for Class 1. Where do I find these values in the output and how would they be included in the input that would allow me to reorder the classes?
 Linda K. Muthen posted on Wednesday, February 15, 2012 - 4:38 pm
They are in the results after the heading MODEL ESTIMATION TERMINATED NORMALLY. An easy way to do this is to use the SVALUES option of the OUTPUT command. This provides input statements with starting values. You can just change the class labels and use this as input.
 EFried posted on Sunday, February 26, 2012 - 8:39 am
Adding covariates into my GMM, the Intercept and Slope Growth Factor means aren't displayed in the model results where they are usually displayed.

So ... where are they? The model is y0@0 y1@1 y2@2 etc, so the intercept growth factor mean should be the starting point on the y-axis of the plot at x=0, right?

I find values in the model result under "Intercepts", but the values there have no connection to the observed or estimated intercepts of the classes that the plots show.

Thank you!
 Linda K. Muthen posted on Sunday, February 26, 2012 - 8:51 am
When you regress the intercept and slope growth factors on a set of covariates, intercepts, not means, are estimated. If you want to see what the means are, ask for TECH4 in the OUTPUT command.
 lisa jobe-shields posted on Sunday, April 15, 2012 - 6:50 am
I am re-running my solutions (1-8 class solutions, LVMM with categorical and continuous indicators) to ensure that my LMR-LRT and BLRT are comparing the K to K-1 solutions that are correct. A few times now, it has been apparent that class 1 is not the new class (i.e., I would need to go in and use start values to change the order)- but, the H0 loglikelihood in the tech output is the same value as from the correct k-1 model. Is there any reason to expect this? I was under the impression that I could use this as a check regarding the correct K-1 class model. Could it be indicative of a different problem? The models appear stable and are warning-free. Thank you so much.
 Linda K. Muthen posted on Sunday, April 15, 2012 - 5:04 pm
It's enough that the loglikelihoods agree. The classes do not need to be in the same order.
 Thomas Olino posted on Wednesday, May 09, 2012 - 9:20 am
Is it possible to specify that auxiliary variables are categorical in the context of mixture models? Or, are all comparisons made on auxiliary variables based on assuming that they are continuous?

 Linda K. Muthen posted on Wednesday, May 09, 2012 - 10:05 am
They are treated as continuous. Usually these variables are covariates which are treated as continuous in regression.
 Jerry Cochran posted on Friday, June 22, 2012 - 12:29 pm
Hi there,

In LCA, is there a difference for the conditional item probabilities estimates between classes based on posterior probabilities compared to classes based on most likely class membership? Or, are the probabilities the same regardless?

Thank you.
 Bengt O. Muthen posted on Friday, June 22, 2012 - 8:31 pm
The conditional item probabilities are based on the model estimates (the logit estimates), not on posterior probs and most likely class.
 sojung park  posted on Sunday, July 15, 2012 - 8:01 pm
Could I ask you a seemingly basic question?

I am reading a tutorial material on LCA with Mplus, and at the end it says

"The file saved with Savedata command contains the score of each observation on each indicator, the identification number, --I opened this file in a text editor and dropped all the variables except the identification number and the class, C. I repeated this analysis for wave 4 and did the same. I then merged these two datasets with the data for the growth curve analysis. "

The "growth curve analysis" s/he refers to here is not LCGM/GMM, right? I do not think it is the same approach, but then I wonder how exactly it would be different.

thank you SO much!
 Linda K. Muthen posted on Monday, July 16, 2012 - 8:12 am
It is not clear what the author means by that paragraph. You should ask the author.
 Justin D. Smith, Ph.D. posted on Saturday, August 11, 2012 - 3:55 pm
This is probably going to seem like a very basic question but here goes:

I am trying to run a 3 group moderation analysis using the KNOWNCLASS option due to a censored dependent variable. Using the %Overall% command I seem to get the constrained model. I need to free each group so I can compute a likelihood-ratio chi-square difference test to see if there are group differences. What additional codes do I need to do this and am I even doing this correctly?

Thanks in advance!

Variable: Names are id FB FB2 Ethn EthGrp Tx Gen Study FC1 PFR1 AB1 SU1 Trisk1
FC2 PFR2 AB2 SU2 Trisk2 FC3 PFR3 AB3 SU3 Trisk3 FC4 PFR4 AB4 SU4 Trisk4;

Usevariables are Tx Gen Study SU1 SU4;

Missing are all (-999.00);
KNOWNCLASS = EthGrp (EthGrp = 1 EthGrp = 2 EthGrp = 3);
CLASSES = EthGrp (3);

Estimator = ml;

SU4 on SU1;!(a);
SU4 on Tx;!(b);
SU1 with Tx;!(c);
SU4 on Gen;!(d);
SU1 with Gen;!(e);
SU1 with Study;!(f);
SU4 on Study;!(g);

sampstat standardized;
 Linda K. Muthen posted on Sunday, August 12, 2012 - 9:20 am
You should not label the ON statements in the overall part of the MODEL command. This holds them equal across classes which is the default. You want to mention the ON statements in the class-specific parts of the MODEL command, for example,


SU4 on SU1 (p1);

Give different labels in each class and use MODEL TEST to do a Wald test. In mixture modeling, you don't want to use equal versus unequal analyses to do difference testing. This can change class membership.
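
For a single pair of class-specific coefficients, the Wald test that MODEL TEST reports amounts to the following calculation. This is only a sketch: the estimates and standard errors are hypothetical, the function name is made up, and the covariance between the two estimates is set to zero here as an assumption (the actual test uses the estimated covariance matrix of the parameters).

```python
import math

def wald_equal(b1, se1, b2, se2, cov=0.0):
    """1-df Wald test of b1 = b2; returns (chi-square, p-value)."""
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov
    w = (b1 - b2) ** 2 / var_diff
    p = math.erfc(math.sqrt(w / 2.0))  # upper tail of chi-square with 1 df
    return w, p

# Hypothetical class-specific estimates of SU4 ON SU1 in two groups.
w, p = wald_equal(0.62, 0.08, 0.35, 0.10)
print(round(w, 2), round(p, 4))
```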
 J.D. Smith posted on Tuesday, August 14, 2012 - 8:00 am
Thank you Linda. A quick follow up question: The Wald test was found to be significant and now we need to determine the path(s) that differ between the classes. How do we free up individual parameters within each group?
 Linda K. Muthen posted on Tuesday, August 14, 2012 - 6:49 pm
They are free when you mention them in the class-specific parts of the MODEL command as shown above.
 Lucy posted on Tuesday, November 13, 2012 - 5:16 pm
I am running an LPA model on four items. I wish to compare a number of different k to determine what the optimal number of classes is. When I run the bootstrap LRT, it says that the results are not replicated and to increase the number of starts using LRTSTARTS. I have done this (progressively increasing the number of starts), and I still cannot get the bootstrap LRT to be replicated in all runs. Is it OK to just go by the LMR from TECH11 instead? Or does the failure to replicate the bootstrap LRT indicate a more serious problem with the model?
 Linda K. Muthen posted on Tuesday, November 13, 2012 - 5:32 pm
See Web Note 14 on the website for information on how to use TECH11 and TECH14. If this does not help, using TECH11 alone is fine. Having a problem with TECH14 does not indicate a serious problem for the model.
 Mary E. Mackesy-Amiti posted on Thursday, February 07, 2013 - 8:53 am
When using AUXILIARY with the (R) option in a LCA, are the covariates tested one at a time or simultaneously?
 Linda K. Muthen posted on Thursday, February 07, 2013 - 11:30 am
They are tested simultaneously.
 Linda K. Muthen posted on Thursday, February 07, 2013 - 12:41 pm
You might want to consider the new R3STEP option in Version 7. It supersedes the R option.
 Christian M. Connell posted on Friday, June 14, 2013 - 2:38 pm
I am running an LTA model with auxiliary variables included to assess mean differences across transitional classes. Previously, I was able to run this model and obtain output. However, in attempting to re-run these models I now get a warning indicating "Auxiliary variables with E, R, R3STEP, DU3STEP, or DE3STEP are not available with TYPE=MIXTURE with more than one categorical latent variable."

This occurs even when I try to re-run a copy of a previously working model.

Two questions:
1) has something been altered to prevent this type of model?
2) Is the alternative to set-up Wald tests?
 Linda K. Muthen posted on Sunday, June 16, 2013 - 4:00 pm
As far as we know, these AUXILIARY options have never been available for more than one categorical latent variable. Please send the output where you did this and your license number to
 Rebecca Madill posted on Thursday, June 27, 2013 - 9:54 am
I am comparing a 6-class model to a 7-class model using TECH 11 (N=2471). Following Webnote 14, I first ran each model without TECH 11 to ensure the best log likelihood is replicated. For the 6-class model, the best log likelihood was replicated with 'starts=500 100' and 'starts=1000 200.' For the 7-class model, the best log likelihood replicated with 'starts=20000 5000.'

When I ran the 7-class model with TECH 11 (using OPTSEED), the K-1 log likelihood (-39267.167) was much larger than for the 6-class model I had previously run (-39523.778).

I re-ran the 6-class model with starts=20000 5000 and got these log likelihoods and seeds:

-39283.697 463270 5675
-39523.778 831086 10102
(-39523.778 continues)

Then with starts=40000 10000:

-39241.994 549625 26815
-39283.697 463270 5675
-39523.778 804903 5305
(-39523.778 LL continues)

My questions are

1) Should I keep trying more starting values for the 6-class model to see if the log likelihood replicates -39267.167? Or should I assume that extracting 6 classes from the data is too much?

2) If the latter is true, can I conclude that the 5-class model is best? (Models 2-5 have a significant LMR and VLMR.)
 Bengt O. Muthen posted on Thursday, June 27, 2013 - 1:56 pm
You should first increase Starts in your 6-class model to see if you can replicate the best 6-class LL so far: -39241.994.
 Rebecca Madill posted on Saturday, June 29, 2013 - 9:29 pm
Thank you. I ran starts=200000 50000 and got this:
-39241.994 549625 26815
-39267.167 353583 114407
-39283.697 463270 5675
-39523.778 831086 10102
(-39523.778 continues)

This model took 15 hours to run. How do I know when to stop increasing the number of starts and accepting an answer? When I do stop, what is the correct way to compare other models to the 6-class model to determine best number of classes?
 Bengt O. Muthen posted on Monday, July 01, 2013 - 3:51 pm
The fact that you are having this much trouble replicating the best loglikelihood is in itself an indication that maybe you are trying to extract too many classes.

I tend to first look at BIC and see for how many classes it has a minimum, and only then look at Tech11 and Tech14 for number of classes around that number.
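
The BIC screening described above can be sketched in a few lines of Python. This is only an illustration: the loglikelihoods and parameter counts below are hypothetical placeholders (not values from this thread), and only the sample size N=2471 is taken from the question. BIC is computed as -2*logL + p*ln(n), and the class count with the smallest BIC is the one to examine further with Tech11 and Tech14.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + p*ln(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical (loglikelihood, number of free parameters) per class count.
# These numbers are made up for illustration only.
fits = {
    4: (-39950.0, 43),
    5: (-39700.0, 54),
    6: (-39660.0, 65),
    7: (-39620.0, 76),
}
n_obs = 2471  # sample size quoted in the question above

bics = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_k = min(bics, key=bics.get)

for k in sorted(bics):
    print(f"{k} classes: BIC = {bics[k]:.1f}")
print("BIC minimum at", best_k, "classes")
```

With these made-up inputs the loglikelihood keeps improving as classes are added, but the penalty term makes BIC turn upward after five classes.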
 LJ posted on Friday, August 16, 2013 - 8:41 pm
I am trying to examine how latent class membership could moderate people's evaluations of regime. I have four binary indicators for the latent membership and four binary indicators for regime evaluations (modeled as a latent factor). I have also three continuous covariates for the latent factor. Here is the syntax:

NAMES ARE uid wt sid psuid pt1-pt6 rt1-rt4 enow eret epro;
USEVAR = pt1-pt3 pt5 rt1-rt4 enow eret epro;
CLASSES = c(2);
MISSING = All (-999);
WEIGHT is wt;
CLUSTER is psuid;
CATEGORICAL ARE pt1-pt3 pt5 rt1-rt4;

TYPE=mixture complex;

f1 by rt1* rt2-rt4;
f1 on enow eret epro;

f1 on enow eret epro;

f1 on enow eret epro;

My questions are:
1) Is the model specification correct in testing different intercepts for "f1" between the two groups?
2) Is the model specification correct in testing the varying associations between "f1" and "enow eret epro"?
3) In addition to "p1-p3 p5", are other variables also involved in the classification of latent membership?
 Linda K. Muthen posted on Sunday, August 18, 2013 - 10:33 am
1. Yes except in class one change [f1]; to
[f1@0]; Factor means must be zero in one class for identification.

2. Yes.

3. Yes, all observed variables are involved.
 Adar Ben-Eliyahu posted on Wednesday, November 27, 2013 - 6:56 am
Dear Dr. Muthen,
I ran a latent class and found that 3 classes fit the data best. Now I'd like to compare the means of the 7 items used to run the latent classes. That is, a pairwise comparison between the means on item 1 across the 3 groups (class 1 vs. 2, 2 vs. 3, and 1 vs. 3). And the same for each of the 7 items.
Is there a way to do this in MPlus or should I read out the file to another software (like SPSS) and run a MANOVA ?
Thank you!
 Bengt O. Muthen posted on Wednesday, November 27, 2013 - 9:06 am
 Adar Ben-Eliyahu posted on Thursday, November 28, 2013 - 5:29 am
Thank you Dr. Muthen. However, using the MODEL TEST command seems to add the significances of the item value, but not in comparison across classes...or have I missed something?
Thank you.


Estimate S.E. Est./S.E. P-Value

Latent Class 1

ZQ911A1 -1.303 0.254 -5.122 0.000
ZQ911A3 -0.681 0.101 -6.731 0.000
ZQ911A4 -1.254 0.172 -7.306 0.000
ZQ946A1 -0.783 0.084 -9.305 0.000
 Bengt O. Muthen posted on Thursday, November 28, 2013 - 8:07 am
You are showing MODEL RESULTS. Instead, look up MODEL TEST in the UG Index.
 Adar Ben-Eliyahu posted on Thursday, November 28, 2013 - 5:29 pm
I'm sorry, I don't know what UG Index refers to... I couldn't find it in the manual. Thank you.
 Linda K. Muthen posted on Thursday, November 28, 2013 - 6:05 pm
The Index starts on page 835 of the Mplus User's Guide.
 Şeyma posted on Monday, July 07, 2014 - 2:49 am
Dear Muthen,

I ran a multilevel mixture item response analysis in Mplus. I estimated the item difficulties across levels and classes, but I want to detect DIF across school and class levels.
How can I use the parameters for this purpose?
The parameters are given for the 1-1, 1-2, 2-1 and 2-2 classes. To detect school DIF, I subtract the parameters of 1-1 and 1-2 to obtain the "school 1" parameters.
Is this right, or should I use another way to detect DIF?
I generated the data in another program and I know which items include DIF across classes.

My input file is as follows:

CLASSES = C(2) L(2);
CLUSTER = School;
DATA: FILE = 10.run1.koşul.dat;
f BY M1-M20* (1);
 Bengt O. Muthen posted on Monday, July 07, 2014 - 8:29 am
For each item you have 4 thresholds the way your model is specified. That is, you have main effects of C and of L as well as effects of their interactions. By labeling, any threshold difference that you are interested in can be expressed in Model Constraint so that you get the SE of this difference. If you don't want interaction effects you should use Model C: and Model L: to express the thresholds - or impose the appropriate equality restrictions in your "dot" setup.
 Andrea Norcini Pala posted on Wednesday, October 15, 2014 - 10:10 am
Dear Professors,
I am running a LPA with 8 dependent variables (continuous on a 10-point scale), and I have found 4 profiles. Now I am interested in comparing the 4 profiles on the 8 dependent variables, but I am not sure how I can run a formal test.
Thank you,
 Bengt O. Muthen posted on Wednesday, October 15, 2014 - 2:33 pm
You can use Model Test. For instance, you may want to test that the 8 means for class 1 are the same as the 8 means for class 2.
 xybi2006 posted on Tuesday, January 20, 2015 - 2:23 pm
Dear Dr. Muthen,
I ran a growth mixture model using 4-wave data and got 3 classes. Can I compare the mean for each class by each time point in MPLUS? If so, how? Or should I export the data and then do the ANOVA test to compare the mean of the 3 class for each time point? If so, how? I tried to use the SAVEDATA, but it did not give me the estimated values for individuals for all time points, only the estimated values for class intercepts and slopes.
Thank you,
 Bengt O. Muthen posted on Tuesday, January 20, 2015 - 5:00 pm
You can use Model Constraint. You use model parameter labels to express the outcome means at each time point and each class. Then you create differences between these means and this will give you a test of whether these differences are zero.
 xybi2006 posted on Tuesday, January 20, 2015 - 11:48 pm
Thanks much!
Not sure whether my code follows your instruction above. Here is my code for the constrained model. Does it look right for constraining "med" at T1 to be equal between class 1 and class 2? Also, compared with the freely estimated model, do the class sizes change in the constrained model?

I S | med1@0 med2@1 med3@2 med4@3;

[med1] (1);
[med1] (1);

One more question: Is the plot that Mplus creates for the growth mixture model based on the posterior probabilities or not?
 Bengt O. Muthen posted on Wednesday, January 21, 2015 - 9:20 am
It looks like you didn't understand what I suggested. First, you need to be able to express means of outcomes in growth models. For that, see Topic 3 of our course video and handout. Second, you need to be able to use Model Constraint. For that, see for instance UG ex 5.20.

The [med1] parameters in your input are intercepts of the outcomes and are fixed at zero in a growth model. They are not the means of the outcomes.

The growth mixture plot is based on the estimated model, not the posterior probabilities.
 xybi2006 posted on Thursday, January 22, 2015 - 2:33 pm
Dear Dr. Muthen,
Thanks much for your information. Now, I understand how to do that in MPLUS.
If I export the graph data of the growth mixture model (based on the estimated model) and then use ANOVA in another statistical package (e.g., SAS) to compare the mean differences among classes, is it still OK?
Thanks again,
 Bengt O. Muthen posted on Thursday, January 22, 2015 - 4:19 pm
No, that won't give you the correct tests because the standard error computations in that approach will be built on the wrong assumptions. So Model Constraint is the way to go - you'll find it useful to have mastered it.
 Shraddha Kashyap posted on Thursday, January 22, 2015 - 7:53 pm
Dear M&M,

After using an LCGA to examine the number of classes of patients who change in symptoms during treatment, followed by logistic regression in SPSS to predict a distal outcome (self-injury yes/no)--the difference between the classes with the highest and lowest number of self-injury incidents was statistically significant.

However, when using GMM including a covariate and the distal outcome (self-injury), while the differences in occurrence of self-injury between most classes were significant (based on ODDS RATIOS), the difference in self-injury rates between the classes with the highest and lowest number of self-injury incidents was not significant.

Any suggestions why this might be?


 Shraddha Kashyap posted on Thursday, January 22, 2015 - 8:07 pm
Just as a follow up and to check if my question makes sense, these are the stats that I am referring to, am I interpreting them correctly? (INC_YN = self-injury yes/no)

Latent Class 1 Compared to Latent Class 2
Category > 1 13.804 11.433 1.207 0.227

Latent Class 1 Compared to Latent Class 3
Category > 1 0.241 0.069 3.496 0.000

Latent Class 1 Compared to Latent Class 4
Category > 1 1.868 0.592 3.153 0.002


 xybi2006 posted on Friday, January 23, 2015 - 9:15 am
Dear Dr. Muthen,
Thanks for your help!
I used Model Constraint to test whether the means are different across latent classes. Below is my code. Am I on the right track? Should I constrain the parameters as equal beginning with 0= (for my example, 0 = meanc1t1 - meanc2t1)? If so, then should I do the deviance test to compare the freely estimated model with the constrained model?
Thanks much for your time!

[I] (Ic1);
[S] (Sc1);

[I] (Ic2);
[S] (Sc2);

model constraint:
new(meanc1t1 meanc1t2 meanc1t3 meanc1t4
meanc2t1 meanc2t2 meanc2t3 meanc2t4);

! class 1:
meanc1t1 = Ic1;
meanc1t2 = Ic1 + Sc1*1;
meanc1t3 = Ic1 + Sc1*2;
meanc1t4 = Ic1 + Sc1*3;

! class 2:
meanc2t1 = Ic2;
meanc2t2 = Ic2 + Sc2*1;
meanc2t3 = Ic2 + Sc2*2;
meanc2t4 = Ic2 + Sc2*3;

0 = meanc1t1 - meanc2t1;
!0 = meanc1t2 - meanc2t2;
!0 = meanc1t3 - meanc2t3;
!0 = meanc1t4 - meanc2t4;
 Bengt O. Muthen posted on Friday, January 23, 2015 - 11:04 am
Answer to Kashyap:

LCGA and GMM give quite different class formations. You should decide on which one you want based on BIC and substantive interpretations.

Also, for your LCGA you use a 3-step approach whereas for GMM you use a 1-step approach - plus you add a covariate.
 Bengt O. Muthen posted on Friday, January 23, 2015 - 12:19 pm
Answer to xybi2006:

This looks correct, except instead of saying

0 = meanc1t1 - meanc2t1;

you want to say

diff = meanc1t1 - meanc2t1;

where diff is listed within NEW. You don't need deviance testing in this case because diff will be given a z-test.
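
To make the arithmetic behind such a "diff" parameter concrete, here is a minimal Python sketch of a delta-method z-test for the difference in model-implied outcome means between two classes in a linear growth model with time scores 0, 1, 2, 3. All estimates and the parameter covariance matrix below are hypothetical, and the function is an illustration of the general idea rather than a reproduction of Mplus internals.

```python
import math

def mean_difference_test(theta, cov, t):
    """z-test for (Ic1 + Sc1*t) - (Ic2 + Sc2*t) via the delta method.

    theta: [Ic1, Sc1, Ic2, Sc2] growth factor means in classes 1 and 2
    cov:   4x4 covariance matrix of those estimates
    t:     time score of the occasion being compared
    """
    ic1, sc1, ic2, sc2 = theta
    diff = (ic1 + sc1 * t) - (ic2 + sc2 * t)
    grad = [1.0, t, -1.0, -t]  # derivatives of diff w.r.t. theta
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(4) for j in range(4))
    se = math.sqrt(var)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return diff, se, z, p

# Hypothetical estimates and (diagonal) covariance matrix.
theta = [2.0, 0.5, 1.0, 0.2]
cov = [
    [0.04, 0.0, 0.0, 0.0],
    [0.0, 0.01, 0.0, 0.0],
    [0.0, 0.0, 0.04, 0.0],
    [0.0, 0.0, 0.0, 0.01],
]

for t in (0, 1, 2, 3):
    diff, se, z, p = mean_difference_test(theta, cov, t)
    print(f"t={t}: diff={diff:.3f} SE={se:.3f} z={z:.3f} p={p:.4f}")
```

The standard error depends on the covariances among the intercept and slope estimates, which is why an ANOVA on exported values does not give the same test.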
  Chris Kenaszchuk posted on Monday, January 26, 2015 - 10:54 am
Hi. In the following input I would like the latent profiles to be based on the indicators u1-u7. Will it correctly obtain the mean of ywv2 for each %CW#n%? Or, does it cause ywv2 to be an indicator for each profile? (Which is not what I want.) Thanks.



ywv1 ywv2 u1 u2 u3 u4 u5 u6 u7
w1 w2 wgtwv2;

WEIGHT = wgtwv2;
CLUSTER = nbrhds;
CLASSES = CB(2) CW(4);
BETWEEN = CB w1 w2;
WITHIN = ywv1 ywv2
u1 u2 u3 u4 u5 u6 u7 ;

STARTS 20 10;

CW ON ywv1 ;

CW ON w1 w2;


[u1 u2 u3 u4 u5 u6 u7];
[ywv2] (TMS1);
[u1 u2 u3 u4 u5 u6 u7];
[ywv2] (TMS2);
[u1 u2 u3 u4 u5 u6 u7];
[ywv2] (TMS3);
[u1 u2 u3 u4 u5 u6 u7];
[ywv2] (TMS4);

onetwo onethree onefour
twothree twofour
threefour ;
onetwo = TMS1 - TMS2;
onethree = TMS1 - TMS3;

 Bengt O. Muthen posted on Monday, January 26, 2015 - 4:04 pm
The classes will be based on the ywv2 means as well unless you hold these equal across classes.

If you don't want them equal, you need 3-step technology, which hasn't been developed for twolevel.
 Shraddha Kashyap posted on Monday, January 26, 2015 - 7:20 pm
Ok, thanks!
 Katrin Mägi posted on Tuesday, March 03, 2015 - 4:13 am
Dear Dr Muthen,

In step 3 of mixture modelling I want to compare the intercepts of several continuous distal outcome variables (while adjusting for several predictor variables) across 5 latent profile groups. Should I use



MODEL TEST: 0=m1-m2;


NEW (diff);

I see that the results are essentially the same and Model test gives Wald test, but what is the difference between the two?

Thank you!
 Linda K. Muthen posted on Tuesday, March 03, 2015 - 8:22 am
The test of diff in MODEL CONSTRAINT is a z-test. The Wald test and z-test are asymptotically equivalent.
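
As a rough illustration of why the two results look essentially the same for a single contrast: if both tests are built from the same estimate and standard error (an assumption made here for the sketch), the one-degree-of-freedom Wald chi-square is the square of the z statistic and the two p-values coincide. The numbers below are hypothetical.

```python
import math

def z_test(est, se):
    """Two-sided z-test of est = 0."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def wald_test_1df(est, se):
    """Wald chi-square test of est = 0 with 1 degree of freedom."""
    w = (est / se) ** 2
    # Upper-tail probability of a chi-square(1) variable at w
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# Hypothetical difference between two class-specific intercepts and its SE.
diff, se = 0.42, 0.15
z, p_z = z_test(diff, se)
w, p_w = wald_test_1df(diff, se)
print(f"z = {z:.3f}, p = {p_z:.4f}")
print(f"Wald chi-square(1) = {w:.3f}, p = {p_w:.4f}")
```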
 Katrin Mägi posted on Friday, March 06, 2015 - 2:35 am
Thank you, Linda!
I have one more question regarding my group comparisons. I have 4 continuous and 1 categorical (gender; 0=boys, 1 = girls) predictor variables. I understand the intercept of the distal y variable is mean of y while all the x variables are 0. As 0 of my categorical variable represents boys, does it mean that the differences in y intercepts across the latent profile groups represent difference between boys mean scores on y while all the other x are 0 (which is at mean level as I've standardized the continuous x variables).

If this is the case, would it make sense to reverse my categorical variable then, to get differences also for girls?

Thank you!
 Bengt O. Muthen posted on Friday, March 06, 2015 - 8:38 am
I think your predictor variables influence only the latent class variable, right? If so, the y intercepts are in a different regression relationship (y on c, essentially, not c on x), so those y intercept differences across classes are for everyone, not just boys or girls. This is a reflection of the measurement invariance that you specify when you don't have direct effects from the predictors to the latent class indicators.
 Katrin Mägi posted on Friday, March 06, 2015 - 1:57 pm
That's correct that I don't have direct effects from predictors to the latent class indicators. However, I do have direct effects from predictors to distal outcomes. I'll try to be more specific about my goal. I have found 5 self-regulation profiles. Now I would like to investigate: 1) if academic pre-skills and gender predict class membership 2) if the five self-regulation profile groups differ in their later academic skill level while pre-skills are controlled for (and since two of the later academic skill variables show gender differences I've also included y on gender). But I'm unsure if the model I've specified makes sense and how should I interpret the intercept of y variables.
I have specified:

c on gender x1 x2 x3 x4;
y1 on gender x1 x2 x3 x4;
y2 on gender x1 x2 x3 x4;
y1 on gender x1 x2 x3 x4;
y2 on gender x1 x2 x3 x4;
y1 on gender x1 x2 x3 x4;
y2 on gender x1 x2 x3 x4;

Thank you again!
 Bengt O. Muthen posted on Saturday, March 07, 2015 - 12:48 pm
Ok, so your y's are later academic skill level variables and gender influences those.

So then your y intercepts are for boys and to get them for girls you simply use Model Constraint:

girlint11 = boyint11+b11;

where boyint11 is the label for [y1] in class1 and b11 is the slope for y1 on gender (girl) in class 1. Etc for the other classes and the other y's.

Note also that y on gender x1-x4 in the Overall is not identified along with the same regressions in each class. So you can say in the Overall

y on gender@0 x1-x4@0;
 Janice Kooken posted on Friday, April 03, 2015 - 9:40 am
I tried to run a two level growth mixture model with the auxiliary option, and I received the following error:
are not available with TYPE=MIXTURE TWOLEVEL.

Is there a workaround possible? I am seeking evidence for validity of classes by testing if class membership predicts a distal outcome.
Thank you.
 Tihomir Asparouhov posted on Friday, April 03, 2015 - 12:15 pm
The 3 step and BCH methods have not been developed and used yet for TYPE=TWOLEVEL MIXTURE.
 Janice Kooken posted on Friday, April 03, 2015 - 1:10 pm
Thank you for your quick reply, and I have a clarifying question. Does your reply mean that this has not been developed for Mplus or that the mathematics has not yet been developed to use latent class to predict distal outcomes in a multilevel GMM?
 Tihomir Asparouhov posted on Friday, April 03, 2015 - 3:05 pm
As far as I know the mathematics has not yet been developed for multilevel mixture models.
 Riley Steiner posted on Monday, April 27, 2015 - 8:23 am
I am conducting a multi-group LCA using complex survey data and trying to decide between a 5-class and 4-class model. Tech11 and Tech14 won't run. Aside from AIC/BIC and interpretability, how would you suggest I compare the models? Thank you!
 Bengt O. Muthen posted on Monday, April 27, 2015 - 11:13 am
If you have categorical outcomes you can also check TECH10 bivariate fit statistics.
 Janice Kooken posted on Wednesday, May 06, 2015 - 7:38 pm
Although Auxiliary=Variable(E) works with multilevel Growth Mixture Models, Webnote 21 states that it is no longer recommended except for methodological research. What other choices are available using Mplus to examine the strength of class as a predictor of a categorical distal outcome? I may use logistic regression with the output from savedata, but I have not found research that suggests whether it is best to use most likely class or class probability as the predictor. Could you suggest studies or information on Statmodel that would provide some insight here? Thank you.
 Bengt O. Muthen posted on Thursday, May 07, 2015 - 9:48 am
WN 21 Table 6 recommends DCAT.
 Janice Kooken posted on Thursday, May 14, 2015 - 1:27 pm
Thank you. Unfortunately, it does not appear that DCAT works with multilevel based upon the following error message.

*** ERROR in VARIABLE command
are not available with TYPE=MIXTURE TWOLEVEL.
Are there other options? Thank you.
 Bengt O. Muthen posted on Thursday, May 14, 2015 - 1:38 pm
Sorry; 3-step hasn't been developed for multilevel.
 Janice Kooken posted on Thursday, May 14, 2015 - 3:29 pm
Yes, thank you Dr. Muthen. Dr. Asparouhov advised me of this earlier. My question of May 6th concerns what other options I might have. I will likely use Auxiliary E and also logistic regression using the most likely class and compare results. My concern was that your WN21 stated that E should no longer be used except for methodological studies.
 Tihomir Asparouhov posted on Friday, May 15, 2015 - 9:03 am
Auxiliary E would be ok if the entropy is fairly large such as above 0.85. I would recommend using the 1-step approach, meaning include the binary variable in the model.

Auxiliary = x(e); is based on pseudo class draws.

Regarding the performance of the most likely class as the predictor take a look at
 andy supple posted on Wednesday, June 24, 2015 - 9:33 am
I wanted to compare continuous outcomes across classes and control for covariates (their association with the outcomes). The original model had entropy = .86 so I hard classified and saved out the file.

I then, rather than doing mixture modeling, just ran MG analyses by using grouping to treat the classes as 4 groups.

Then I have a model command for the covariates







Does this seem like an appropriate way to test mean/intercept differences across classes after partialling out covariates?
 Linda K. Muthen posted on Wednesday, June 24, 2015 - 11:32 am
You are comparing the intercepts across classes. This seems reasonable. You can do this using MODEL TEST or chi-square difference testing. See pages 486-487 and 697 of the user's guide.
 Steven Lancaster posted on Monday, March 14, 2016 - 7:34 am
Hello, I ran an LCA and am a bit confused by the "Latent Class Odds Ratio Results."

Here is a bit of the output:


Latent Class 1 Compared to Latent Class 2

Category > 1 47.619 16.001 2.976 0.003
Category > 1 2003.65 3586.601 0.559 0.5765

My basic question relates to whether the fourth column is a p value that can be used to determine the relative proportion of each variable between the classes. If so, the odds ratio for Y14 seems large for it to be ns. If not, how might I go about comparing proportions between the various classes in the model?
 Bengt O. Muthen posted on Monday, March 14, 2016 - 6:17 pm
The columns are as usual

Est SE Est/SE p-value

Because ORs have 1 as a neutral point you should compute

(Est - 1)/SE

and compare that approximate z-score to a z-table to get the p-value.

The Y14 OR is large but the SE is also large.
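
The (Est - 1)/SE calculation can be sketched in Python using the two rows of output quoted in the question. The helper name is made up for this illustration, and the p-value uses the standard normal approximation described above.

```python
import math

def odds_ratio_z_test(or_est, se, null_value=1.0):
    """Approximate z-test of an odds ratio against its neutral value of 1."""
    z = (or_est - null_value) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

# (estimate, standard error) pairs from the output quoted in the question
rows = [(47.619, 16.001), (2003.65, 3586.601)]
results = [odds_ratio_z_test(est, se) for est, se in rows]

for (est, se), (z, p) in zip(rows, results):
    print(f"OR = {est}, SE = {se}: z = {z:.3f}, p = {p:.4f}")
```

The second odds ratio is far from 1, but its standard error is even larger, so the z-score stays small.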
 Martin Lövdén posted on Monday, April 11, 2016 - 1:58 am

I am doing a latent profile analysis and want to test for equality of means across classes for a long list of auxiliary variables. Some of these variables are continuous and others are categorical. I understand that it is better to use (BCH) for my continuous and (DCAT) for my categorical variables, but Mplus does not allow me to run these two simultaneously. What is my way out here?

 Bengt O. Muthen posted on Monday, April 11, 2016 - 6:39 pm
They are done one at a time even when you have a list of them.

So you get a test of equality across classes for each of them. So not a multivariate test.
 'Alim Beveridge posted on Tuesday, June 21, 2016 - 7:17 am
Dear Bengt and Linda,

I have conducted LCA on a sample of 358. The data consist of 7 dichotomous variables and 11 continuous.

BIC indicates that 3-class solution is the best, so does entropy (0.769). AIC and aBIC are better for the 4-class model, but the LMR test shows that it's not significantly better. The LMR test also indicates that the 3-class model is not better than the 2-class solution. The BLR is always highly significant, even for the 5-class model which is worse than all other models according to all other indices.

I chose the 3-class solution, but also examined the 2-class and 4-class models. I looked at means, thresholds, probabilities, odds ratios and univariate entropies and created profile plots. The entropies are typically low and there is little evidence of separation between the classes.

I wonder if there is anything else I should look at or try. I can't examine BVR for the continuous variables, can I? Should I try adding covariances between all variables? LCFA? Exploratory FMM?

Can the poor solution be attributed to small sample size?

Thanks for any suggestions.
 Bengt O. Muthen posted on Tuesday, June 21, 2016 - 9:33 am
I'd go with BIC.
 'Alim Beveridge posted on Wednesday, June 22, 2016 - 4:31 am
Dear Bengt,
Adding to the above, I'd like to estimate the LCA with all class-specific associations or with all class-invariant associations, as you suggested in your 2015 article on residual associations in LCA with Tihomir. 3 questions:

1. is my sample size of 359 too small for that given 18 observed variables?

2. I downloaded your input file used for the 2015 article. However, I have continuous variables in addition to 7 dichotomous ones. Do I add residual associations in the same way for continuous vars, using WITH?

3. Do I need to fix all parameters to the values in the original model?

thanks for your help.
 Bengt O. Muthen posted on Wednesday, June 22, 2016 - 5:13 pm
1. I think so.

2. That's problematic. You can use WITH for continuous pairs, but pairs with one continuous and one categorical need to use a factor behind the pair which quickly adds up dimensions of integration.

3. No.
 Stephanie Craig posted on Monday, August 15, 2016 - 3:54 pm
I had found this information before, but cannot seem to locate now. I am using Tech 12 and residual covariance to make the case that 4 classes are better than 3 classes, but how do I interpret a residual covariance of -4.503?

 Bengt O. Muthen posted on Monday, August 15, 2016 - 4:30 pm
Use the residual variances to turn it into a correlation.
 Stephanie Craig posted on Monday, August 15, 2016 - 5:02 pm
Sorry, I feel really dumb, but where do I get those numbers on my output and where can I find the equation to calculate it?
 Bengt O. Muthen posted on Monday, August 15, 2016 - 5:23 pm
I assume you have continuous indicators so you just divide the residual covariance by the product of the two square-rooted residual variances. You also find that in the Standardized solution.
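
The conversion just described can be written out as a short Python sketch. The residual covariance of -4.503 is the value from the question; the two residual variances are hypothetical placeholders, since they were not posted.

```python
import math

def residual_correlation(res_cov, res_var_1, res_var_2):
    """Residual covariance divided by the product of the residual SDs."""
    return res_cov / (math.sqrt(res_var_1) * math.sqrt(res_var_2))

res_cov = -4.503   # residual covariance quoted in the question
res_var_1 = 25.0   # hypothetical residual variance of indicator 1
res_var_2 = 16.0   # hypothetical residual variance of indicator 2

r = residual_correlation(res_cov, res_var_1, res_var_2)
print(f"residual correlation = {r:.3f}")
```

A covariance of -4.503 can therefore correspond to a modest correlation when the residual variances are large, which is why the raw covariance is hard to interpret on its own.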
 DavidBoyda posted on Wednesday, August 31, 2016 - 11:50 am
I have a question regarding the optimum class selection. I have examined and replicated the best log-likelihood values for my analysis. In order to make my analysis more efficient I opted to use the OPTSEED command (which sped up the analysis significantly). That said, my number of classes fell from a 6-class solution to a 5-class solution indicated by a lower BIC.

What is the cause/consequence of this?
 Linda K. Muthen posted on Wednesday, August 31, 2016 - 12:05 pm
When you use OPTSEED to replicate an analysis, you need to use STARTS=0;
 DavidBoyda posted on Thursday, September 01, 2016 - 11:11 am
Yes, I have done this and the BIC clearly rises after I use OPTSEED.

*My number of classes fell from a 6-class solution to a 5-class solution indicated by a lower BIC (after using OPTSEED).

Any ideas?
 Linda K. Muthen posted on Thursday, September 01, 2016 - 11:41 am
Please send the output where you got the optseed and the one where you used it along with your license number to
 Courtney Peasant posted on Thursday, January 12, 2017 - 3:15 pm

I am conducting an LCA with 641 people. Everything ran normally and I exported three classes to SPSS. However, when I run simple descriptives for each class they do not match up with the descriptives in Mplus. Specifically, SPSS indicates that 0 women in my Class 2 are sex workers, while Mplus indicates that 36% of my sample in Class 2 are sex workers. Any thoughts on what could be going on?
 Linda K. Muthen posted on Thursday, January 12, 2017 - 3:48 pm
Where do you get the descriptive statistics for the classes in Mplus? What you are looking at in SPSS is descriptives using most likely class membership.
 Steven Lancaster posted on Thursday, January 12, 2017 - 3:54 pm
If I recall, MPLUS uses one classification scheme for its internal plots and then another for the actual output data file it creates. The "FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON ESTIMATED POSTERIOR PROBABILITIES" are displayed within MPLUS (sorry for the all caps, that is how they list it). However, the number of people in each class in the SPSS file (which is based on the MPLUS output data) is a representation of "FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP."

I have run into this problem before and this is the best I could come up with.
 Linda K. Muthen posted on Thursday, January 12, 2017 - 5:21 pm
You are using most likely class membership in SPSS so the numbers will match FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP. Are you asking why they don't match the ones based on estimated posterior probabilities? I am not clear on what you are asking.
 Steven Lancaster posted on Thursday, January 12, 2017 - 6:33 pm
Not really a question. Was just saying I had a previous situation in which the class membership graph I got in MPLUS did not match the output that MPLUS generated for SPSS. Upon digging in a bit, it seems MPLUS uses the posterior probabilities to determine class membership for graphs/output within the system, but when you export the data it determines the classes based on most likely latent class membership. I would have to go and double check my outputs, but am pretty sure that is what I found.
 Linda K. Muthen posted on Friday, January 13, 2017 - 5:54 am
That sounds correct. Model results are based on posterior probabilities not most likely class membership so that is what we would plot.
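
The distinction between the two sets of class counts can be illustrated with a small Python sketch. The posterior probabilities below are invented for four hypothetical individuals and two classes; the point is only that summing posterior probabilities and counting most likely class assignments generally give different numbers.

```python
# Hypothetical posterior class probabilities: one row per person,
# one column per latent class. Each row sums to 1.
posteriors = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]

def posterior_counts(post):
    """Class counts obtained by summing posterior probabilities."""
    return [sum(col) for col in zip(*post)]

def modal_counts(post):
    """Class counts obtained by assigning each person to the most likely class."""
    counts = [0] * len(post[0])
    for row in post:
        counts[row.index(max(row))] += 1
    return counts

soft = posterior_counts(posteriors)
hard = modal_counts(posteriors)
print("Based on posterior probabilities:", [round(c, 2) for c in soft])
print("Based on most likely class:      ", hard)
```

Here the second class holds 1.75 people on the posterior-probability basis but only one person after hard assignment, so descriptives computed from an exported most-likely-class variable need not match the model-based results.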
 Courtney Peasant posted on Friday, January 13, 2017 - 6:12 am
BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP in Mplus match the most likely class membership in SPSS.

However, the RESULTS IN PROBABILITY SCALE reads as follows for the problem variable, SEX_WORK, where category 1 = no sex work and category 2 = sex work:

Latent Class 2

Category 1 0.637
Category 2 0.363

I interpret this to mean that 36% of the women in Latent Class 2 endorse engaging in sex work. However, when I export my classes into SPSS and look at the same frequency of SEX_WORK, results indicate that 0% of the women in Latent Class 2 have engaged in sex work.

Is this because the results in probability scale are based on the posterior probabilities? If so, how do I reconcile the Mplus output with the SPSS output?
 Linda K. Muthen posted on Friday, January 13, 2017 - 4:19 pm
Please send files to show what you are looking at and your license number to
 samah Zakaria Ahmed posted on Sunday, January 22, 2017 - 4:05 pm
In the case of a latent class model, how can I decide whether the model I proposed is acceptable?
From which part of the output?
 Bengt O. Muthen posted on Monday, January 23, 2017 - 3:36 pm
For categorical outcomes you can use TECH10.

You can also use BIC to see how many classes you need.
 samah Zakaria Ahmed posted on Sunday, January 29, 2017 - 10:50 am
Thanks for the quick reply. But I have another question: if I have 6 items, how do I select the best of them to represent my latent class variable?
That is, how do I compare latent class models having different numbers of items?
 Bengt O. Muthen posted on Sunday, January 29, 2017 - 3:11 pm
You can look at entropy which indicates how well a latent class variable is measured.

With a 2-class model, a good item has low probability for one class and high for another.
 samah Zakaria Ahmed posted on Sunday, January 29, 2017 - 5:16 pm
The ON statement specifies the multinomial logistic regression of the categorical latent variable c on the continuous latent variable f (c on f)
My question is:
how to specify the multinomial logistic regression of the categorical latent variable c1 on another categorical latent variable c2 ? (the ON statement doesn't run)
 Linda K. Muthen posted on Monday, January 30, 2017 - 5:52 am
Please send the output and your license number to It should run.
 L Weaver posted on Thursday, June 08, 2017 - 5:53 am
Dear Linda (or Bengt):

In response to a question about changing the comparison class, you wrote this:

Yes, you can do this by adjusting the starting values so that the class you want for the reference class is the last class.

Can you elaborate on that for a novice? I don't understand how the starting values change the class ordering in the analyses. Is it as simple as putting in the results from a prior analysis, and listing it last, e.g., for a two-class model (where C#2 is the class I want as comparison):




det1 WITH sht1*0.07723 (1);

[ det1*1.80622 ];
[ sht1*0.08263 ];

det1*0.59478 (2);
sht1*0.11440 (3);


det1 WITH sht1*0.07723 (1);

[ det1*2.48608 ];
[ sht1*2.81990 ];

det1*0.59478 (2);
sht1*0.11440 (3);

It seems like this would work, but I don't trust myself (it seems too easy!!).

Thanks - in advance!
 Linda K. Muthen posted on Thursday, June 08, 2017 - 6:11 am
Yes, this is what you do. You should also use STARTS=0; You can use the SVALUES option of the OUTPUT command to obtain input that contains the ending values as starting values and change the class labels to the order you want. Don't use the values for the logits of the categorical latent variable.
 L Weaver posted on Saturday, June 10, 2017 - 9:23 am
Thanks for your help!
 Morgan DeBusk-Lane posted on Thursday, August 10, 2017 - 7:18 am
What is the difference between Model Test and Model Constraint? Which is more appropriate for determining class mean differences?

Model Test:


Model Constraint:
12Idea = M2 - M8;
!or we can use
!0 = M2 - M8;

As I understand it, Model Test provides a Wald chi-square statistic and its significance, whereas Model Constraint provides the difference, a z-test, and its significance. The two are asymptotically equivalent.

Which is more appropriate?

Thank you
 Bengt O. Muthen posted on Thursday, August 10, 2017 - 4:36 pm
If you use the

12Idea = M2 - M8;

line, the two approaches are the same asymptotically, as you say. There is no way to choose between them as far as I know.
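
To see why the two approaches line up for a single contrast such as M2 - M8, here is a small arithmetic sketch. It assumes both statistics are built from the same estimates and the same sampling covariance matrix; all numbers are hypothetical and not from any real output.

```python
import math

# Hypothetical estimates of two class means and their sampling (co)variances.
m2, m8 = 1.80, 2.49
var_m2, var_m8, cov_m2_m8 = 0.040, 0.050, 0.005

# Difference and its variance for the linear contrast M2 - M8.
diff = m2 - m8
var_diff = var_m2 + var_m8 - 2.0 * cov_m2_m8

# z-test, as reported for a new parameter defined as the difference.
z = diff / math.sqrt(var_diff)
p_z = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value

# Wald chi-square test of 0 = M2 - M8 with 1 degree of freedom.
wald = diff ** 2 / var_diff
p_wald = math.erfc(math.sqrt(wald / 2.0))  # chi-square(1) upper-tail p-value

print(z, p_z)
print(wald, p_wald)
```

In this simplified setting the Wald statistic is the square of z and the two p-values coincide, so the choice between the two is mostly a matter of which output is more convenient.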
 Caroline Abbott posted on Thursday, October 19, 2017 - 10:11 am
Drs. Muthen,

I am trying to test if intercepts and slopes significantly differ across classes. Per your advice in this thread, I am using loglikelihood values to compare nested models. However, when I set 2 class intercepts to be equal (using syntax below), the classes change (i.e. different size classes, different parameters for the 3rd class). Is there a way to do this and retain the classes in the original model?

CLASSES = c(3);

STARTS = 10 2;

i s | SIQ0@-16 SIQ1@-12 SIQ2@-8 SIQ3@-4 SIQ4@0;
i2 s2 | BDI0@-16 BDI1@-12 BDI2@-8 BDI3@-4 BDI4@0;
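
The loglikelihood comparison of nested models mentioned above comes down to the following arithmetic. This is a minimal sketch of the plain (unscaled) difference test; the function names and the loglikelihood and parameter counts are hypothetical, and with robust estimators such as MLR a scaling correction would also be needed, which is not shown here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    z = x / 2.0
    # Start from a closed form (df = 2 or df = 1) and step up with the
    # recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_difference_test(ll_restricted, ll_unrestricted, npar_restricted, npar_unrestricted):
    """Return (statistic, df, p-value) for two nested models."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    df = npar_unrestricted - npar_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical loglikelihoods: equal intercepts in two classes vs. free intercepts.
stat, df, p = lr_difference_test(-2051.3, -2047.9, 20, 21)
print(stat, df, p)
```

The comparison is only meaningful if the restricted run recovers the same classes as the unrestricted one, which is the concern raised in this post.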


 Bengt O. Muthen posted on Thursday, October 19, 2017 - 3:23 pm
For mixtures, I recommend using Model Test, which avoids changing classes.
 Anne Black posted on Monday, April 09, 2018 - 8:49 pm
Dear Drs. Muthen and Muthen,
I want to test differences in thresholds of a categorical outcome across latent classes, adjusting for covariates that differ significantly by class. I have complex survey data.

I used the Regression Auxiliary model approach outlined in Asparouhov & Muthen 2014.

I ran the LCA, specifying auxiliary variables and saving BCH weights. (I did not specify DCAT for the categorical auxiliary variables because I got a warning about this with type=mixture complex).

In the final step I have specified the following model:

usevar are BCHW1 BCHW2 BCHW3 Y1 X1 X2 X3;


training=BCHW1-BCHW3 (BCH);

categorical is Y1 ;

subpopulation = lpa eq 1;

MISSING are all (999);

Analysis: Type = Mixture complex;

Y1 on X1 X2 X3;

Y1 on X1 X2 X3;
[Y1$1] (a);

Y1 on X1 X2 X3;
[Y1$1] (b);

Y1 on X1 X2 X3;
[Y1$1] (c);

model test:

Can I use this approach, given my categorical outcomes?

Thank you for advising.
 Bengt O. Muthen posted on Tuesday, April 10, 2018 - 3:45 pm
Send the output for your DCAT run to Support along with your license number.
 Lawrence M. Scheier posted on Thursday, July 26, 2018 - 12:25 pm
I am not seeing any syntax for comparing latent class proportions in an LCA model with two KNOWNCLASS groups. A 3-class model fits best in one group and a 4-class model in another group. The next step would be to constrain the latent class proportions to equality across groups, I would think. There is not a vast literature that recommends "steps" for these comparisons (Vermunt has a paper on LCA and Invariance). Any thoughts on how to set up Mplus to contrast the # of classes in each knownclass?
 Bengt O. Muthen posted on Thursday, July 26, 2018 - 6:12 pm
Try deleting "c ON cg" - having that regression in the model allows for different c class probabilities.