Latent class models
Mplus Discussion > Latent Variable Mixture Modeling
 Anonymous posted on Friday, April 30, 2004 - 7:41 am
I have been using the Mplus Demo and hope to use the Mplus software in the near future. Could one of the expert programmers help me list the class membership for each subject? That is, I would like a variable that stands for class membership in the same order used for the outcome variables, ordered by id for example. Many thanks.
 Linda K. Muthen posted on Friday, April 30, 2004 - 9:20 am
I'm not sure I totally understand your question. You can save class membership by using SAVE=CPROBABILITIES in the SAVEDATA command and you can save the id variable by using the IDVARIABLE option of the VARIABLE command. Then you can sort the data in Excel.
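As a scripted alternative to sorting in Excel, here is a minimal Python sketch. It assumes the saved file is whitespace-delimited with one record per subject, and that you look up the positions of the id and most-likely-class columns in the SAVEDATA information section of your own output. The column positions and the sample records below are hypothetical.

```python
def sort_saved_rows(lines, id_col):
    """Parse whitespace-delimited records and sort them by the id column.

    id_col is a 0-based column index (an assumption: check the column
    order reported in your own Mplus output before using it).
    """
    rows = [line.split() for line in lines if line.strip()]
    return sorted(rows, key=lambda row: float(row[id_col]))


# Hypothetical saved records: u1 u2 id cprob1 cprob2 class
saved = [
    "1 0 103 0.912 0.088 1",
    "0 0 101 0.134 0.866 2",
    "1 1 102 0.701 0.299 1",
]

for row in sort_saved_rows(saved, id_col=2):
    subject_id, most_likely_class = row[2], row[-1]
    print(subject_id, most_likely_class)
```

The last column is taken to be the most likely class only because that is how the hypothetical records above are laid out.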
 Anonymous posted on Wednesday, June 16, 2004 - 9:19 am
I have an average of 30 nurses nested in 200 hospitals, with about 10 nurse-level and 5 hospital-level measures. I want to use LCA to find typologies of hospitals. In addition, I would like to have nurse-level variables as predictors of class membership. Can I use LCA with TWO LEVEL models? Are there any examples available? Can I do this analysis in one step? Or should I do the hospital LCA separately and then combine it with a TWO LEVEL model?
 bmuthen posted on Wednesday, June 16, 2004 - 9:54 am
Twolevel LCA is described in the version 3 User's Guide in chapter 10. One question is if the LCA class membership is (1) a hospital-level variable or (2) a nurse-level variable. You mention finding typologies of hospitals which suggests (1), but then mention prediction of class membership by nurse-level variables, which suggests (2). The Mplus multilevel LCA is primarily aimed at (2), i.e. finding nurse-level latent classes while accounting for the multilevel data by allowing LCA coefficients that vary across hospitals, e.g. latent class intercepts in the regression of class membership on nurse variables. If you are really interested in (1), perhaps you want to aggregate nurse variables to the hospital level and do a regular LCA on the 200 hospitals. In any case, that would seem a natural starting point.
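The aggregation step suggested for case (1) could be prepared outside Mplus before running a regular LCA on the hospitals. The Python sketch below is only one way to do it and assumes simple within-hospital means are an acceptable summary; the hospital ids and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_hospital(nurse_rows):
    """Average each nurse-level measure within hospital.

    nurse_rows: iterable of (hospital_id, [measure values]) pairs.
    Returns {hospital_id: [mean of each measure]}.
    """
    by_hospital = defaultdict(list)
    for hospital_id, measures in nurse_rows:
        by_hospital[hospital_id].append(measures)
    return {
        hospital_id: [mean(column) for column in zip(*rows)]
        for hospital_id, rows in by_hospital.items()
    }


# Hypothetical data: two nurse-level measures, two hospitals
nurses = [
    ("H1", [3.0, 1.0]),
    ("H1", [5.0, 2.0]),
    ("H2", [4.0, 4.0]),
]
hospital_means = aggregate_to_hospital(nurses)
print(hospital_means)
```

Each hospital then contributes one row of aggregated measures alongside its own hospital-level measures.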
 jbond posted on Tuesday, October 19, 2004 - 10:03 am

After reading Nagin and Tremblay's Psych methods paper "Analyzing development trajectories of distinct but related behaviors: a group-based approach," I was wondering if you could outline, or if there is a paper discussing, the differences between the likelihood-based approach to LCGA that he uses and the method that is used in Mplus. I know Mplus is quite a bit more flexible in its ability to incorporate more complex dependencies between variables and/or factors and also allows for things like variation around the growth trajectory, but I believe that the technical specification of the model is slightly different. Unfortunately, he doesn't give much detail on what is being done. It is interesting that his prediction, on page 31, that "we suspect that adapting this modeling framework to a structural equation modeling framework would make it difficult to retain two key strengths of the framework - the flexibility to handle a variety of different data types and to accommodate missing data" turned out to be somewhat off the mark. Thanks for any input,

 bmuthen posted on Tuesday, October 19, 2004 - 10:07 am
Will take a look at that paper and get back to you.
 bmuthen posted on Wednesday, October 20, 2004 - 5:48 pm
Looks to me like the Nagin-Tremblay 2001 Psych Methods paper corresponds to a parallel-process LCGA, where you have one latent class variable for each of the two LCGA growth processes and you are interested in relating the two latent class variables to each other (so a "multiple c" application - see the end of Chapter 13 of the Version 3 User's Guide). So this can be done in Mplus, and Mplus also allows for various generalizations that give important flexibility.
 Anonymous posted on Thursday, December 30, 2004 - 7:16 am
I am running a LCA with 21 binary class predictors and 4 covariates (direct & indirect) in Mplus 3. I need to save the most likely class membership to use in another model. I did so using the (partial) input below:

STARTS = 100 10;

FILE IS c:\rnp_saved_data_4C_dir&indi

This resulted in:


Class Counts and Proportions

1 12568 0.22747
2 6053 0.10955
3 9818 0.17770
4 26812 0.48528

However when I look at the saved data file, of these 55,251 people, only 9,079 of them actually have a class assignment -distributed as follows:

Saved Latent
1 2282
2 919
3 1657
4 4221

1. Why weren't all 55,251 people in the original data set assigned to classes in the new data file? How can I get the saved file to give everyone a class assignment?

2. I also noticed that the data in some of the original variables changed. For example, some binary variables were no longer binary, and unique IDs changed and were no longer unique, so I cannot merge with other data files to continue my analyses, etc. How can I ensure that the data for all USEVARIABLES are copied as is when using the SAVEDATA command?

 Linda K. Muthen posted on Thursday, December 30, 2004 - 8:11 am
Can you send your full output? It sounds like your data are not being read correctly. If you could send a portion of the data -- first ten cases -- that would also help. Note that Mplus has a maximum record length of 1000.
 Anonymous posted on Friday, March 04, 2005 - 3:32 am

I have a rather general question concerning latent class growth analysis. Is this procedure suitable for a multivariate approach? By this I mean determining class membership on two or more variables (that are theoretically closely interlinked) simultaneously?

Many thanks in advance!
 bmuthen posted on Friday, March 04, 2005 - 6:19 am
Yes, certainly. You can either do this using two separate latent class variables that are related, or use a single latent class variable that is determined by your multiple processes jointly - sounds like the second approach fits your theory best.
 koen luyckx posted on Friday, March 04, 2005 - 7:16 am
Many thanks for your quick reply, Dr Muthen!
 Anonymous posted on Tuesday, March 15, 2005 - 12:37 pm
Do you have any suggestions for working around the recordlength=1000 limitation? I have slightly more than that (around 1200-1300) in my data and am wondering how I can run a latent profile analysis and get back latent class probabilities and assignments for all of my data. This seems like a surprising limitation in this day and age of big data sets--or am I missing or misunderstanding something?
 Linda K. Muthen posted on Tuesday, March 15, 2005 - 2:20 pm
Each observation can have more than one record. So I suggest saving the data such that you have two records per observation. It's a long story that probably wouldn't mean much to you why we have this limit. So far no one has found it insurmountable.
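One way to prepare such a file is sketched below in Python. It assumes whitespace-delimited values and splits every observation at the same variable position, so all observations have the same two-record layout. The function name, the split position and the sample values are hypothetical, and how the records are read back in (free or fixed format) is not covered here.

```python
def split_into_two_records(values, split_at, max_len=1000, sep=" "):
    """Write one observation as two records, each within the length limit.

    values:   the observation's values as strings
    split_at: number of values placed on the first record
    """
    first = sep.join(values[:split_at])
    second = sep.join(values[split_at:])
    for record in (first, second):
        if len(record) > max_len:
            raise ValueError("record still exceeds the maximum record length")
    return first, second


# Hypothetical observation: 220 values of width 5 (1319 characters on one line)
values = ["1.234"] * 220
record_one, record_two = split_into_two_records(values, split_at=110)
print(len(record_one), len(record_two))
```

Using a fixed split position rather than filling each record to the limit keeps the layout identical across observations.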
 Katharina posted on Wednesday, May 25, 2005 - 9:43 am
I have only recently started using Mplus, so my question is very basic.
I have carried out LCA with covariates and would now like to use these classes (i.e. the categorical variable underlying these classes) as an independent variable in a general SEM framework to predict certain outcomes, i.e. a number of continuous latent variables. Is this possible and, if so, how? Also, are there any references that cover this area in greater depth?
Many thanks
 bmuthen posted on Wednesday, May 25, 2005 - 10:03 am
This is possible to do in a single step in Mplus. For examples and further references, see the Muthen (2004) chapter in the Kaplan handbook - this is available on our web site.
 Annie Desrosiers posted on Wednesday, October 04, 2006 - 10:15 am
Hi, I have 2 questions:

Is it possible to handle individually varying times of observation in a mixture model?
I tried the AT statement but it doesn't work.

And what does c#1 mean in your example? I don't understand...

Thank you
 Bengt O. Muthen posted on Wednesday, October 04, 2006 - 10:28 am
AT should work here - send your input, output, data, and license number to

I am not sure which example you refer to, but the User's Guide has many mixture examples that may be helpful. As for learning mixtures, you might also consider the upcoming course in Montreal.
 Annie Desrosiers posted on Wednesday, October 04, 2006 - 10:47 am
YES!!! For sure I'm gonna be at your course in Montreal.

For my first question, can I write :
type = mixture random missing;

Thank you
 Linda K. Muthen posted on Wednesday, October 04, 2006 - 10:49 am
Yes. You can see the combinations of options that can be used together in Chapter 15 where the ESTIMATOR option is discussed.
 Annie Desrosiers posted on Wednesday, October 04, 2006 - 10:59 am
Thank you for your time!!

Since I changed to the following input, I cannot see the plot anymore, and there are latent classes that have the same intercept and slope in the output...

variable: names are id age1 age2 age3 onset y1 y2 y3;
usevariables are age1-age3 y1-y3;
tscores = age1-age3;
classes = c(6);
missing = . ;

analysis: type = mixture random missing;
starts = 20 2;

model: %overall%
i s | y1-y3 at age1-age3;

plot: type = plot3;

before :

variable: names are preco y1-y3;
usevariables = y1-y3;
classes = c(4);
missing = . ;

analysis: type = mixture missing;
starts = 20 2;

model: %overall%
i s | y1@0 y2@1 y3@2;

plot: type = plot3;
series = y1-y3(*);
 Bengt O. Muthen posted on Wednesday, October 04, 2006 - 11:05 am
I don't see you in the list of course registrants.
 Annie Desrosiers posted on Wednesday, October 04, 2006 - 11:12 am
Yes it's Annie Lemieux from University Sherbrooke!
Desrosiers is my husband's name. In Quebec we can't take the husband's name, but when it's not formal I use Desrosiers.
I'm sorry for the confusion, I didn't realize!
 Orla Mc Bride posted on Wednesday, March 28, 2007 - 4:11 am
In my data file, I have seven categorical variables. However, I want to conduct a latent variable mixture model using data from individuals who have endorsed only one or two of any of the seven categorical variables. Is this possible?

Thank you for your time.
 Linda K. Muthen posted on Wednesday, March 28, 2007 - 9:58 am
Are you asking if you can select individuals who have endorsed only one or two items?
 Orla Mc Bride posted on Wednesday, March 28, 2007 - 11:22 am
Hi Linda,

I know how to select individuals who have endorsed only one or two variables. If I know prior to conducting the analysis that an individual has information on only one or two variables, can a latent variable mixture model be conducted? I am concerned about model identification.

I hope this question is clear.
 Linda K. Muthen posted on Thursday, March 29, 2007 - 8:42 am
If you mean that you have observations that are missing on all but one or two items, this should be fine. Identification would be compromised only when no one provides information for a pair of variables. Note that standard errors may be larger where less information is available.
 Orla Mc Bride posted on Thursday, March 29, 2007 - 10:13 am
Individuals who have information on one or two variables (i.e., have answered the question 'yes') have answered 'no' to the other variables. In other words, I have some individuals who have answered 'yes' to any one variable and 'no' to the other six variables. I also have individuals who have answered 'yes' to any two variables and 'no' to the other five variables.

Does this mean that identification could be compromised? I hope this question is clear.

thanks for your help.
 Bengt O. Muthen posted on Thursday, March 29, 2007 - 9:01 pm
If for each pair of variables you have some people who have observations on both variables, you are most likely ok from an identification point of view. Try it - Mplus will tell you if your model is not identified.
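The pairwise condition described here can be checked on the raw data before fitting anything. The Python sketch below assumes missing values are coded as None and that an answer of 'no' counts as an observation; the data rows are hypothetical.

```python
from itertools import combinations


def uncovered_pairs(data, missing=None):
    """Return the variable pairs for which no respondent has both observed.

    data: list of rows, one per respondent, with `missing` marking a
    missing value. Column indices in the result are 0-based.
    """
    n_vars = len(data[0])
    pairs = []
    for j, k in combinations(range(n_vars), 2):
        covered = any(row[j] != missing and row[k] != missing for row in data)
        if not covered:
            pairs.append((j, k))
    return pairs


# Hypothetical responses on three items (1 = yes, 0 = no, None = missing)
responses = [
    [1, 0, None],
    [None, 1, 1],
    [0, None, None],
]
print(uncovered_pairs(responses))
```

An empty result means every pair of variables is observed together for at least one person. It is not a proof of identification, which Mplus checks when the model is run.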
 chinthaka kuruwita posted on Friday, February 29, 2008 - 7:20 am
Hi Dr. Muthen,

Is there a way to see if the latent class variable has better predictive utility for a distal outcome than any indicator by itself?

 Bengt O. Muthen posted on Friday, February 29, 2008 - 8:19 am
Say that all your variables are dichotomous: the latent class variable, the indicators and the distal. The latent class model, including the distal, gives you an estimate of the distal probability for each of the two latent classes. Alternatively, you don't include the distal in the latent class analysis but put it as

auxiliary = distal(e);

which gives you the distal mean (proportion) for each class.

As a comparison you can then instead do regular logistic regression of the distal on an indicator and get an estimate of the distal probability for each of the two categories of the indicator.
 chinthaka kuruwita posted on Friday, February 29, 2008 - 8:49 am
Thanks Dr. Muthen,

I ran the model with the distal as an auxiliary variable and it gave me 3 estimated probabilities for the distal outcome, one for each class (3-class model).

I have P[attempt=1|class=1]=.2

From the regular logistic regression I got
P[attempt=1|Depression=1]=.15 for one of the indicators.

How do I compare these 2 probabilities and claim that the latent class variable has more predictive ability than the individual indicator (or vice versa)?

Or does it make sense to compare these 2 probabilities?

I hope this question makes sense.

 Bengt O. Muthen posted on Friday, February 29, 2008 - 9:11 am
You need to also compute the other probabilities:

P[attempt=1|class=2] = a

as well as

P[attempt=1|depression=0] = b.

Then you compute

a - 0.2

and

b - .15

and see which of those 2 differences is the biggest.

You may also want to include the distal in your LCA.
 chinthaka kuruwita posted on Friday, February 29, 2008 - 11:22 am
Thanks Dr. Muthen,

I do not understand how to interpret the results of the method you suggested. For example if I got

a - 0.2 =.1
b - .15 = .02

Does this suggest that the latent classes are more predictive than the indicator (depression) of the distal outcome 'attempt'?
 Bengt O. Muthen posted on Friday, February 29, 2008 - 11:36 am
Well, the latent class approach says that the distal probability changes .1 when changing latent class, whereas the logistic approach based on one indicator says that the distal changes only .02 when the indicator changes categories. So that says that the latent classes are more influential on the distal than this indicator is. You can also quantify this as an odds ratio. How well the two approaches predict is another matter where you have to compare estimated probabilities with actual distal outcomes in line with what you do in regular logistic regression (see such books).
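The two comparisons discussed above, the difference in probabilities and the odds ratio, can be computed as in the Python sketch below. The values for a and b are hypothetical: they are chosen only so that the differences match the .1 and .02 used in the question.

```python
def risk_difference(p_a, p_b):
    """Difference between two outcome probabilities."""
    return p_a - p_b


def odds_ratio(p_a, p_b):
    """Ratio of the odds implied by two outcome probabilities."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))


# Hypothetical values: a = P[attempt=1|class=2], b = P[attempt=1|depression=0]
p_class1, a = 0.20, 0.30
p_dep1, b = 0.15, 0.17

print("latent class: diff", risk_difference(a, p_class1),
      "odds ratio", odds_ratio(a, p_class1))
print("indicator:    diff", risk_difference(b, p_dep1),
      "odds ratio", odds_ratio(b, p_dep1))
```

Both measures describe how strongly the distal probability changes across categories. Neither says how well the outcome is predicted, which is the separate matter raised at the end of the post above.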
 chinthaka kuruwita posted on Monday, March 03, 2008 - 10:19 am
Thank you Dr. Muthen. This helps a lot.
 chinthaka kuruwita posted on Wednesday, March 05, 2008 - 11:03 am
Hi Dr. Muthen,

When I computed the differences in probabilities as you suggested, I found one indicator giving a larger change in probability than the classes. The rest of the indicators gave a smaller change in probability than the classes. Does this mean I'm better off using that one indicator to predict my distal outcome than using latent classes?
 Bengt O. Muthen posted on Wednesday, March 05, 2008 - 11:16 am
Maybe. You also have to look at how well you predict - see logistic regression text books.
 chinthaka kuruwita posted on Wednesday, April 16, 2008 - 9:03 am
Hi Dr. Muthen,

Is there a way to see if the latent class structure varies by other variables, say gender, using a single model rather than running separate LCAs for males and females?

 Linda K. Muthen posted on Wednesday, April 16, 2008 - 10:49 am
You can run a model using the KNOWNCLASS option using males and females as the known classes. But I would analyze them separately to see if the same LCA model at least as far as number of classes holds for them both before I put them together.
 chinthaka kuruwita posted on Wednesday, April 16, 2008 - 11:14 am
Thank you Dr. Muthen.
 chinthaka kuruwita posted on Tuesday, April 22, 2008 - 9:27 am
Hi Dr. Muthen,

Can I estimate the transition probability matrix for males and females in a 3 class LTA model with 3 time points ? I tried the following code:

C2#1 on C1#1 GENDER;
C2#1 on C1#1 GENDER;

But I only get two matrices for the 2 time points. How can we get separate transition probability matrices for males and females?

 Linda K. Muthen posted on Tuesday, April 22, 2008 - 12:34 pm
You have to create these yourself. They are not given as part of the Mplus output.
 chinthaka kuruwita posted on Tuesday, April 22, 2008 - 12:37 pm
How can I do it ? Do I have to run two LTA's for males and females ? Is there any example in the Mplus manual ?

 Bengt O. Muthen posted on Tuesday, April 22, 2008 - 12:59 pm
You analyze males and females together using gender as a "Knownclass" latent class variable. Then you look at the output showing the estimated probabilities for all latent classes - from this table you can compute the transition matrices for males and females.
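The hand computation described here amounts to conditioning the estimated pattern proportions on gender and on the earlier class. The Python sketch below shows it for one transition, with two classes per time point to keep it short. The pattern proportions are hypothetical and would be replaced by the estimated proportions for each latent class pattern in your own output.

```python
def transition_matrices(joint, n_groups, n_classes):
    """Compute P(c2 = k | c1 = j, group = g) from pattern proportions.

    joint: dict mapping (g, c1, c2) -> estimated pattern proportion,
           with 1-based class numbers.
    Returns {g: matrix}, where matrix[j-1][k-1] is the transition probability.
    """
    matrices = {}
    for g in range(1, n_groups + 1):
        matrix = []
        for j in range(1, n_classes + 1):
            row = [joint.get((g, j, k), 0.0) for k in range(1, n_classes + 1)]
            total = sum(row)
            matrix.append([p / total if total > 0 else float("nan") for p in row])
        matrices[g] = matrix
    return matrices


# Hypothetical proportions for patterns (gender, c1, c2); they sum to 1
joint = {
    (1, 1, 1): 0.20, (1, 1, 2): 0.05, (1, 2, 1): 0.10, (1, 2, 2): 0.15,
    (2, 1, 1): 0.10, (2, 1, 2): 0.15, (2, 2, 1): 0.05, (2, 2, 2): 0.20,
}
by_gender = transition_matrices(joint, n_groups=2, n_classes=2)
for g, matrix in by_gender.items():
    print("group", g, matrix)
```

With three time points, the pattern proportions would first be summed over the class variable that is not part of the transition of interest.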
 chinthaka kuruwita posted on Wednesday, April 23, 2008 - 7:02 am
 chinthaka kuruwita posted on Wednesday, April 23, 2008 - 12:44 pm
Hi Dr. Muthen,

I tried the following code to estimate the transition probabilities. This is part of the code:

CLASSES : CG(2) C1(3) C2(3) C3(3);


C2#1 ON C1#1 CG#1;
C2#2 ON C1#1 CG#1;
C2#1 ON C1#2 CG#1;
C2#2 ON C1#2 CG#1;
Here I'm trying to estimate the t1 to t2 transition probabilities for each gender, but I'm not getting a meaningful output. What is wrong in this code? Please help.

 Linda K. Muthen posted on Thursday, April 24, 2008 - 7:52 am
Please send your input, data, output, and license number to
 chinthaka kuruwita posted on Wednesday, September 17, 2008 - 12:43 pm
Dear Dr. Muthen,

I have a distal outcome and I used the auxiliary = distal(e); option to get class varying means(proportions) for my binary distal outcome in a 3 class LCA model.

I have p1 for class 1, p2 for class 2 and so on.

If I need to compare p1 and p2 (i.e. pairwise contrasts), how can I get the standard errors of these estimated means?

 Linda K. Muthen posted on Wednesday, September 17, 2008 - 3:27 pm
In Version 5.1, you will get the standard errors.
 chinthaka kuruwita posted on Thursday, September 18, 2008 - 7:23 am
I have ver 5.0. How do I upgrade to ver 5.1?

 Linda K. Muthen posted on Thursday, September 18, 2008 - 8:01 am
If you have a current upgrade and support contract, you can go to Support - Mplus Updates to download Version 5.1.
 chinthaka kuruwita posted on Thursday, September 18, 2008 - 11:29 am
Thanks Dr. Muthen,

I have one more question. How can I get a G square goodness of fit statistic for a LCA model with 3 classes in Mplus ?

 Bengt O. Muthen posted on Thursday, September 18, 2008 - 1:35 pm
If you mean a frequency table chi-square test, yes, we give both the Pearson and the Likelihood ratio chi2.
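For reference, the two frequency-table statistics mentioned here are the textbook formulas X2 = sum((O - E)^2 / E) and G2 = 2 * sum(O * ln(O / E)), taken over the response patterns. The Python sketch below applies them to hypothetical observed and model-expected frequencies. It does not reproduce how Mplus handles sparse or deleted cells.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson chi-square over response-pattern frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_chi2(observed, expected):
    """Likelihood ratio chi-square (G squared).

    Cells with zero observed frequency contribute nothing to the sum.
    """
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical observed and model-expected frequencies for four patterns
observed = [30, 20, 25, 25]
expected = [28.0, 22.0, 26.0, 24.0]
print(pearson_chi2(observed, expected), likelihood_ratio_chi2(observed, expected))
```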
 finnigan posted on Tuesday, February 10, 2009 - 2:53 pm

I am undertaking longitudinal research and want to examine typologies of significant events at work, significant events outside of work and significant events in personal life.

The survey question asks if a respondent has experienced a significant life event; the answer is yes or no. If yes, the respondent is asked to describe it as positive, negative or neither, and then to indicate if the event made them focus more on the past, present or future.

The data structure will be multilevel, i.e. persons within organisations.

I am trying to find out if I can create typologies such as a positive, future-focused work life event, a negative, past-focused personal life event, etc.

Would latent class analysis be appropriate to do this?

 Linda K. Muthen posted on Wednesday, February 11, 2009 - 10:07 am
It is not clear to me how your variables can be used as latent class indicators. Do you plan on combining them in some way?
 Kätlin Peets posted on Wednesday, April 08, 2009 - 8:33 am

I have a question concerning the sample size. I have conducted a latent class analysis with 34 subjects (thus, I have a very small sample). How does the sample size influence the estimates/fit indices?

Thank you
 Bengt O. Muthen posted on Wednesday, April 08, 2009 - 3:07 pm
Hard to say in general and much of this depends on how large the class separation is. The standard errors might not be that good. You need a Monte Carlo study to get a better feel for this. The BIC tends to underestimate the number of classes at low sample sizes as shown in the Nylund et al article.
 Julie Mount posted on Monday, June 15, 2009 - 8:21 am

I have recently submitted a paper where I've used latent class analysis (with binary indicators) to investigate subtypes of cognitive impairment. The results suggested a four-class solution; two of these classes represented global cognitive impairment across domains (a mild and a severe class) while the other two represented impairment in specific domains.

A reviewer commented that the global findings differed from a number of previous factor analyses of cognitive data and asked for more explanation.

I used the same dataset and ran an exploratory factor analysis on the same binary indicator variables and this suggested a three factor solution. The first two factors looked like my domain-specific deficit classes but there was no global type of factor. So the observation of the global classes doesn't seem to be related to anomalies in my dataset but rather with differences between the two methods.

My question then is whether there is any theoretical reason that factor analysis might be less likely to yield global factors (ie high or low scores across all observed indicators) than latent class analysis? I haven't been able to find anything in the literature yet.

 Bengt O. Muthen posted on Monday, June 15, 2009 - 9:02 am
Exploratory factor analysis is not well suited for finding a global (general) factor. You would have to do a CFA to capture that - a "general-factor, specific-factor" model (see handout for Mplus short course topic 1). But you should also investigate if EFA/CFA or LCA fit the data better by comparing their BIC values. One model may very well be a lot better than the others in terms of BIC and should therefore be chosen on statistical grounds. For an example, also using factor mixture analysis, see
 Julie Mount posted on Thursday, June 18, 2009 - 3:23 pm
Thank you, this is really useful. For the LCA (four class) the BIC is 1316 and for the EFA (3 factors) it's 8232. A big difference!
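For readers comparing models this way, BIC is computed from the loglikelihood, the number of free parameters and the sample size, and the model with the smaller value is preferred. The Python sketch below uses the standard formula BIC = -2 logL + p ln(n). The loglikelihoods and parameter counts are hypothetical and are not the values behind the figures quoted above.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion; smaller values indicate a better model."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


# Hypothetical fits of two competing models to the same 100 observations
candidates = {
    "model A": bic(log_likelihood=-600.0, n_parameters=10, n_observations=100),
    "model B": bic(log_likelihood=-590.0, n_parameters=20, n_observations=100),
}
best = min(candidates, key=candidates.get)
print(candidates, "preferred:", best)
```

In this made-up case the better loglikelihood of model B does not make up for its ten extra parameters.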
 mpduser1 posted on Tuesday, June 21, 2011 - 10:40 am
I'm generating a series of models in Mplus in an attempt to determine if latent class membership differentiates classes on some dichotomous distal outcome, Y, after adjusting for a series of covariates, X.

I want to make sure I've set the problem up correctly (and am interpreting the results correctly) within the Mplus framework.

My latent class model (L) has three classes, predicted based on a series of indicators (I).

What I've done is first estimated:

L --> I
L --> Y

In doing so I found that my classes differ on Y. In other words, the probability of endorsing the dichotomous item Y varies by latent classes.

Then, I want to see if the cross-class variation in probability of endorsing Y changes after using covariates X. So, I estimate:

L --> I
L --> Y
X --> Y

In many instances, the results appear to make sense. But for one particular Y, I obtain latent class probabilities of endorsing Y as 0 (or, close to it) for all of my latent classes.

I'm wondering if this makes sense or if I've set the model up wrong -- essentially none of my respondents, regardless of latent class, are endorsing Y after I adjust for X?
 Linda K. Muthen posted on Wednesday, June 22, 2011 - 9:33 am
Please send the output for the last analysis and your license number to
 Leslie Roos posted on Tuesday, March 13, 2012 - 1:40 pm

I am a student new to MPlus and LCA and trying to determine which module would be best for my analyses:

M-Plus Base Program or M-Plus Base program with Combination Add on.

The project I am working on is with a large population database and will investigate whether 'classes' derived from a range of childhood variables (some multi-level varying in frequency and severity) will predict a binary adult outcome.

Any advice would be much appreciated. I'm not sure I fully understand what '2-level LCA' would look like.
 Linda K. Muthen posted on Tuesday, March 13, 2012 - 2:57 pm
If you want to do Latent Class analysis, you would need either the Base plus the Mixture Add-On or the Base plus the Combination Add-On.

If you don't have multilevel clustered data, you do not need to worry about multilevel analysis.
 Malcolm Cunningham posted on Thursday, March 15, 2012 - 1:32 pm
I am struggling with model interpretation. I have approximately 104,000 respondents (37 items) and 99,000 different response patterns so my response space is almost a uniform distribution. It is my understanding that MPLUS classifies on the basis of response pattern so this distribution presumably affects interpretation.

Can you please tell me how LCA decides class membership? And, does the response pattern composition affect class membership?
 Linda K. Muthen posted on Friday, March 16, 2012 - 9:45 am
The response patterns are a summary of the categorical data. They do not affect class membership. Classes are formed by finding the model with the lowest correlations among the latent class indicators for each class. The model assumes conditional independence. The interpretation of the classes is based on the profiles of the latent class indicators.
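To make the role of conditional independence concrete, the Python sketch below shows the standard LCA calculation of posterior class probabilities for one response pattern with binary indicators, given already-estimated class proportions and item probabilities. All parameter values are hypothetical, and the sketch covers only classification after estimation, not the estimation itself.

```python
def posterior_class_probs(responses, class_probs, item_probs):
    """P(class | response pattern) for binary items.

    responses:   list of 0/1 item responses
    class_probs: estimated class proportions, one per class
    item_probs:  item_probs[c][j] = P(item j = 1 | class c)

    Within a class the items are treated as independent, so the pattern
    likelihood is a product over items.
    """
    joint = []
    for prior, probs in zip(class_probs, item_probs):
        likelihood = prior
        for u, p in zip(responses, probs):
            likelihood *= p if u == 1 else (1 - p)
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


# Hypothetical two-class model with three binary indicators
class_probs = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]
posterior = posterior_class_probs([1, 1, 0], class_probs, item_probs)
most_likely_class = posterior.index(max(posterior)) + 1
print(posterior, most_likely_class)
```

Two people with the same response pattern get the same posterior probabilities, whatever the number of distinct patterns in the sample.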
 Tim Stump posted on Saturday, September 22, 2012 - 8:32 pm
I'm running LCA with bayes estimator. I don't seem to get Bayes Information Criteria (BIC) to compare models with different number of classes as with ML or MLR estimator. I've just started to learn this type of model. Maybe I'm missing something easy about how to obtain BIC. Key parts of input program are below:

ANALYSIS: TYPE = mixture;
svdrng1-svdrng11 with svdrng1-svdrng11(p1-p55);

svdrng1-svdrng11 with svdrng1-svdrng11(q1-q55);
OUTPUT: TECH1 tech10 tech11 tech14;
 Linda K. Muthen posted on Sunday, September 23, 2012 - 6:02 pm
This is not yet available.
 Tim Stump posted on Monday, September 24, 2012 - 7:55 am
If BIC is not available, are there other suggestions/guidelines on determining number of classes when using bayes estimator for LCA?
 Tim Stump posted on Monday, September 24, 2012 - 8:54 am
For LCA with bayes estimator, is deviance information criterion (DIC) available? Is there an option I can specify to get this displayed in the output.
 Tim Stump posted on Monday, September 24, 2012 - 9:25 am
Sorry, I'm showing my ignorance in this topic. But, looking back at slides from a previous workshop, I see that posterior predictive p-value is useful for determining model fit with bayes estimator along with posterior predictive checking histogram - difference in chi-square (difference between real/replicated data). Would this be a useful method to determine number of classes from an LCA model?
 Tihomir Asparouhov posted on Monday, September 24, 2012 - 12:12 pm
Take a look at section 6.5.3 about using PPP as a class enumeration technique.

BIC and DIC are not available yet.
 Tim Stump posted on Friday, September 28, 2012 - 7:20 am
Thank you for referencing that article, very helpful.
 Ed Dunne posted on Monday, November 12, 2012 - 5:41 pm
I have a question about within-LCA class comparisons. To your knowledge, has anyone ever compared participants within a specific class? For example, if I have four classes, can I take only ONE of those classes and make comparisons based on gender (i.e., males in this group were more likely to do X when compared to females in this group)?

Thank you!
 Linda K. Muthen posted on Monday, November 12, 2012 - 7:49 pm
You can use gender as a KNOWNCLASS variable. This is like multiple group analysis. See Example 7.21.
 Marieke van Geel posted on Wednesday, October 22, 2014 - 4:45 am
Dear Drs. Muthen,

I have a question similar to the third of this thread, but since it's ten years later now, the views and possibilities with regard to this question might have changed.

My dataset contains responses from approximately 1750 teachers who work in approximately 100 schools. The questions I asked the teachers are about the school culture and the school leader. A latent class analysis seems the best approach, but how do I take the nested structure into account? I would like to assign schools to latent classes of school culture and assign schools to latent classes of educational leadership.
In the next step, I would like to examine relationships between the two.

Would it be possible to conduct these analyses in Mplus, and if so, how?

Thank you in advance,
 IYH Boon posted on Tuesday, November 11, 2014 - 1:14 pm
I was wondering if you know of a paper that lays out the basic estimating equations for (longitudinal) latent class analysis? Something akin to what you lay out in Eq (18)-(20) in Muthen 2001 ("Latent Variable Mixture Modeling"), but for LCA with repeated measures.

Thanks in advance for any help you're able to offer.

 Bengt O. Muthen posted on Wednesday, November 12, 2014 - 5:41 pm
Our UG refers to the 1998 Reboussin et al article in MBR - that gives a good statistical account of it.
 Katrien Vangrieken posted on Friday, April 08, 2016 - 1:50 am
I have a relatively small dataset of 28 teams (108 individuals). I want to make classes with LCA, based on a team’s growth pattern (of a team-level variable).
When I perform the LCA, Mplus delivers classes. The entropy, class-sizes and latent class probabilities are all good.
However, looking at the standardized model results, the estimates, S.E. etc. are all 999.000.

Do you have any suggestions or possible explanations?
 Linda K. Muthen posted on Friday, April 08, 2016 - 11:30 am
Please send the output and your license number to
 Mara Soncin posted on Monday, May 09, 2016 - 11:01 pm
Hi Dr. Muthen,
I am a new Mplus user. I am running this model with all binary indicators:
Names are
codice_scuola_8_14_15 GESTIONE d10a_d [...] d11k_d;
Missing are all (9999) ;
IDVARIABLE = codice_scuola_8_14_15;
Usevariables are d10a_d - d11k_d;
Categorical are d10a_d - d11k_d;
classes = c (3);
Type = mixture ;
STARTS = 2500 25;
file is lca_mgm.txt ;
save is cprob;
format is free;

But I cannot run SAMPSTAT and STANDARDIZED outputs, getting this message:
*** WARNING in OUTPUT command
STANDARDIZED (STD, STDY, STDYX) options for TYPE=MIXTURE with categorical, count,
censored or nominal outcomes are available only with ALGORITHM=INTEGRATION.
Request for STANDARDIZED (STD, STDY, STDYX) is ignored.
*** WARNING in OUTPUT command
SAMPSTAT option is not available when all outcomes are censored, ordered
categorical, unordered categorical (nominal), count or continuous-time
survival variables. Request for SAMPSTAT is ignored.

How can I modify the code to have these outputs?
Thank you.
 Bengt O. Muthen posted on Friday, May 13, 2016 - 2:02 pm
This is not offered in the Mixture case. Standardization is not needed here. For sampstat you can ask for Crosstabs or run a Basic run not declaring the variables as categorical.
 Georgios Sideridis posted on Monday, February 06, 2017 - 4:27 am
I am running a 2-time LTA model with 5 indicators and 5 classes. When I save the data using the SAVEDATA command I nicely get the raw scores of the indicators and the id variable, followed by 25 cprob columns, a c1 column, a c2 column and an MLCJOINT column. Three questions please:

1. what is the meaning of the ordered cprob1-cprob25 columns?
c2#1 ON c1#1
c2#1 ON c1#2
they go up to 20 not 25! What am I missing?

2. Are the C1 and C2 columns the class assignment using posterior probabilities for Time 1 and Time 2, respectively?

3. What does the MLCJOINT column refer to?

Thank you,
 Bengt O. Muthen posted on Monday, February 06, 2017 - 3:47 pm
We need to see your output and the Savedata file - please send to Support along with your license number.
 samah Zakaria Ahmed posted on Monday, February 13, 2017 - 3:18 am
When a model has two latent class variables, how can I determine the correct number of classes for each of them?
I find that TECH11 and TECH14 are suitable only for a single latent class variable.
Should I first analyze each of the two latent class variables separately before the overall analysis?
 Bengt O. Muthen posted on Monday, February 13, 2017 - 5:52 pm
Q1: Go with BIC.

Q2: Ok
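
As a rough illustration of comparing models by BIC, here is a minimal Python sketch. It assumes the usual definition BIC = -2*logL + p*ln(n), where smaller is better. The log-likelihoods, parameter counts and sample size are hypothetical numbers made up for the example, not results from any real run.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC = -2*logL + p*ln(n); the model with the smallest value is preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits, keyed by (number of classes for c1, number of classes for c2).
# Each value is (log-likelihood, number of free parameters).
fits = {
    (2, 2): (-5210.4, 23),
    (3, 2): (-5150.9, 30),
    (3, 3): (-5141.2, 38),
}
n_obs = 800  # hypothetical sample size

bic_values = {combo: bic(ll, p, n_obs) for combo, (ll, p) in fits.items()}
best = min(bic_values, key=bic_values.get)  # combination with the lowest BIC

for combo, value in sorted(bic_values.items()):
    print(combo, round(value, 2))
print("lowest BIC:", best)
```

The same comparison can be made by hand from the log-likelihood, number of free parameters and sample size reported in each output.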
 Georgios Sideridis posted on Thursday, March 23, 2017 - 2:12 pm
Dear Dr. Muthen,
I hope you are well. I ran a 5-class LTA model with two time points. I specified invariant solutions at both time points with freely estimated probabilities, used SVALUES, and saved the cprob values (C1 and C2) using modal assignment. My question is that the transition probabilities in the output (right below) deviate markedly from the proportions I obtain when I cross-tabulate the saved C1 and C2 class memberships. I did not employ the 3-step approach. Can you please advise? I am obviously doing something wrong.
thank you,


C1 Classes (Rows) by C2 Classes (Columns)

       1      2      3      4      5

1  0.770  0.146  0.085  0.000  0.000
2  0.000  0.308  0.300  0.129  0.263
3  0.000  0.000  0.630  0.000  0.370
4  0.000  0.000  0.000  1.000  0.000
5  0.000  0.000  0.000  0.000  1.000

RESULTS FROM CPROB, after I save membership

       1      2      3      4      5

1  0.974  0.000  0.026  0.000  0.000
2  0.000  0.455  0.309  0.000  0.236
3  0.000  0.000  0.800  0.000  0.200
4  0.000  0.000  0.000  1.000  0.000
5  0.000  0.000  0.000  0.000  1.000
 Bengt O. Muthen posted on Thursday, March 30, 2017 - 9:28 am
This can happen when the entropy is not high enough.
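
For readers who want to check classification quality from saved posterior probabilities, here is a minimal Python sketch of the relative entropy summary, E = 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K). Values near 1 indicate clear classification. The input layout (one row per subject, one posterior probability per class) and the example numbers are assumptions made for illustration.

```python
import math

def relative_entropy(post):
    """Relative entropy of an n-by-K matrix of posterior class probabilities.

    Returns 1.0 when every subject is classified with certainty and 0.0 when
    every subject is equally likely to belong to any class.
    """
    n = len(post)
    k = len(post[0])
    total = 0.0
    for row in post:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four subjects and two classes.
clear = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.60, 0.40], [0.45, 0.55], [0.70, 0.30], [0.50, 0.50]]

print(round(relative_entropy(clear), 3))
print(round(relative_entropy(fuzzy), 3))
```

With fuzzy posteriors like the second set, hard class assignments discard a lot of uncertainty, which is one way a cross-tabulation of assigned classes can drift from the model-estimated probabilities.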
 Georgios Sideridis posted on Thursday, March 30, 2017 - 9:41 am
Thank you, this is very useful. Which should I use as transition probabilities: the ones reported by the model or the ones computed from the saved cprob values?

Similarly, if I add covariates I would get such discrepancies again, so which estimates are the proper ones to use?

thank you
 Bengt O. Muthen posted on Saturday, April 01, 2017 - 5:06 pm
Always use the model-estimated results.
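
To see why a cross-tabulation of modal assignments can differ from probability-based results, here is a minimal Python sketch. It assumes each subject has posterior probabilities for every (C1, C2) pattern, stored in the order (1,1), (1,2), ..., (K,K). That column order, the function name and the data are hypothetical. The probability-weighted table only illustrates the effect of classification uncertainty and does not replace the model-estimated transition probabilities in the output.

```python
def transition_tables(joint_post, k):
    """Return (modal, weighted) row-normalised k-by-k transition tables.

    modal:    each subject counts once, in the cell of its most likely pattern.
    weighted: each subject contributes its posterior probability to every cell.
    """
    modal = [[0.0] * k for _ in range(k)]
    weighted = [[0.0] * k for _ in range(k)]
    for row in joint_post:
        best = max(range(k * k), key=lambda j: row[j])
        modal[best // k][best % k] += 1.0
        for j, p in enumerate(row):
            weighted[j // k][j % k] += p

    def row_normalise(table):
        out = []
        for r in table:
            total = sum(r)
            out.append([x / total if total else 0.0 for x in r])
        return out

    return row_normalise(modal), row_normalise(weighted)

# Hypothetical posteriors for four subjects, K = 2.
# Columns are the patterns (1,1), (1,2), (2,1), (2,2).
joint_post = [
    [0.55, 0.45, 0.00, 0.00],
    [0.60, 0.40, 0.00, 0.00],
    [0.70, 0.30, 0.00, 0.00],
    [0.00, 0.00, 0.10, 0.90],
]
modal, weighted = transition_tables(joint_post, 2)
print("modal:   ", modal)
print("weighted:", weighted)
```

In this toy data, modal assignment puts every class 1 subject in the "stay" cell, while the probability-weighted table still shows a sizeable chance of moving to class 2.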
 Georgios Sideridis posted on Sunday, April 02, 2017 - 12:35 am
Thank you.
 Ali posted on Wednesday, August 02, 2017 - 9:29 am
Hello, I have four items, each of which has three categories. I tested measurement invariance for two groups, with each category of an item constrained to be equal across groups. Now I want to test whether the class sizes are equal across the two groups. Is there a way to constrain the class sizes to be equal?

Here is my code constraining the item response probabilities to be equal across groups.
c on G;
[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p1-p8);

[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p9-p16);
[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p17-p24);

[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p1-p8);
[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p9-p16);
[ST53Q01#1 ST53Q01#2 ST53Q02#1 ST53Q02#2 ST53Q03#1 ST53Q03#2 ST53Q04#1 ST53Q04#2](p17-p24);
 Bengt O. Muthen posted on Thursday, August 03, 2017 - 2:39 pm
If you don't say

c on cg;

you have equality of class sizes.
 Virginia Rangel posted on Thursday, May 31, 2018 - 1:29 pm
I am running a three-step LCA and trying to get the means for the covariates in the model. I included sampstat in my output, but got a message back saying that this option is not available with a set of outcomes that seems to include most possible types of variables. For this model, my outcome is continuous, but I will be estimating other models that have categorical variables as well. How do I request the sample statistics so I can obtain the means for the covariates included in the model?
 Tihomir Asparouhov posted on Thursday, May 31, 2018 - 3:53 pm
You should be able to find the means in the section "UNIVARIATE SAMPLE STATISTICS". You can also see output:tech7 which reports the sample statistics for each class. Alternatively change the covariates to dependent variables (by mentioning the means in the model) and the values will be in the model results. If this still doesn't work send your example to
 Seth Frndak posted on Wednesday, June 20, 2018 - 7:20 am
I have a question about LCA/LPA. In some disciplines/analytic techniques, it is common to split the sample into a training and a validation sample. Would this application be appropriate for LCA/LPA? Why is this technique not typically suggested for LCA/LPA?
 Bengt O. Muthen posted on Wednesday, June 20, 2018 - 11:57 am
Q1: yes.

Q2: Don't know.
 Ashley Hum posted on Thursday, July 18, 2019 - 11:55 am
I am running an LCA with binary indicators. Using an unconditional model, I have identified a 5-class model as the best fit; some classes are small. Now I am trying to explore predictors of class membership using R3STEP.

In the output, some predictors are significant when presented as logits but not in the output section that describes the predictor using odds ratios. For the odds ratio section, some of the values for certain odds ratios are large and also have large standard errors, so they are not significant. Which estimates and p values should be interpreted?
 Bengt O. Muthen posted on Thursday, July 18, 2019 - 1:07 pm
Yes, logit and odds ratio estimates are on different scales with different sampling distributions so they would have different significance results. Also, logits can work with symmetric CI, p-value style results but odds ratios should use non-symmetric CIs.

See also our FAQs on odds ratios at
 Ashley Hum posted on Thursday, July 25, 2019 - 1:51 pm
Thank you Dr. Muthen.

To clarify:
1. When using the CINTERVAL option for R3STEP output in an LCA, are the CIs given for the odds ratios symmetric?

2. If so, is there a command to get non-symmetric CIs? I can't use CINTERVAL (bootstrap) because my estimator is MLR.

3. If there is no command to get non-symmetric CIs for the ORs, is this the correct formula to compute them: logOR ± 1.96*SE(logOR), then exponentiate those two limits to get the OR limits?

Thank you.
 Bengt O. Muthen posted on Friday, July 26, 2019 - 6:40 am
When CIs are given for ORs, they are non-symmetric which is what you want. Our OR FAQs explain the procedures.

If this doesn't help, send your full output to Support along with your license number.
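
The calculation asked about above (logOR ± 1.96*SE(logOR), then exponentiating both limits) can be sketched in Python as follows. This is only an illustration of that formula. The function name and the logit estimate and standard error used in the example are hypothetical.

```python
import math

def odds_ratio_ci(log_or, se_log_or, z=1.96):
    """Return (OR, lower, upper) from a logit estimate and its standard error.

    The interval is symmetric on the logit scale and becomes non-symmetric
    around the odds ratio once both limits are exponentiated.
    """
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return math.exp(log_or), lower, upper

# Hypothetical logit estimate and standard error for one predictor.
odds_ratio, lower, upper = odds_ratio_ci(0.8, 0.35)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

Because exponentiation stretches the upper side more than the lower side, the upper limit lies farther from the odds ratio than the lower limit does.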