Class Probabilities
 M. Lee Van Horn posted on Monday, May 29, 2000 - 5:44 pm
I'm working on a growth mixture model with a couple of covariates. The models seem to be running fine, but when I output the Class Probabilities I get probabilities ranging from 0 to 4 or so. My understanding is that they should range from 0 to 1? So the question is: am I misinterpreting the output, or is something wrong with my model? Thanks.
 Linda K. Muthen posted on Tuesday, May 30, 2000 - 9:44 am
I am assuming that you are talking about the output and not saving conditional probabilities in a separate file. When you have covariates, only logit values are printed, not probabilities. When you have no covariates, the results are printed as both probabilities and logits. Let me know if this is not what you mean.
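
To make the logit scale concrete, here is a minimal Python sketch of how class logits map to class probabilities. It assumes a multinomial logit with the last class as the reference class (logit fixed at 0) and no covariates; the function name and the logit values are hypothetical and are not taken from any output in this thread.

```python
import math

def class_probabilities(logits):
    """Turn K-1 class logits into K probabilities.

    Assumes the last class is the reference class with its logit fixed at 0.
    """
    full = list(logits) + [0.0]          # append the reference class
    largest = max(full)                  # subtract the max for numerical stability
    weights = [math.exp(v - largest) for v in full]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical logits for a 3-class model (not from any real output).
probs = class_probabilities([4.0, 1.5])
print([round(p, 3) for p in probs])
```

A logit of 4 is therefore a perfectly legitimate value: it only says that the class is much larger than the reference class, and the resulting probabilities still sum to 1.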
 Peter Tice posted on Thursday, September 14, 2000 - 8:04 am
I started with an analysis identifying the proper number of latent classes for a particular set of data and ended up with 4 latent classes with the following probability distribution:

#1 78.06%
#2 4.41%
#3 13.81%
#4 3.70%

By itself, this is not problematic until I modify the model to include covariates as predictors of latent class membership (e.g., C#1 on verb3 impuls, etc.). After running such a model with the 4 class solution my probability distribution looks rather different.

#1 68.06%
#2 12.95%
#3 14.39%
#4 4.58%

I don't recall seeing an explicit discussion of this in the manual but would like to understand why this happens and which class distribution should I rely on. At first glance this reminds me of circumstances in conventional growth curve modeling where the parameter estimates vary between unconditional (i.e., w/o predictor variables) and conditional (i.e., w/predictor variables) models.

Thank you, posted on Thursday, September 21, 2000 - 8:43 am
Your analogy with growth modeling is good. For a stable solution, the two class prob distributions should be the same, but may not be for somewhat misspecified models. For example, some covariates may have direct effects on some outcomes. Note also that the order of the classes may be different for the two solutions.
 Patrick Malone posted on Wednesday, October 10, 2001 - 9:37 am
Another question. Is it possible to work with (i.e., specify, constrain, etc.) the class probabilities in the Model statement? (I may just be missing something in the manual.) For example, I'd like to constrain two latent classes to be of equal size in a k>2 latent class analysis.

 Linda K. Muthen posted on Wednesday, October 10, 2001 - 11:56 am
You can constrain the class probabilities to be equal as follows in a three or greater class model. In this example, the probabilities for the first two classes are held equal.


[c#1] (1);
[c#2] (1);

You can also fix the values if you want:


[c#1@-1.308] (1);
[c#2@-1.308] (1);
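
Because the [c#k] parameters are logits relative to the last class, the value to fix depends on the probabilities you have in mind. The Python sketch below goes in that direction (probabilities to logits) and also shows what -1.308 would imply under the assumption of exactly three classes; the post does not say how many classes that value was meant for, and the target probabilities are hypothetical.

```python
import math

def logits_from_probabilities(probs):
    """Return the K-1 logits log(p_k / p_K), with the last class as reference."""
    reference = probs[-1]
    return [math.log(p / reference) for p in probs[:-1]]

# Hypothetical target: classes 1 and 2 each hold 20% of the sample.
target_logits = logits_from_probabilities([0.2, 0.2, 0.6])
print([round(v, 3) for v in target_logits])

# Assumption: a three-class model with both logits fixed at -1.308.
fixed = -1.308
p_equal = math.exp(fixed) / (2.0 * math.exp(fixed) + 1.0)
print(round(p_equal, 3))
```

Equal logits give equal class probabilities regardless of the number of classes, which is why the equality label (1) alone is enough for the first request.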
 Patrick Malone posted on Thursday, October 11, 2001 - 5:28 am
 Patrick Malone posted on Thursday, October 18, 2001 - 10:04 am

I'm finding that the above isn't behaving the way I expected it to. I'm running a 12-class LCA, in which I'm using training data to constrain members of one manifest group to one set of six classes, and members of another manifest group to the other set of six classes. There are no predictors of the latent class membership involved.

The two sets of six classes are constrained to be identical -- the thresholds for each indicator in Class 1 are constrained to be equal to the thresholds in Class 7, and so forth. With Bengt's help, I was able to get that part going. The next step I want to take is test the hypothesis that the class sizes are identical across sets -- that Class 1 is the same size as Class 7, and so on.

I've used the syntax above to constrain the class size parameters to be equal across groups (1 to 7, 2 to 8, etc.). I've tried leaving Class 6 unconstrained and constraining it to zero (I want it to match Class 12 in size). The model converges, and the "means" in the "latent class regression model part" do follow the equality constraint. However, the "proportions of total sample size" do not, and the variance is not in any obvious pattern.

I'd appreciate any insights on this.
 bmuthen posted on Thursday, October 18, 2001 - 11:14 am
What is printed under the heading


is based on the estimated posterior probabilities for each individual, given the model and the individual's data. Posterior probabilities should be thought of as akin to factor scores. In most latent class models, the proportions reported here will agree perfectly with the class probabilities as obtained from the estimated [c#...] logit values, but not always - your model is an example of an exception. So I would say that you succeeded in getting the [c#...] parameters set up the way you wanted. The fact that the posterior probability results disagree may be an indication of model misfit, and how they disagree can be a suggestion for how to modify the model.
 Patrick Malone posted on Monday, November 05, 2001 - 5:13 am
I'm back to the problem with some other data, and I'm still wrestling with this, I'm afraid. I'm testing a treatment vs. control difference. I've run one model with 8 classes. By training data, the control Ss are constrained to be in odd-numbered classes and the treatment Ss in even-numbered classes. The two groups are the same size. In the first model, the indicator logits for the classes are constrained to be equal across pairs of classes (class 1 with 2, class 3 with 4, etc.); there are no constraints on the latent class probabilities. This seems to be working fine, and fits only slightly worse than a model without the constraints.

Then I want to test the hypothesis that, assuming the same class structure, the class sizes/proportions are the same in the two groups. Eyeballing the FINAL CLASS COUNTS in the first model, it looks like there's a pattern of differences. So I ran a second model, with the following code added to the %OVERALL% portion:

[lc#1] (201);
[lc#2] (201);
[lc#3] (202);
[lc#4] (202);
[lc#5] (203);
[lc#6] (203);

Running this model shows a very slightly worse -2LL, and the following results:


LC#1 -0.434 0.202 -2.141
LC#2 -0.434 0.202 -2.141
LC#3 -0.055 0.127 -0.429
LC#4 -0.055 0.127 -0.429
LC#5 -0.528 0.170 -3.097
LC#6 -0.528 0.170 -3.097
LC#7 0.000 0.000 0.000



Class 1 87.19078 0.09797
Class 2 93.93257 0.10554
Class 3 124.40489 0.13978
Class 4 140.18348 0.15751
Class 5 82.87007 0.09311
Class 6 81.98881 0.09212
Class 7 150.53426 0.16914
Class 8 128.89514 0.14483

It really doesn't look like the two results sections are matching up, and I'm at a loss.

Any suggestions?

Thanks again,
 Anonymous posted on Tuesday, November 06, 2001 - 8:26 am
Actually your results are matching up. The final class count is the sum of the estimated posterior probabilities for each individual. Those numbers can differ from the model-based class probabilities, much as the average factor score estimate can differ from the estimated factor mean. Of course you may question the restrictions imposed in the latent class regression model part.
 Patrick Malone posted on Tuesday, November 06, 2001 - 11:25 am
Ok, thanks. Somehow your wording helps Bengt's message from 10/18 click; now I see what's going on. So does MPlus anywhere produce what I was looking for? The raw probabilities of class membership? Or is the best way to exponentiate the logits, and work with the odds?

 Anonymous posted on Tuesday, November 06, 2001 - 6:42 pm
For this particular model you can just average the final class count proportions; that is, the raw probability of class membership for class 1 and for class 2 is (0.09797+0.10554)/2.
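
As a check on the two routes discussed here, the following Python sketch converts the logits printed in the post into class probabilities and compares them with the pairwise averages of the posterior-based proportions. It assumes that LC#8 is the reference class with logit 0 (the printed table stops at LC#7, which is shown as 0.000); all numbers are copied from the post above.

```python
import math

# Logit "means" as printed in the post; LC#8 is assumed to be the reference (0).
logits = [-0.434, -0.434, -0.055, -0.055, -0.528, -0.528, 0.000, 0.000]
weights = [math.exp(v) for v in logits]
total = sum(weights)
model_probs = [w / total for w in weights]

# Proportions from the final class counts, as printed in the post.
posterior_props = [0.09797, 0.10554, 0.13978, 0.15751,
                   0.09311, 0.09212, 0.16914, 0.14483]

# Average the posterior-based proportions within each constrained pair.
pair_averages = [(posterior_props[i] + posterior_props[i + 1]) / 2
                 for i in range(0, 8, 2)]

for start, avg in zip(range(0, 8, 2), pair_averages):
    print(f"classes {start + 1}-{start + 2}: "
          f"model {model_probs[start]:.4f}, averaged {avg:.4f}")
```

Under that assumption the two columns agree to about three decimal places, so exponentiating the logits and averaging the final class proportions lead to the same answer for this model.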
 Anonymous posted on Wednesday, February 09, 2005 - 3:03 am
Dear Linda,
I am a new user of Mplus. I am using your short courses package as well as the User's Guide. I have started the course on modeling with categorical latent variables, and I would very much appreciate it if you would expand a little on the statement on page 13, "The u-c relation is a logit regression".
with thanks
 bmuthen posted on Wednesday, February 09, 2005 - 10:49 am
Look up computations with logistic regression in the Version 3 Mplus User's Guide, chapter 13. This explains the logit in (81) on the page you are referring to. U is the binary dependent variable and c is a categorical x variable. U has a threshold which is the negative of an intercept parameter. If c has 2 classes, then that means that we have a dummy x variable which in line with linear regression means that there are 2 intercepts in the model.
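
A small Python sketch of the threshold-to-probability step may help here. It assumes a logit link and uses the relation stated above, that the threshold is the negative of the intercept, so P(u = 1 | class) = 1 / (1 + exp(threshold)); the function name and threshold values are hypothetical.

```python
import math

def prob_from_threshold(threshold):
    """P(u = 1 | class) for a binary item under a logit link.

    Assumes the intercept is the negative of the threshold.
    """
    return 1.0 / (1.0 + math.exp(threshold))

# Hypothetical thresholds for one item in a 2-class model: one per class,
# which is the "2 intercepts" reading of the dummy-variable regression.
thresholds = {"class 1": -1.2, "class 2": 2.0}
for label, tau in thresholds.items():
    print(label, round(prob_from_threshold(tau), 3))
```

A negative threshold gives an endorsement probability above 0.5, and a positive threshold gives one below 0.5.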
 Girish Mallapragada posted on Sunday, March 13, 2005 - 1:16 pm

I have a two-class solution with covariates and the logistic output looks like this.

C#1 ON
Y19 420.059 269.306 1.560
Y20 0.515 0.611 0.843
Y21 40.788 16.549 2.465
Y22 -3.643 1.368 -2.662

C#1 86.006 54.002 1.593

Is class 1 the base class? That is, for the coefficient of y20, which is 0.515, does exp(0.515) = 1.674 imply that for a change of 1 unit in variable y20, the odds that the individual belongs to the base class (class 1) increase by a factor of 1.674?

Am I interpreting this correctly?
 Linda K. Muthen posted on Monday, March 14, 2005 - 6:06 am
The last class is the reference class. See Chapter 13 of the Mplus User's Guide. In the section, Calculating Probabilities From Logistic Regression Coefficients, you will find a full interpretation.
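
The arithmetic in the question can be reproduced with a short Python sketch. The coefficients are copied from the post; the interpretation in the comments assumes the two-class model described there, with class 2 (the last class) as the reference.

```python
import math

# "C#1 ON" coefficients from the post: log odds of class 1 versus class 2
# (the last class, used as the reference) per one-unit increase in the covariate.
coefficients = {"Y19": 420.059, "Y20": 0.515, "Y21": 40.788, "Y22": -3.643}

odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
print(round(odds_ratios["Y20"], 3))
```

So exp(0.515) is about 1.674, and it multiplies the odds of being in class 1 rather than class 2, holding the other covariates fixed. Class 1 is the class being compared with the reference, not the reference itself.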
 Girish Mallapragada posted on Monday, March 14, 2005 - 7:02 am
Hello Dr.Muthen,

Thanks a lot.

 Anonymous posted on Friday, June 17, 2005 - 9:21 am
Problem 1: I have a dichotomous X variable (that underlies a continuous distribution) that I want to use as a predictor in a growth mixture model. I can't find an example in the manual on defining this X as categorical, is it unnecessary to do so (if it underlies a continuous distribution)?

Problem 2: On a different project, I have a somewhat unexpected effect of a cognitive measure called X on the LGM slope of achievement (y1 through y4). The model has 4 time points and the last two are freely estimated. The estimated time scores are 1.5 and 1.8. I am having trouble interpreting the effect of X, which I would have expected to be positively related to the slope (it is positively related to the intercept). This may be a rather interesting finding: the higher a respondent scores on cognitive skill X, the higher their intercept will be, but the lower their growth rate.

I plotted the slope estimates by the X variable (by the way I LOVE that we can do this in MPLUS!!) and everything looks fine (yet negative). It seems to me that it's not their growth rate 'overall', it's their initial growth rate (0, 1), because the estimated time scores are 1.5 and 1.8, suggesting a smaller rate of growth at times 3 and 4. That is, in the first period of growth (0, 1) there is a greater incline for those with a lower score on cognitive skill X, but as time goes on (1.5 and 1.8) that initial effect is smaller. Is it correct that the effect of X (when the y's are coded 0, 1, *2, *3 rather than 0, 1, 2, 3) is on the initial growth rate (0, 1)? With fixed time scores (0, 1, 2, 3) we assume it's linear and stable, but when you estimate the time scores you can't conclude that. Or am I unnecessarily getting thrown off by the estimated time scores?

Say the effect of X is -.30 then to get the slope:

Factor loading * effect
0 * (-.3) = 0
1 * (-.3) = -.3
2 * (-.3) = -.6
3 * (-.3) = -.9
but with estimated time scores
0 * (-.3) = 0
1 * (-.3) = -.3
1.5 * (-.3) = -.45
1.8 * (-.3) = -.54

So I would conclude a higher score on X was related to a slower growth rate, but that this slower growth was strongest initially during the first two waves of data.

Thanks for your help!!
 BMuthen posted on Sunday, June 19, 2005 - 5:37 am
1. The scale of observed x variables is not an issue in estimation. They should not be placed on the CATEGORICAL list. In regression, covariates are treated as either dummy variables or continuous variables.

2. It is not unusual for a covariate to have a positive influence with the intercept growth factor and a negative influence with a slope growth factor. You may want to use time scores 0 * * 1 if you are interested in growth between the first and last time points.
 Anonymous posted on Wednesday, August 03, 2005 - 10:56 am
I am relatively new to mixture modeling and am not sure that I understand the rationale behind constraining class probabilities to be equal. There does not seem to be a clear explanation, in the manual or on the discussion board, of when this should or should not be done. I have a few questions on this issue:

1)Is this primarily a theoretical consideration (i.e., I have no a priori beliefs as to class sizes so I will assume them to be relatively close to equal)? Or is it instead based upon clues derived from an analysis of results from runs in which class probabilities were not constrained?

2)By using this feature, am I somehow forcing data to fit in groups or am I simply making it easier for the model to converge, similar to assigning starting values for class thresholds?

3) What is the relationship (if any) between the logit starting values I assign for class thresholds and constraining class probabilities to be equal?

Thanks in advance - as a novice, I know I'm probably blurring lines on several of these issues but appreciate your thoughtful response.

BTW - I am working with all continuous variables if that makes any difference.
 bmuthen posted on Monday, August 08, 2005 - 1:29 pm
Typically you would not constrain the class probabilities to be equal. What is the reason that you think they should be or are held equal?
 Marco Grados posted on Thursday, December 01, 2005 - 9:45 am
I have run LCA on 3 sets of data (N = 197, N = 203, and N = 400) for psychiatric comorbidities in a sample of patients (5 comorbidities). I obtain 3 classes as the best solution in all three cases and they make sense clinically. I was interested in the probabilities of belonging to each class for subjects and used the command:


I have used this before in another analysis and have obtained probs as required. However, now all I am obtaining for the variables cprob1, cprob2 and cprob3 are 0's and 1's...

Any ideas appreciated

Marco Grados, MD, MPH
 Linda K. Muthen posted on Thursday, December 01, 2005 - 12:54 pm
Please send your input, data, output, and license number to and we can take a look at it.
 Sylvana Robbers posted on Wednesday, March 05, 2008 - 5:38 am
Dear Dr. Muthen,

I am working on GMM and I try to decide on the best number of classes. When I choose the model with the best BIC, the class probabilities vary between .65 and .85.

My questions are: Does a class probability of .65 still indicate good fit? Is there a minimum required value for the average latent class probabilities?
Is there a rule of thumb with regard to the average latent class probabilities to decide on model fit and the number of classes?

Or should I just trust the best BIC and not base my conclusion on the class probabilities?

Do you know of any reference that discusses this topic?

Thanks in advance.

Thanks in advance for your answer.
 Linda K. Muthen posted on Wednesday, March 05, 2008 - 8:59 am
See the following paper which can be downloaded from the website to see how to determine the number of classes:

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage Publications.

Probabilities are not related to model fit.
 Fiona Shand posted on Monday, June 15, 2009 - 9:14 pm
I have been comparing 2, 3 and 4 latent class models to the same with 1 factor. The best fit appears to be the 3 class 1 factor model. In the final class counts based on the estimated model and estimated posterior probabilities, class 2 has 3.7% of the sample. However in the final classification class 2 has no individuals! Distribution based on most likely class membership is:

C#1 0.272
C#2 0.000
C#3 0.728

I notice from the discussion above that it's possible to constrain 2 classes to be the same size where there's 3 or more classes. I wonder if that is feasible for my analysis? Or is there something else going wrong with my analysis? Fiona
 Linda K. Muthen posted on Tuesday, June 16, 2009 - 8:19 am
This means that the individuals in Class 2 have most likely class membership in another class which points to no need for Class 2.
 Sylvana Robbers posted on Tuesday, November 17, 2009 - 2:09 am
Dear dr. Muthen,

I am running a 3-class multigroup LCGA with the following class probabilities.

group 1: 0.28 0.63 0.09
group 2: 0.25 0.68 0.07

A chi-square crosstabulation (by hand) indicates that the probabilities differ between the groups. Is there a way in Mplus to test directly whether the 0.63 differs from the 0.68? And is there a way to get SE's or CI's for these probabilities?

Thanks in advance.
 Linda K. Muthen posted on Tuesday, November 17, 2009 - 9:21 am
You can use MODEL CONSTRAINT to define the probabilities. You will then obtain standard errors for them. You can use MODEL TEST to test if they are equal. See the user's guide for further information.
 Sylvana Robbers posted on Wednesday, November 18, 2009 - 4:57 am
Unfortunately I need a little more help from you. I tried several statements, but nothing works, and also it seems I really need to make constraints with MODEL CONSTRAINT, but I only want SE's.

What should I specify differently?:

Model constraint:
[cg#1.c#1]; (do I need to add (1)?)

(By the way, I was referring to the probabilities under this heading: LATENT TRANSITION PROBABILITIES BASED ON THE ESTIMATED MODEL)

Thanks again.
 Linda K. Muthen posted on Wednesday, November 18, 2009 - 9:31 am
See Slides 48-50 of the Topic 6 course handout. This shows how MODEL CONSTRAINT can be used to define the latent transition probabilities described toward the end of Chapter 13. Standard errors are estimated for new parameters defined in MODEL CONSTRAINT.
 Bonamy Oliver posted on Monday, November 30, 2009 - 6:02 am
Hi there,

I am using Growth mixture models to estimate developmental trajectories from continuous data at 6 time points.

I am using the following syntax to save the results and conditional probs:

File is gmmcp.dat;
format is free;

However, I need the cprobs (only) to more than three decimal places. Please can you help?

Many thanks
 Linda K. Muthen posted on Monday, November 30, 2009 - 9:43 am
This is not possible in the current version of Mplus.
 Martijn Hogerbrugge posted on Thursday, April 01, 2010 - 10:38 pm

I am currently conducting LCA with 5 binary variables and 3 categorical variables (with 3 categories each). When looking at the results, quite a few item response probabilities are set to 0.0/1.0, because the "logit thresholds were approached and set at extreme values". Is there a possibility to turn this default setting off? If not, is this also normally done in LatentGold? (I ask this for comparative reasons; I am comparing my results with those from a previous study.)
Also, I was wondering whether you can tell me when one should use a logit, and when a probit link when conducting LCA. The v5 manual only mentions the option with regard to LTA. Which one should you use when you employ a LCA?
Finally, in the recent book on LCA and LTA by Collins and Lanza (Wiley series) it is explicitly stated that the likelihood ratio statistic (G^2) shouldn't be used when the df of the model is high because the reference distribution for the G^2 statistic is not known. Is it still okay to use the statistics reported in TECH11, as they are based on the likelihood ratio statistic?

Thanks in advance,
 Linda K. Muthen posted on Friday, April 02, 2010 - 8:21 am
I don't know what Latent Gold does but fixing them is the correct thing to do.

It is most common to use logit but probit is also possible.

I would need to see the exact verbiage from the book to interpret what they are saying. You can send it to
 Bengt O. Muthen posted on Sunday, April 04, 2010 - 5:19 pm
Yes, the likelihood-ratio chi-2 (G^2) does not approximate chi-2 well with the sparseness in the observed-data frequency table that you get with many variables. And you cannot use chi-2 difference testing to find the appropriate number of latent classes; that also is not chi-2 distributed. Instead BIC has been found to work well. And also TECH11 and TECH14. See for example the article

Nylund, K.L., Asparouhov, T., & Muthen, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.

which is on our web site under Papers, Latent Class Analysis. TECH11 and TECH14 are akin to the bootstrap approaches mentioned in the book you referred to, although not focusing on the frequency table chi-2, but directly on the likelihood-ratio statistic, considering its non-chi-2 distribution.
 anonymous posted on Monday, April 05, 2010 - 12:02 pm
I am trying to determine how best to characterize derived classes. I noticed that in the "results in probability scale" section of Mplus output, there are significance values associated with the probabilities of each latent class indicator. Are these p-values indicating whether the probability is greater than 0 or testing another hypothesis? In addition, in one case, the probability is significant even though the probability value is 0. In other cases, when the probability value is 1.00, the p-value is also 1.00.
thanks in advance!
 Linda K. Muthen posted on Tuesday, April 06, 2010 - 8:15 am
The test is against zero. I would need to see the full output and your license number at to comment on your other statements. They sound odd.
 Eric Turkheimer posted on Wednesday, April 28, 2010 - 3:00 pm
I presume this is a very basic question about Latent Class Analysis. I am running a two group LCA (trying to discriminate MZ and DZ twins from a bunch of ratings of their similarity). I have good reason to think that the solution should be unbalanced, with about 90% of the cases in one group and 10% in the other. But no matter what I do I seem to end up with close to 50-50. Is there a way to force Mplus to find an unbalanced solution? Is it a matter of fixing the mean of the latent categorical variable on a logit scale? Can one do it with starting values? Or fixing the loadings on a few select items? Thanks.
 Linda K. Muthen posted on Thursday, April 29, 2010 - 8:11 am
Please send the output and your license number to
 Peter Lugtig posted on Monday, June 21, 2010 - 3:01 am
Dear Bengt/Linda,
I am working on a GMM/LCGA-type model which includes about 10.000 cases. I first identified the correct number of classes, and now wish to regress c on some covariates. However, I'd also like to treat the most likely classes from the first stage as observed variables in the second stage, just to check my model solutions.
I found that the maximum recordlength is set at 5.000 in mplus 6, so I have some problems saving the class probabilities. Can you recommend any work-around procedure?
thanks heaps,
 Linda K. Muthen posted on Monday, June 21, 2010 - 7:05 am
I'm not sure I totally understand the question. The data for each observation can be on more than one record.
 Peter Lugtig posted on Tuesday, June 22, 2010 - 1:31 am
Sorry, I should be clearer.
I have a fairly large dataset (10.000) and wish to save class probabilities. There seems to be a record length maximum of 5.000, however. Is there a way to save data for more than 5.000 records?
 Linda K. Muthen posted on Tuesday, June 22, 2010 - 6:17 am
Record length and the number of records are not the same thing. I don't know of any such maximum. Please send the files and your license number to support so I can see what you mean.
 Beate St Pourcain posted on Thursday, September 30, 2010 - 5:32 am
I am working on a LCA with longitudinal binary data and estimate thresholds for each class. However, when I add an intercept to the overall statement of the model (which is actually a growth parameter), the model fit is considerably improved.

How can this be, and does this affect the interpretation of the thresholds? e.g. do I need to standardise them now as they will be affected by a latent continuous variable (intercept)?

Thank you very much for your time.
 Linda K. Muthen posted on Thursday, September 30, 2010 - 12:19 pm
It sounds like you are freeing the intercept of the intercept growth factor. This should be fixed at zero as part of the growth model parameterization. It should not be freed. If this is not what you are saying, please send your output and license number to
 Arina Gertseva posted on Sunday, October 24, 2010 - 10:04 pm
Dear Dr. Muthen,
I am using a mixture modeling approach to crime rates at the county level. The best fitting model was the one with four latent classes. When I ran the frequencies on the class variable saved in the data file and compared those with the class counts reported in the output, I noticed some differences:

Class 1: 46 (in output) 57 (data file)
Class 2: 2561 (in output) 2558 (data file)
Class 3: 206 (in output) 205 (in data file)
Class 4: 13 (in output) 6 (in data file)

Shouldn’t they be the same? Thank you.
 Linda K. Muthen posted on Monday, October 25, 2010 - 11:56 am
You are most likely comparing most likely class membership with class counts from the estimated model. These are the same only when entropy is one.
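
The difference between the two kinds of counts can be seen in a toy Python example. The posterior probabilities below are hypothetical (four individuals, two classes); the sketch sums them per class to get the model-estimated counts and, separately, counts individuals by their most likely class.

```python
# Hypothetical posterior probabilities for four individuals and two classes.
posteriors = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.20, 0.80],
]
n_classes = len(posteriors[0])

# Counts based on the estimated posterior probabilities: column sums.
estimated_counts = [sum(row[k] for row in posteriors) for k in range(n_classes)]

# Counts based on most likely class membership: one whole person per row.
modal_counts = [0] * n_classes
for row in posteriors:
    modal_counts[row.index(max(row))] += 1

print(estimated_counts)
print(modal_counts)
```

The two sets of counts coincide only when every posterior probability is 0 or 1, that is, when classification is perfect. A small class can also end up with no modal members at all if none of its fractional members has that class as the most likely one.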
 Alexandre Morin posted on Tuesday, February 01, 2011 - 9:40 am
I am estimating LPA models from plausible values (20 sets) from a previous ESEM model (i.e. I want to estimate the profiles based on the "factor scores", but tried PVS for greater precision).
Doing so, I get warnings that I can't save class memberships and, most importantly, class probabilities from the models. I guess that is because we work from 20 different data sets.
I am wondering whether there would be a way to save this information (i.e. to combine/merge the class probability results into a single data file)?
 Linda K. Muthen posted on Tuesday, February 01, 2011 - 2:07 pm
The only way to get most likely class membership is to run each data set separately. You would then need to decide how to use the sets of most likely class memberships.
 Jan Ivanouw posted on Wednesday, April 20, 2011 - 5:56 am
I would very much like help in locating the formula for calculating CPROBABILITIES for the individuals in a mixed model.

What I mean is each individual's probability for belonging to class 1, class 2 etc. when knowing the person's item scores and the Mplus estimated model parameters for a certain mixed model.

Mplus gives these probabilities in a save file when demanding CPROBABILITIES, but I would like to be able to calculate them myself based on the parameters estimated by a Mplus model (like thresholds and alpha).

I have looked into the Technical Appendix 8, but it seems that the formula/algorithm to use is not there.
 Bengt O. Muthen posted on Wednesday, April 20, 2011 - 10:40 am
It is formula (161) of the Tech App 8. When you use that in the last step when the iterations have converged you get the CPROBs. It's a bit involved.

See also

Muthén, B. & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463-469.

and the more technical Section 6.4 of

Muthén, B. & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In Fitzmaurice, G., Davidian, M., Verbeke, G. & Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 143-165. Boca Raton: Chapman & Hall/CRC Press.
 Jan Ivanouw posted on Wednesday, April 20, 2011 - 11:12 pm
Thank you very much.
 Jan Ivanouw posted on Thursday, April 28, 2011 - 10:41 am
Working with the calculation of cprobabilities I still have problems.

In Appendix 8 equation 161 I need P(u|c,x) and I find this from eq 152/153.

My problem is in eq 152 the term -(tau - u*). tau is a value which is estimated in the Mplus model, while as I understand it u* is a dimension. How can I find the value for u* to plug into the eq?
 Bengt O. Muthen posted on Sunday, May 01, 2011 - 11:22 am
u* comes from eqns (150) and (151), so it is also a function of model parameters - and covariate values x_i. You should view u* as the logit behind each observed u, so the parameters of those two eqns are found in whatever relationships in your model that influence the u's.
 Jan Ivanouw posted on Thursday, May 05, 2011 - 8:25 am
Thank you for the explanation.

I have an additional question concerning calculation of posterior probabilities:

I tried the method of calculating posterior probabilities which is described in topic 5 slides 69 and 70 (Berlin july 2009 version).
I have a simple model with one latent class variable (with two classes), and with 10 binary indicators.

I estimated the model in Mplus. Then I plugged the probabilities from this model into the equations in slides 69 and 70 for some selected persons. The result was that I got the same posterior probabilities that were calculated by Mplus.

My question is:
In slide 70 it is indicated that I have to use the EM algorithm in order to get the posterior probability for a person, treating the class membership as missing data.
Why was this in fact not necessary when using the parameters from the estimated Mplus model?
 Bengt O. Muthen posted on Thursday, May 05, 2011 - 7:04 pm
The probabilities you plugged in (using slide 71) were from the Mplus output and Mplus computes those from the estimated model parameters - and those are obtained via ML estimation using the EM algorithm.
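
For the simple case described in this exchange (one latent class variable, binary indicators, no covariates), the plug-in calculation is Bayes' rule applied to the estimated class probabilities and item probabilities. The Python sketch below is a minimal illustration under those assumptions, with conditional independence of items within class; the function name and all parameter values are hypothetical, and it is not a transcription of the slides or of Technical Appendix 8.

```python
def lca_posterior(class_probs, item_probs, responses):
    """Posterior class probabilities for one person's binary responses.

    class_probs[k]   : estimated P(c = k)
    item_probs[k][j] : estimated P(u_j = 1 | c = k)
    responses[j]     : observed 0/1 response to item j
    Assumes items are independent within class and there are no covariates.
    """
    joint = []
    for prior, probs in zip(class_probs, item_probs):
        likelihood = 1.0
        for p, u in zip(probs, responses):
            likelihood *= p if u == 1 else 1.0 - p
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical 2-class, 3-item model and one response pattern.
post = lca_posterior(
    class_probs=[0.4, 0.6],
    item_probs=[[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]],
    responses=[1, 1, 0],
)
print([round(p, 3) for p in post])
```

No iteration is needed at this stage because the EM algorithm has already done its work in producing the parameter estimates; the posterior probabilities are a single final evaluation at those estimates.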
 Martin Ratzmann posted on Tuesday, July 05, 2011 - 12:35 pm
When estimating an LCA or LPA, is it possible to restrict the proportion or number of cases in each class of the solution?

Thank You!
 Bengt O. Muthen posted on Tuesday, July 05, 2011 - 1:32 pm
You can try inequality constraints on the class probabilities, or rather on their logit parameters.
 Martin Ratzmann posted on Wednesday, July 06, 2011 - 12:24 pm
Thank You! And where can I find the syntax-code for those procedures?
 Bengt O. Muthen posted on Wednesday, July 06, 2011 - 12:50 pm
See the Version 6 UG, pages 617-618 for the general approach and end of chapter 14 for how the logits relate to the probabilities for the multinomial regression.
 Beth Bynum posted on Friday, August 26, 2011 - 7:44 am
Good Morning,

I’ve run an LPA with one sample and I have determined the number of classes to retain. Using the classes that were determined in the LPA, I would like to determine the most probable classes for a new sample of individuals. For example, I would like to give the same items that were used as class indicators to a new sample and use the results of the LPA to predict which class each person would most likely be in. Ideally, for each person in the new sample, I would like to be able to use the item responses and compute the probability of being in each class, without having to run a new LPA. Is this possible? If so, does MPLUS output provide the necessary information to compute the probabilities? What equation should I use?

 Bengt O. Muthen posted on Saturday, August 27, 2011 - 12:08 pm
Yes, this can be done and is a good use of LPA. For the new sample of individuals you simply fix all the model parameters at the solution you got from your first sample. This can easily be done using the SVALUES option in the OUTPUT command. The second run then does not estimate any parameters but only estimates the posterior probabilities for each class for each subject (see the CPROBS option of the SAVEDATA command). The results also include an indication of the most likely class. In fact, you can do this for only a single subject.
 Beth Bynum posted on Tuesday, September 06, 2011 - 12:26 pm
Thanks for the response. Is there a way to estimate the probabilities without running Mplus? For example, we would like to give each individual in the new sample the set of items and then, in an on-demand environment, provide feedback to the individual about the class they would most likely fall into based on their pattern of responses on the items. We would like to be able to set this up in an online or computer environment using a standard programming language such as Visual Basic or JavaScript.

I tracked down and deconstructed the posterior probability equation that LPA uses to estimate the probabilities. Since I know the latent class means, latent class covariance matrices, and the response vector, I could use the equation to produce new probabilities for each new individual. The formula relies on complex matrix algebra to estimate the density functions, which isn’t a show stopper, but we were hoping to find something that might be simpler.

My Questions: Have you ever seen anyone use LPA/LCA in this type of application? Do you know of any other way to compute the probabilities that might be simpler than relying on the posterior probability formula? We were hoping that LPA would be similar to discriminant analysis and produce a set of equations that predict class membership for a single person. Finally, I wanted to check which density function MPLUS uses to compute the posterior probabilities. Does it use the multivariate normal density function?
 Bengt O. Muthen posted on Tuesday, September 06, 2011 - 5:45 pm
I think this is a not uncommon type of application and I know others who have had similar interests in having their own routine. I don't think one can avoid going via the computations of the posterior probabilities (like we show in Appendix 8 in the V2 Tech App.). Discriminant analysis assumes known groups, whereas with the posterior probabilities of mixtures a subject is a fractional member of several groups. Yes, Mplus uses the multivariate normal density when the outcomes are continuous.
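For readers who want their own routine, the posterior probability computation can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the indicators are continuous, and the within-class covariance matrices are diagonal (local independence), so the multivariate normal density reduces to a product of univariate normal densities and no matrix algebra is needed. All names and parameter values below are hypothetical; in practice the class probabilities, means, and variances would come from the estimated model.

```python
import math

def log_normal_pdf(y, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def posterior_probabilities(y, class_probs, class_means, class_vars):
    """Posterior class probabilities for one response vector y.

    Assumes diagonal within-class covariance matrices, so the joint
    density within a class is a product of univariate normals.
    """
    log_joint = []
    for p, means, variances in zip(class_probs, class_means, class_vars):
        ll = math.log(p)                         # log prior class probability
        for yj, m, v in zip(y, means, variances):
            ll += log_normal_pdf(yj, m, v)       # add log density of each item
        log_joint.append(ll)
    mx = max(log_joint)                          # stabilize before exponentiating
    weights = [math.exp(v - mx) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 2-class, 3-indicator solution (not from any real output)
class_probs = [0.6, 0.4]
class_means = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
class_vars = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

post = posterior_probabilities([1.8, 2.1, 1.5], class_probs, class_means, class_vars)
most_likely = post.index(max(post)) + 1          # 1-based class number
```

If the model allows within-class covariances, the product of univariate densities would have to be replaced by the full multivariate normal density.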
 Tamika Gilreath posted on Monday, October 24, 2011 - 2:20 pm
I have a question regarding the interpretation of conditional probabilities for particular classes. If the conditional probability for a particular response category in a particular latent class is, say, 0.75, can one say that 75% of respondents in this class selected this category, or is it only appropriate to say that respondents classified here had a 75% chance of selecting this category? I am of the opinion that these two statements are interchangeable in this case, since the probabilities also represent proportions. Is this the case?
 Bengt O. Muthen posted on Monday, October 24, 2011 - 6:33 pm
I think it is somewhat better to say:

"respondents classified here had a 75% chance of selecting this category". This is the probability that is being estimated. Even better is:

"respondents in this class had an estimated probability of 0.75 of selecting this category."

- That avoids the ambiguous phrase "classified here". Note that we are talking about a model, that is, about subjects' class membership (a subject is a member of only one class in the model), not how they were classified after the parameter estimation was done (when they are fractionally members of several classes).

The other statement sounds like you are talking about the subjects who are most likely in this class and among them 75% are in this response category - that may not be true.
 Carolyn CL posted on Friday, March 01, 2013 - 10:19 am
I am running a LCA with 3 continuous level variables and 2 dummy variables (reference dummy omitted: D_Me). I am not clear how to interpret the conditional probabilities for responses on the dummy variables.

1. Why does MPLUS assign 1 and 0 to some of the conditional probabilities - how can I interpret this?

ex: Latent Class 1

Category 1 0.145 0.307 0.472 0.637
Category 2 0.855 0.307 2.788 0.005
Category 1 1.000 0.000 0.000 1.000
Category 2 0.000 0.000 0.000 1.000

2. Should I ignore the 'Results in probability scale' and calculate actual probabilities for each dummy variable using thresholds (Chpt 13)?

3. Is there an easier way to calculate the probabilities for the reference dummy variable?
 Linda K. Muthen posted on Friday, March 01, 2013 - 6:07 pm
I would treat the 3-category variable as a nominal variable by putting it on the NOMINAL list.

1. The thresholds have extreme values.
2. No, the results in probability scale will be the same as any values you calculate by hand.
3. Treat it as nominal.
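To see why extreme thresholds show up as probabilities of 1.000 and 0.000, a hand calculation can be sketched in Python. This assumes the logit link for a categorical item within a class, where the cumulative probability of being at or below a category is 1/(1 + exp(-threshold)). The function name and the threshold values are hypothetical and chosen only for illustration.

```python
import math

def category_probabilities(thresholds):
    """Category probabilities for one ordered-categorical item in one class.

    Assumes a logit link: P(u <= category j) = 1 / (1 + exp(-tau_j)),
    with thresholds given in increasing order.
    """
    cumulative = [1.0 / (1.0 + math.exp(-t)) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)   # difference of adjacent cumulative probabilities
        previous = c
    return probs

# Hypothetical thresholds for a binary item in two different classes
moderate = category_probabilities([-1.775])   # an ordinary threshold
extreme = category_probabilities([15.0])      # an extreme threshold
```

A threshold as large as 15 in absolute value gives a category probability that rounds to 1.000 or 0.000 at three decimals, which is what the probability-scale output shows.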
 Carolyn CL posted on Thursday, March 07, 2013 - 9:51 am
Thank you very much.

I am now treating the 3-category variable as categorical (it is ordered).

To be clear:

1. The model terminates normally, but I am getting this message:

Should I be concerned?

2. The "Results in probability scale" can indeed be interpreted as the probability each class falls into a specific category of the ordinal level variable?
 Linda K. Muthen posted on Friday, March 08, 2013 - 9:25 am
1. Yes.
2. Yes.
 Jerry Grenard posted on Thursday, July 11, 2013 - 6:18 pm
We would like to calculate the percentage of correct class assignments in a simulation study of latent profile analysis (no covariates). Is it possible to save the individual posterior probabilities for each class and the most likely class membership (SAVE=CPROB) in a Monte Carlo simulation with multiple replications, to compare with the Monte Carlo data sets that provide true class memberships? Or is there another way to obtain this value?
 Linda K. Muthen posted on Friday, July 12, 2013 - 4:03 pm
No, to do this, you will need to save the data sets from Monte Carlo and use some sort of batch facility like the RUNALL utility or possibly MplusAutomation with R in an external Monte Carlo.
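Once the true class memberships and the most likely class memberships for a replication have been read into two lists, the percentage of correct assignments can be sketched as follows. This is a hypothetical helper, not part of Mplus or MplusAutomation. It assumes class labels are numbered 1 to K and tries every relabeling of the estimated classes, because the class order in an estimated solution need not match the order used to generate the data.

```python
from itertools import permutations

def best_accuracy(true_classes, assigned_classes, n_classes):
    """Proportion of correct assignments under the best relabeling of classes."""
    labels = range(1, n_classes + 1)
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))    # estimated label -> true label
        hits = sum(
            1 for t, a in zip(true_classes, assigned_classes) if mapping[a] == t
        )
        best = max(best, hits / len(true_classes))
    return best

# Hypothetical replication with 6 subjects and 3 classes
true_classes = [1, 1, 2, 2, 3, 3]
assigned_classes = [2, 2, 3, 3, 1, 2]
accuracy = best_accuracy(true_classes, assigned_classes, 3)
```

Trying all permutations is only practical for a small number of classes, since the number of relabelings grows factorially.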
 Mirjam van Zuiden posted on Thursday, November 14, 2013 - 7:13 pm
Dear Linda and Bengt,

While performing a 4 class latent profile analysis, the SAVEDATA = CPROB; option only results in a datafile in which the values of the variables + id variable are included. So, no posterior and class probabilities and also no assigned class number. Could you tell me what I'm doing wrong?

Thank you.


TITLE: LPA 4 classes
DATA: FILE IS LPAZscores2.csv;
VARIABLE: NAMES ARE id Dep_T2 Reexp_T2 Avoid_T2 Emo_T2 Hyper_T2;
usevar = Dep_T2 Reexp_T2 Avoid_T2 Emo_T2 Hyper_T2;
CLASSES = c (4);

STARTS 500 125;

savedata: file is "classmembership2"

 Linda K. Muthen posted on Friday, November 15, 2013 - 6:18 am
Please send your output and license number to
 Kathleen posted on Wednesday, February 05, 2014 - 1:21 pm
I'm using LCA and LTA and my questions here concern which data to present in tables and graphically. Should I output the results into a data file, requesting cprobabilities, and then compute the means of each item for each class using each person's most likely class membership? Or, should I graph the class probabilities of each item in the "results in probability scale" on the output file?

The means obtained from averaging most likely class are different from the results (in the probability scale) on the output, as they are different things, but I'm not sure which is better to present.

Can you also clarify the difference in the most likely class membership and the probabilities?
 Bengt O. Muthen posted on Wednesday, February 05, 2014 - 2:43 pm
I would use the Mplus Plot command to plot the means/probabilities for the items in each class.

I would not use most likely class unless necessary and it isn't here.

Each person gets an estimated probability (cprob) to be in each class. So with 2 classes the person may get the cprob values 0.85 and 0.15. The most likely class membership for that person is then class 1. But of course that is a cruder piece of information than using both 0.85 and 0.15. A second person with cprobs 0.90, 0.10 is a bit more clearly a class 1 member.
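The loss of information from using most likely class can be illustrated with the two persons above. In this small Python sketch the variable names are hypothetical; it contrasts the modal assignment with the sum of the posterior probabilities, which keeps the fractional memberships.

```python
# Posterior probabilities (cprobs) for the two persons in the example above
cprobs = [
    [0.85, 0.15],
    [0.90, 0.10],
]

def most_likely_class(row):
    """1-based index of the class with the largest posterior probability."""
    return row.index(max(row)) + 1

# Modal assignment: both persons are simply counted as class 1 members
modal = [most_likely_class(row) for row in cprobs]

# Posterior-weighted class counts keep the fractional memberships
expected_counts = [sum(row[k] for row in cprobs) for k in range(2)]
```

With modal assignment class 2 receives no one, whereas the posterior-weighted count still credits class 2 with a quarter of a person.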
 Marketa Krenek posted on Sunday, February 23, 2014 - 10:26 am
Hi there,

I'm trying to get a new dataset with the class membership that also includes the id variable. It is not working. What am I doing wrong? Here is my code:

DATA: FILE is C:\Users\dissfullbl.dat;

VARIABLE: NAMES ARE ID pda0102 pda0304 pda0506 pda0708 pda0910 pda1112;

MISSING = ALL (-999);
USEVAR = pda0102 pda0304 pda0506 pda0708 pda0910 pda1112;

Classes = c (2);


FILE IS C:\Users\lcaPDA.dat;
SAVE = cprobabilities;

Thank you!
 Linda K. Muthen posted on Sunday, February 23, 2014 - 12:20 pm
Please send the output and your license number to
 Alvin  posted on Monday, May 05, 2014 - 5:09 pm
Hi Dr Muthen, I just ran a 3-class factor mixture model using the Bayesian estimator, which took less time to reach convergence than using algorithm integration. The output looks interesting; however, I couldn't figure out the response pattern in each class, as no item-response probabilities were given. I wonder if I need to use an auxiliary variable, export the class probabilities into a different file, and analyze class characteristics that way. Is there a way to obtain item-response probabilities in the output?
 Bengt O. Muthen posted on Monday, May 05, 2014 - 6:46 pm
No automatic way. You would have to be able to express the probabilities using Model constraint.

Also, if you do Bayesian mixtures you have to make sure that you know about "label switching".
 Alvin  posted on Wednesday, May 07, 2014 - 5:56 pm
Thanks Dr Muthen. I've read your work on Bayesian analysis and have come across "label switching", and it seems that there isn't a solution to this yet, is there? Another interesting observation, which I read somewhere, is that model modifications do not really have much impact on the overall fit of the model as assessed using PPC. Why is that, and how do you then work out potential areas of misfit? Also, can you request DIC in mixture models? If not, are there alternative indices that allow for model comparison?
 Bengt O. Muthen posted on Thursday, May 08, 2014 - 7:44 am
We discuss the label switching in our Topic 9 Bayes course on the website. You can apply constraints. Because of this and the current lack of measures for model comparisons, I would not recommend Bayes for mixture modeling unless you are an expert Bayes analyst.

I don't have the PPP experience you mention. That is, model improvements can be seen in the range given in the output below the PPP value.
 Allison Tracy posted on Thursday, May 15, 2014 - 6:58 pm
I am currently on a project in which I need to calculate the probabilities of belonging to a set of latent classes, given individual responses to the latent class indicators. I can do this when the latent class indicators are categorical but I can't find an equation that will calculate these probabilities when the indicators are truly continuous. Could you point me to such an equation?
 Linda K. Muthen posted on Friday, May 16, 2014 - 9:56 am
The general principle is the same using the Bayes Theorem. I don't know of where it is written out.
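For reference, the Bayes theorem expression for continuous indicators can be written out as below. This is a sketch in generic notation rather than a quotation from any Mplus document. The symbols are assumptions: \pi_k is the probability of class k, \mu_k and \Sigma_k are the class-specific mean vector and covariance matrix, y_i is the p-dimensional response vector of individual i, and K is the number of classes.

```latex
P(c_i = k \mid y_i)
  = \frac{\pi_k \, f(y_i \mid \mu_k, \Sigma_k)}
         {\sum_{j=1}^{K} \pi_j \, f(y_i \mid \mu_j, \Sigma_j)},
\qquad
f(y_i \mid \mu_k, \Sigma_k)
  = (2\pi)^{-p/2} \, |\Sigma_k|^{-1/2}
    \exp\!\left\{ -\tfrac{1}{2} (y_i - \mu_k)^{\top} \Sigma_k^{-1} (y_i - \mu_k) \right\}
```

With categorical indicators the normal density f is replaced by the product of the class-specific item response probabilities; the rest of the expression is unchanged.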
 Katharine Ann Buck posted on Tuesday, December 02, 2014 - 9:28 am
I saved class probabilities and then opened the saved probabilities in WordPad. How do I determine which column represents which class? Any suggestions to help interpret this issue would be much appreciated.
 Linda K. Muthen posted on Tuesday, December 02, 2014 - 9:54 am
Asked and answered on Mplus support.
 Shawn Bauldry posted on Thursday, March 19, 2015 - 9:30 am
I'm trying to save a dataset with the most likely class membership. I'm using Mplus v 7.2 (Mac). When I run the following code, it saves a dataset with the observed variables, the weight variable, and the id variable -- but no variable for the most likely class. Do you see what I'm doing wrong?

TITLE: Unconditional LCA, Full Sample, 4 class;
DATA: FILE = "pta mplus data 3.txt";
VARIABLE: NAMES = aid gswgt4_2 edu18 edu20
edu22 edu24 edu26 edu28 edu30 job18 job20
job22 job24 job26 job28 job30 rel18 rel20 rel22
rel24 rel26 rel28 rel30 par18 par20 par22 par24
par26 par28 par30;
USEVARIABLES = edu18-par30;
CATEGORICAL = edu18-par30;
CLASSES = c(4);
WEIGHT = gswgt4_2;
STARTS = 20 10;
SAVEDATA: FILE = "pta 3 unc 4 cprob.txt"

The output file lists the following for saved variables.

Save file
pta 3 unc 4 cprob.txt

Order and format of variables

EDU18 F10.3
PAR30 F10.3
GSWGT4_2 F10.3
AID F10.3

Save file format
 Bengt O. Muthen posted on Thursday, March 19, 2015 - 2:57 pm
There is no semicolon after the FILE option in the SAVEDATA command – so Mplus can’t see the SAVE=CPROB request.
 ZHANG Liang posted on Wednesday, September 23, 2015 - 7:47 am

I ran a 4-class LCGA, and requested the CPROBABILITIES. The variables in my input file are ID, A1-A6.

The columns in cprob.dat are arranged like this: [original values of A1-A6] [ID] [estimated values of A1-A6] [??] [??] [Posterior Probabilities for each class] [the class to which this case is finally assigned].

But I'm not sure about the meanings of the two columns denoted as [??]. Could you please explain it?

Thank you!
 Bengt O. Muthen posted on Wednesday, September 23, 2015 - 11:14 am
Look at the bottom of the output where the variables saved are listed.
 Laura Serra Saurina posted on Tuesday, March 15, 2016 - 9:01 am
I have a doubt regarding the computation of the posterior probabilities for another individual using the information from Mplus. I am considering a LCGA and, more specifically, I am considering an unconditional model. In particular, I have 30 repeated measures of about 3000 individuals. Using Mplus I have obtained 3 classes defined by an intercept and the slope. I have seen the formula 145 in appendix 8 but I am not sure about its application. Which are the values I have to introduce as x_i? Are they the 30 repeated values we have for each individual? But how does the formula work?
Thanks in advance.
 Bengt O. Muthen posted on Tuesday, March 15, 2016 - 5:40 pm
I am looking at the technical appendix for Version 2 on our website and the posteriors are given in equation (161). You don't have x's but only y's which are the outcomes at the different time points.
 Laura Serra Saurina posted on Wednesday, March 16, 2016 - 2:54 am
Thanks Dr Muthen. However, looking at this equation, I still do not know how to apply it. I understand that I only have y's, which are the outcomes at the different time points, but then how do the x's and u's work? And how can I adapt equation (161) to my data?

Thanks again.
 Bengt O. Muthen posted on Wednesday, March 16, 2016 - 6:14 pm
You ignore the u term and drop u and x from the other parts of the formula. See also Muthen & Shedden (1999) on our website.
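With only y's, the calculation for an unconditional LCGA can be sketched in Python as follows. The sketch rests on several assumptions that should be checked against the actual model: a linear growth shape, zero within-class growth factor variances (so the repeated measures are independent given class), and residual variances that are equal across classes. The function name and all parameter values are hypothetical.

```python
import math

def lcga_posteriors(y, times, class_probs, intercepts, slopes, residual_vars):
    """Posterior class probabilities for one individual's repeated measures.

    Assumes a linear LCGA: the class-k mean at time t is
    intercepts[k] + slopes[k] * t, with no within-class growth factor
    variance. Missing outcomes (None) are skipped.
    """
    log_joint = []
    for p, i_k, s_k in zip(class_probs, intercepts, slopes):
        ll = math.log(p)
        for y_t, t, v in zip(y, times, residual_vars):
            if y_t is None:                  # use only the observed time points
                continue
            mu = i_k + s_k * t               # class-specific trajectory mean
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (y_t - mu) ** 2 / v)
        log_joint.append(ll)
    mx = max(log_joint)
    weights = [math.exp(v - mx) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 3-class solution with 4 time points (not from any real output)
times = [0.0, 1.0, 2.0, 3.0]
post = lcga_posteriors(
    y=[0.1, 0.9, 2.2, 2.8],
    times=times,
    class_probs=[0.5, 0.3, 0.2],
    intercepts=[0.0, 5.0, 2.0],
    slopes=[1.0, -1.0, 0.0],
    residual_vars=[1.0, 1.0, 1.0, 1.0],
)
```

The same function applies with 30 time points: y and times simply have 30 entries, and the y's are the individual's repeated outcome values.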

If this seems unfamiliar and too technical, you may need to contact a statistical consultant to help you.
 Annie Robitaille posted on Tuesday, October 18, 2016 - 9:19 am
I am running a growth mixture model. The number of observations included in the analysis is 570, since some are missing information on x-variables or on time scores. However, when I request the data file for the cprobabilities and Fscore, I actually get information for 630 individuals (the entire sample). Why is that information included in the data file even though these individuals are not included in the analysis?
 Bengt O. Muthen posted on Tuesday, October 18, 2016 - 6:11 pm
Please send the output and the saved file to Support along with your license number.
 davide morselli posted on Thursday, December 15, 2016 - 4:30 am
I am running a latent transition analysis following Nylund-Gibson et al. 2014. In my LTA model I have measurement invariance by class across time points (i.e., classes at t1 have the same structure as classes at t2), and I need to get the logits for the classification probabilities of the most likely class for each time point. To do so, I used the starting values (class latent means and indicators by class) of the LTA to run 2 separate LCAs (one per time point), but I get a completely messed-up class membership compared to the LTA.

What is the best way to do so?

Is there a way to get logits of the classification probabilities directly from the LTA?
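For context, the logits in question can be computed by hand from a table of classification probabilities. The Python sketch below assumes the table has latent class in the rows and most likely class in the columns, and that each logit is taken relative to the last column of its row. The function name, the floor used to avoid taking the log of zero, and the probability values are all hypothetical choices made for illustration.

```python
import math

def classification_logits(class_by_assignment, floor=1e-6):
    """Row-wise logits of classification probabilities.

    Each row holds P(most likely class = column | latent class = row).
    The logit for a cell is log(p / p_last), so the last column is 0.
    The floor is an arbitrary choice that keeps zero cells finite.
    """
    logits = []
    for row in class_by_assignment:
        clipped = [max(p, floor) for p in row]
        reference = clipped[-1]
        logits.append([math.log(p / reference) for p in clipped])
    return logits

# Hypothetical 3-class classification probability table (rows sum to 1)
probs = [
    [0.90, 0.07, 0.03],
    [0.10, 0.85, 0.05],
    [0.02, 0.08, 0.90],
]
logits = classification_logits(probs)
```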
 davide morselli posted on Thursday, December 15, 2016 - 4:33 am
These are the SVALUES from my LTA:

[ c1#1*-0.10978 ];
[ c1#2*1.09612 ];


[ ar_1*-1.91941 ] (1);
[ auth_1*0.36262 ] (2);
[ polcyn_1*0.29949 ] (3);
[ mor_1*0.31598 ] (4);


[ ar_1*-0.56575 ] (13);
[ auth_1*0.41641 ] (14);
[ polcyn_1*0.13853 ] (15);
[ mor_1*0.41398 ] (16);


[ ar_2*-1.91941 ] (1);
[ auth_2*0.36262 ] (2);
[ polcyn_2*0.29949 ] (3);
[ mor_2*0.31598 ] (4);

[ dropout$1@-15 ];


[ ar_2*-0.56575 ] (13);
[ auth_2*0.41641 ] (14);
[ polcyn_2*0.13853 ] (15);
[ mor_2*0.41398 ] (16);
 davide morselli posted on Thursday, December 15, 2016 - 4:35 am
And this is my LCA model based on the LTA:

usevariables = nAR_1 AUTH_1 POLCYN_1 MOR_1 ;
CLASSES = C1(9);


[ c1#1@-0.10978 ];
[ c1#2@1.09612 ];

ar_1@0.16055 (5);
auth_1@0.19758 (6);
polcyn_1@0.52895 (7);
mor_1@0.23867 (8);


[ ar_1@-1.91941 ] (1);
[ auth_1@0.36262 ] (2);
[ polcyn_1@0.29949 ] (3);
[ mor_1@0.31598 ] (4);


[ ar_1@-0.56575 ] (13);
[ auth_1@0.41641 ] (14);
[ polcyn_1@0.13853 ] (15);
[ mor_1@0.41398 ] (16);
 Bengt O. Muthen posted on Thursday, December 15, 2016 - 8:08 am
We request that postings be limited to one window. Questions that require long input/output segments should be sent to Support along with license number.