Selecting the number of classes
Mplus Discussion > Latent Variable Mixture Modeling >
 Daniel posted on Tuesday, August 24, 2004 - 12:08 pm
I read your recent chapter in the Kaplan text on latent variable modeling, and have a question about a paper I'm revising. I have a continuous outcome that is not normally distributed. I modeled it with freely estimated class variances, and found three trajectories. However, the variables are not normally distributed. I gather that I cannot use the LMR LRT for testing the number of trajectories. The BIC supports four classes, but the change in BIC from three to four classes is minimal, and the fourth class is very superficial. Can I use the BIC criterion, since I cannot use the LMR LRT to select 3 classes?
 bmuthen posted on Tuesday, August 24, 2004 - 12:23 pm
Note that the assumption of within-class normality does not imply that the mixture is normal; the mixture can be very non-normal. Your observed variables correspond to the mixture, so these mixture models handle non-normal observed variables. Therefore, you can use the LMR. The performance of the LMR - and the BIC - is, however, not sufficiently well-known in most cases. I would not decide on the number of classes based only on statistical measures but also on interpretability. Given what you say about the superficial 4th class, it sounds like the way to go is 3 classes.
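The point about within-class normality can be checked numerically. The following is a minimal sketch, not taken from the thread: the class means, standard deviations, and class sizes are hypothetical values chosen only for illustration. It draws two normal classes and shows that the pooled (mixture) variable is clearly skewed even though each class is normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two normal classes; all numbers below are hypothetical illustration values.
class_1 = rng.normal(loc=0.0, scale=1.0, size=8000)
class_2 = rng.normal(loc=4.0, scale=2.0, size=2000)
y = np.concatenate([class_1, class_2])  # the observed variable is the mixture


def skewness(x):
    """Sample skewness: mean of cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


def excess_kurtosis(x):
    """Sample excess kurtosis: mean of fourth-power standardized values minus 3."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


print("class 1 skewness:", round(skewness(class_1), 3))
print("mixture skewness:", round(skewness(y), 3))
print("mixture excess kurtosis:", round(excess_kurtosis(y), 3))
```

Within each class the skewness is close to zero, while the pooled variable is strongly right-skewed, which is the sense in which the mixture model accommodates a non-normal observed outcome.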
 Anonymous posted on Wednesday, August 17, 2005 - 2:51 pm
I am currently working on a model of behavior change about which there is some controversy in the literature. Specifically, there is some debate regarding the nature of the latent "readiness to change" construct. Some of the debate about the measurement model for the readiness to change construct essentially centers on whether individuals occupy discrete/discontinuous states of readiness or whether the construct is better modeled as continuous. My question is whether evaluation using mixture modeling might be able to shed some light on this, on statistical grounds. That is, would a finding that one class best explained the data argue against the notion of different discrete classes and suggest a continuous latent construct? Or is the distinction between continuous and categorical latent variables largely a matter of heuristic concern? Vermunt, for instance, notes that the distinction between latent classes and latent traits is largely a matter of the number of points across which one integrates. So I suppose the question is: does mixture modeling provide statistical evidence regarding the categorical/continuous nature of a latent construct?
 bmuthen posted on Thursday, August 18, 2005 - 9:47 am
That is a good question that we still know too little about. I was less hopeful about this earlier. But I am looking at this currently for categorical items, contrasting latent class modeling with factor analysis modeling and with hybrids of the two, and I am gradually getting the impression that in several cases these models are distinguishable in terms of statistical fit. A one-class model with a continuous factor might fit considerably better or worse than a 2-class model without continuous factors. And often, hybrids fit considerably better than either. True, the classes can be seen as discrete points on a continuum - taking a non-parametric view of the factor distribution (which I think your Vermunt reference is about) - and this matter can only be resolved by relating the classes to other variables - antecedents and consequences - to see if classes are significantly different on those variables. Given that we can now do these analyses conveniently in a general modeling framework, it would be interesting to see more investigations of this type.
 Chuck Green posted on Friday, August 19, 2005 - 8:26 am
Yes, that is the reference to Vermunt to which I was referring. When you discuss hybrids, are you referring to latent categorical variables derived from a combination of continuous latent factors and observed variables?
 bmuthen posted on Friday, August 19, 2005 - 9:23 am
By hybrid I mean a model that has both continuous and categorical latent variables (the outcomes can be of any kind). For example, a latent class model that has factor variation within classes making the items correlate within class.
 Patrick Malone posted on Thursday, July 20, 2006 - 3:33 pm
This is a really simple question, the answer to which has the potential to make me feel a complete idiot.

The general reference to using BIC, ABIC, etc. to select the number of classes is to choose the "lowest" BIC -- in fact, that is the terminology used in the Nylund et al. draft on the website.

But something Bengt said in passing in either Maryland or San Antonio in May stuck with me -- I *think* he said, "the BIC closest to zero."

So do we want lowest value of BIC or lowest absolute value of BIC? I couldn't find guidance in the manual, the technical appendices, or the discussions.

 Bengt O. Muthen posted on Thursday, July 20, 2006 - 3:50 pm
I should have said lowest BIC. Perhaps I was talking about the possibility of a negative BIC which could happen when logL is positive so that the first BIC term is negative - if the sample size and/or the number of parameters is small, the negative first term will dominate the positive second term and BIC will be negative. But even so, we want the smallest BIC (not in an absolute sense).
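A small numeric illustration of the "smallest, not closest to zero" rule may help. This is a sketch only: it uses the BIC form -2logL + r log n given later in this thread, and the log-likelihoods, parameter counts, and sample size are hypothetical values picked so that logL is positive.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the form -2*logL + r*log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical models with positive log-likelihoods (likelihood > 1).
bic_a = bic(loglik=60.0, n_params=5, n_obs=50)
bic_b = bic(loglik=55.0, n_params=4, n_obs=50)

preferred_by_lowest = "A" if bic_a < bic_b else "B"
preferred_by_closest_to_zero = "A" if abs(bic_a) < abs(bic_b) else "B"

print(round(bic_a, 2), round(bic_b, 2))
print("lowest BIC picks:", preferred_by_lowest)
print("closest-to-zero would pick:", preferred_by_closest_to_zero)
```

Both values are negative here, and the two rules disagree; under the lowest-BIC rule described in the post, the more negative value is the one to prefer.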
 Patrick Malone posted on Thursday, July 20, 2006 - 7:14 pm
Thank you -- you just saved a paper!
 Justin Jager posted on Tuesday, September 26, 2006 - 9:00 am
I'm using the tech14 option in conjunction with Type=mixture to request a PBL ratio test. I am having a difficult time manipulating which class is the first class identified (and therefore is the class deleted to obtain the c-1 comparison model).

I ran a 3 class solution and a 4 class solution. Comparing the mean estimates across the 3-class solution and the 4-class solution, it is clear which class out of the 4 class solution is the "new" class. Intuitively, it seems to me that I want the "new" class to be the first class identified, so that when comparing the c model to c-1 model the deleted class will be the "new" class.

In order to accomplish this, I re-ran the 4-class solution, but this time used the tech14 option, and listed the start values for the "new" class first, and then listed the start values for the three remaining classes (listing the start values for the largest class last). However, the first class identified is not the "new" class whose start values are listed first.

Given the above I have two questions:
(1) Am I being too stringent in my use of the ratio test by identifying the "new" class and manipulating the start values so that it is the first class identified?

(2) If the answer to #1 is No, then do you have any suggestions for manipulating the model so that the "new" class is the first class identified?
 Bengt O. Muthen posted on Sunday, October 01, 2006 - 12:14 pm
(1) You don't have to do this. As long as the best loglikelihood values are obtained for a given number of classes, Tech14 is fine.
 Christian M. Connell posted on Friday, October 26, 2007 - 10:05 am
Two questions re: use of Tech14 with LCA:

1) How do you identify the optimal optseed # in cases where several appear to have been replicated (sample output below)?

Loglikelihood values at local maxima, seeds, and initial stage start numbers:

-2571.979 181293 212
-2571.979 688839 273
-2571.979 830570 369
-2571.979 unperturbed 0
-2694.316 350608 334
-2694.316 478421 311
-2696.832 967237 48
-2696.832 247224 94
-2696.832 471398 74


2) In using tech14 with the following values: lrtstarts = 40 10 2500 20; I get a warning indicating that 4 of 5 bootstraps failed to replicate. The p-value for the BLRT is significant, but I assume the lack of replication is a significant problem. How should the lrtstarts be increased, and at what point would you determine that the solution cannot replicate?
 Linda K. Muthen posted on Friday, October 26, 2007 - 4:05 pm
You can use any one of the optseeds that result in the best loglikelihood.
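As an illustration of "any one of the optseeds that result in the best loglikelihood", here is a minimal sketch that reads the lines pasted in the question and lists the numeric seeds attached to the highest value. The three-column layout (loglikelihood, seed, start number) is assumed from the pasted sample only, and `best_seeds` is a hypothetical helper, not part of any program.

```python
# Lines copied from the sample output in the question above.
output = """\
-2571.979 181293 212
-2571.979 688839 273
-2571.979 830570 369
-2571.979 unperturbed 0
-2694.316 350608 334
-2694.316 478421 311
-2696.832 967237 48
-2696.832 247224 94
-2696.832 471398 74
"""


def best_seeds(text, tol=1e-3):
    """Return the best loglikelihood and the numeric seeds that reached it."""
    rows = []
    for line in text.strip().splitlines():
        loglik, seed, _start = line.split()
        rows.append((float(loglik), seed))
    best = max(loglik for loglik, _ in rows)
    # Keep only numeric seeds; the "unperturbed" row has no seed number.
    seeds = [seed for loglik, seed in rows
             if abs(loglik - best) <= tol and seed.isdigit()]
    return best, seeds


best, seeds = best_seeds(output)
print(best, seeds)
```

Any of the listed seeds reproduces the same best loglikelihood, so the choice among them is arbitrary.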

Version 5 will have improvements for TECH14. I would wait until it is out and use TECH11 in the meantime.
 Sylvana Robbers posted on Friday, March 28, 2008 - 5:59 am
Dear dr. Muthen,

I would like to respond to one of the messages above (July 20, 2006) about the interpretation of the BIC.

Dr. Nagin describes in his book (Group-based modeling of development, 2005) that the lowest absolute BIC-value is preferred, so not the closest to zero.

So, what should be the right approach?
I look forward to your reaction.

Sylvana Robbers
 Linda K. Muthen posted on Friday, March 28, 2008 - 6:46 am
I think the post after that clarified the statement and said the lowest BIC.
 Sylvana Robbers posted on Friday, March 28, 2008 - 7:00 am
Maybe my question was not clear. I mean that dr. Nagin states that the BIC closest to zero is the best, and you propose the lowest BIC, which contradicts each other.

So, is it the lowest BIC or the closest to zero?
 Sylvana Robbers posted on Friday, March 28, 2008 - 7:08 am
By the way, the sentence in my first message should be:
Dr. Nagin describes in his book (Group-based modeling of development, 2005) that the lowest absolute BIC-value is preferred, so the closest to zero.
(without 'not').

Thanks in advance for your time.
 Bengt O. Muthen posted on Friday, March 28, 2008 - 11:35 am
Let me see if I get this right. The Mplus BIC is

BIC(M) = -2logL + r log n,

where L is the likelihood, r is the number of parameters and n is the sample size. In Mplus we want the smallest BIC(M).

Nagin in (4.1) of his book uses the alternative

BIC(N) = logL - 0.5 r log n,

so that BIC(N) = -2 BIC(M). Nagin wants the largest BIC(N).

With BIC(M) the second term is always positive (or non-negative). The first term is typically positive as well because logL is typically negative. The term decreases as the likelihood increases (gets better). So here we want small positive BIC(M) values. In the rare cases when logL is positive (L > 1) the first term is negative and becomes more negative as the likelihood increases. So here too we want the smaller BIC(M), where say -10 is smaller than -5 (-10 is further to the left on the real line).

I think this is in line with my earlier post.
 Sylvana Robbers posted on Monday, March 31, 2008 - 1:55 am
Finally I understand it :-)

Thank you very much for your clear answer!
 Alexandre Morin posted on Friday, April 25, 2008 - 1:01 pm

Could you have made a mistake?

I've been playing with the BIC formulas and something does not add up:
-2 * BIC(M) = -2 (-2 logL + r log n) =
4 logL - 2 r log n, which is not BIC(N).

I believe it is the reverse:
-2 BIC(N) = BIC(M)
-2 (logL - 0.5 r log n) = -2 logL + r log n.

This does not change the conclusion for the positive versus negative BIC issues however.
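Written out with the two definitions quoted earlier in the thread, the relation between the two scalings can be checked in one line. This is only a restatement of the algebra in the post above, using the same symbols (L, r, n):

```latex
\begin{aligned}
\mathrm{BIC}(M) &= -2\log L + r\log n, \qquad
\mathrm{BIC}(N) = \log L - \tfrac{1}{2}\, r\log n, \\
-2\,\mathrm{BIC}(N) &= -2\log L + r\log n = \mathrm{BIC}(M).
\end{aligned}
```

Because the factor -2 is negative, the model with the smallest BIC(M) is the same model that has the largest BIC(N), so the two conventions pick the same model.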
 Christian M. Connell posted on Friday, July 11, 2008 - 12:50 pm
Is there a way to test the difference between two LCAs with the same number of classes, but in which one model includes a forced zero-behavior class (i.e., a forced no-substance use class)?

Specifically, I've determined a 4-class solution fits the data best relative to 1,2,3, and 5-class models. However the fit indices for a restricted and unrestricted 4-class model are nearly equivalent:

Adj BIC: 8138.76 (unrest) 8140.35 (rest)
LMR-LRT p-value: .03 (unrest) <.001 (rest)
Entropy: .83 (unrest) .83 (rest)

My sense is that the BIC discrepancy is negligible (?) and that the restricted model should be selected based upon parsimony (i.e., fewer parameters estimated). The actual make-up of the 4-classes is comparable (basically, one of the unrestricted model classes has a high number of non-use individuals), but prevalence of the classes is different across the models, and predictors vary slightly.

Any guidance -- or a specific test that I might run to determine the "best" model?

 Bengt O. Muthen posted on Sunday, July 13, 2008 - 5:13 pm
I think I would go with the unrestricted model if it has the lower BIC, unless you have a specific theory for the existence of a zero class. I am not much for "model trimming", but just reporting the results even if some parameters may not be needed. But if you want to test for one of the classes being at zero probability for all items, perhaps you can use the Wald test of Model Test. For instance, you can define item j's probability as

Model Constraint:

pj = 1/(1+exp(tj));

where tj is the label for the threshold of item j. Then you use

Model test:


You do this for all items at the same time. I haven't tried it, but I think it should work.
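To see how the threshold-to-probability expression above behaves, here is a minimal sketch using the same formula, pj = 1/(1+exp(tj)). The threshold values are hypothetical. It shows that the probability only approaches zero as the threshold grows large and never reaches it at a finite threshold, which is what makes a zero item probability a boundary value.

```python
import math


def item_probability(threshold):
    """Item probability implied by a logit threshold: 1 / (1 + exp(threshold))."""
    return 1.0 / (1.0 + math.exp(threshold))


# Hypothetical threshold values for illustration.
for t in (0.0, 5.0, 10.0, 15.0):
    print(t, item_probability(t))
```

A threshold of 0 gives a probability of 0.5; larger thresholds push the probability toward, but never exactly to, zero.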
 Christian M. Connell posted on Monday, July 14, 2008 - 4:01 am
Thank you for the response. A couple of follow-ups:

Is there any indication as to the "sensitivity" of the Adj. BIC -- given these differences are less than 2 points?

Would these two 4-class models be considered nested, since the same predictors and number of classes are specified -- only one model includes constraints that force one class to include youth with no use? If so, is it appropriate to conduct a difference in chi-square (or equivalent) test to see whether the constraints significantly worsen fit?

Also, is there any utility in examining the quality of the classification table -- even though entropy is identical, there is some variation in both the diagonal and off-diagonal elements.
 Bengt O. Muthen posted on Monday, July 14, 2008 - 8:16 am
There is a literature on how differences between BIC values should be viewed. See for instance

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

The models are nested, but the assumptions of the likelihood-ratio chi-square test are not fulfilled because the zero-class model specifies parameters that are on the border of their admissible space, namely zero item probabilities conditional on class. Come to think of it, the Wald test would be negatively affected by this too.

The classification table might tell you about differences across models in being able to tell certain classes apart.
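One way to read a BIC difference in light of the Kass and Raftery reference is to treat it as an approximation to twice the log Bayes factor and apply their evidence categories. The sketch below does that for the two adjusted BIC values quoted earlier in this exchange. Applying the scale to the sample-size adjusted BIC is an assumption made here for illustration, and `evidence_label` is a hypothetical helper.

```python
def evidence_label(bic_diff):
    """Kass-Raftery categories for 2*ln(Bayes factor), applied to |BIC difference|."""
    d = abs(bic_diff)
    if d < 2:
        return "not worth more than a bare mention"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"


# Adjusted BIC values quoted in the question (restricted minus unrestricted).
diff = 8140.35 - 8138.76
print(round(diff, 2), "->", evidence_label(diff))
```

On this reading, a difference of under 2 points gives essentially no basis for preferring either model on BIC alone.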
 Bruce A. Cooper posted on Monday, July 14, 2008 - 3:23 pm
I'm evaluating GMM solutions to identify the "best" number of classes for an outcome (total sleep time) measured over 14 occasions. There is clearly one large class, and some number of smaller classes, probably 2 or 3.

I take it from Bengt's response in this thread on October 01, 2006 - 12:14 pm that specifying starting values is not necessary to get a valid BLRT test of the K vs K-1 solutions with TECH14. I assume that the way the classes differ would therefore be based on the profile means for the two solutions? So, how to identify the K and K-1 mean profiles that were identified in the BLRT? (I've found that the "K" BLRT solution is not identical to the model solution it follows.)
 Bruce A. Cooper posted on Monday, July 14, 2008 - 3:53 pm
Next question on selecting the "best" number of classes.

I'm puzzled about the interpretation of the VLMR and BLRT tests for the K vs K-1 solutions. I've obtained identical H0 LL values (and -2LL diff values) from TECH11 and TECH14 outputs in the same run, and VLMR will have a p-value WAY larger than .05, while BLRT will have a p-value below .0000. Further, after obtaining BLRT for a sequence of solutions with increasing classes (even specifying LRTBOOTSTRAP=100), I have found that it is always significant at .0000 even when the #classes is getting silly. I've also found that the BIC and SABIC also get smaller with additional classes, even for a large number of classes (5 or more, with most being very small), so not much help there.

Tofighi & Enders (2008) recommended the BLRT and sample-size adjusted BIC as the most useful indices for GMM solutions, and Nylund et al. (2006 revised draft) also like the BLRT for GMM.

Do you have further suggestions on the use of these indices vis a vis the # of classes and substantive interpretations in trying to find the "real" solution?

 Bruce A. Cooper posted on Monday, July 14, 2008 - 5:31 pm
Still another question re the # of classes (my ignorance seems bottomless!) -

How does one decide what values to specify for the LRTSTARTS command? I've read a couple of posts here about it, and the V5 manual (pp. 500-501). The defaults are 0 0 20 5, and you suggest perhaps 2 1 50 15 as an example of a different specification. Why so few for the K-1 class solution? Why that number for the K-class solution? (The question arises partly because of the problematic data set you've helped me with, for which I needed to specify 1500 random starts to get a replicated maximum before going to the BLRT. So, I can use the OPTSEED option from one or more of the replicated solutions when I run the TECH14 analysis, but what issues dictate how to choose the number of draws for the bootstrapped K-1 and K solution analysis?)

Any references would be great, so I don't have to keep bothering you!

 Bruce A. Cooper posted on Tuesday, July 15, 2008 - 7:38 am
Belay that second message re the use of the BLRT resulting in p-values < .0000.
That happened with specified starting values and LRTBOOTSTRAP >= 100, but after running more models last night, and another TECH14 BLRT this morning using the default for holding random effects constant, and allowing the TECH14 to run on its own (no specs for LRTBOOTSTRAP or LRTSTARTS), I got a BLRT that made sense (in this case, not significant)! This was after specifying OPTSEED from an analysis I ran overnight, where I used 4000 random starts and got two (2!?!) replicated maximums for the LL.

I also checked the K-1 LL in the TECH14 output, and it was the same as the maximum from the 3-class solution I had gotten previously with the same ANALYSIS settings. So there's some consistency! and a way to know what the K-1 solution would look like from the TECH14 K-1 model.

I'm still troubled though, that out of 4000 random starts, I would get only two replications of the largest LL (-3880.753), then just one LL that I had gotten multiple replications of in prior runs of this 4-class analysis (-3882.103), then 25 identical LL (-3883.747), with the difference between the largest and smallest LL from these three solutions being only 2.994. Having only two out of 4000 replicated LL still seems pretty chancy, making me wonder about a local maximum that I just happened to hit by chance, twice.

 Bengt O. Muthen posted on Tuesday, July 15, 2008 - 10:28 am
Your last 4 posts touch on topics we teach at our 2-day Mplus Short Courses. One just took place at the psychometric meeting and another one is coming up in November at Ann Arbor. This is in the area of Topics 5 and 6 (see our web site for topics and handouts). It is too large a topic to teach on Mplus Discussion - so I will just give some brief comments.

Topic 5 has on slide 197:

"More On LCA Testing Of K – 1 Versus K Classes
Bootstrap Likelihood Ratio Test (LRT): TECH14

• LRT = 2*[logL(model 1) – logL(model 2)], where model 2 is nested within model 1
• When testing a k-1-class model against a k-class model, the LRT does not have a chi-square distribution due to boundary conditions, but its distribution can be determined empirically by bootstrapping

Bootstrap steps:
1. In the k-class run, estimate both the k-class and the k-1-class model to get the LRT value for the data
2. Generate (at most) 100 samples using the parameter estimates from the k-1-class model and for each generated sample get the log likelihood value for both the k-1 and the k-class model to compute the LRT values for all generated samples
3. Get the p value for the data LRT by comparing its value to the distribution in 2."

Because step 2 generates data according to the k-1-class model, the k-1-class model is easier to fit than the k-class model and therefore requires fewer starts.
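The three bootstrap steps on the slide can be sketched in a few lines for the simplest possible case. This is an illustration under stated assumptions, not the program's implementation: it compares 1 class against 2 classes for a single continuous variable, the 2-class model has a common within-class variance, the EM routine and its start settings are hypothetical, and the p value is taken as the plain proportion of bootstrap LRT values at or above the data LRT.

```python
import numpy as np


def _log_joint(y, mu, var, pi):
    """log(pi_k * Normal(y_i | mu_k, var)) for each observation i and class k."""
    return (np.log(pi)[None, :]
            - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (y[:, None] - mu[None, :]) ** 2 / var)


def loglik_one_class(y):
    """Maximized log likelihood of a single normal distribution (the k-1 model)."""
    return -0.5 * y.size * (np.log(2.0 * np.pi * y.var()) + 1.0)


def loglik_two_class(y, rng, n_starts=5, n_iter=100):
    """Best log likelihood over several random EM starts for a 2-class normal
    mixture with class-specific means and a common variance (the k model)."""
    best = -np.inf
    for _ in range(n_starts):
        mu = rng.choice(y, size=2, replace=False)  # random start for the means
        var, pi = y.var(), np.array([0.5, 0.5])
        for _ in range(n_iter):
            lj = _log_joint(y, mu, var, pi)
            w = np.exp(lj - lj.max(axis=1, keepdims=True))  # E-step: posteriors
            w /= w.sum(axis=1, keepdims=True)
            nk = w.sum(axis=0) + 1e-12                      # M-step: update parameters
            pi = nk / nk.sum()
            mu = (w * y[:, None]).sum(axis=0) / nk
            var = max((w * (y[:, None] - mu[None, :]) ** 2).sum() / y.size, 1e-8)
        lj = _log_joint(y, mu, var, pi)
        m = lj.max(axis=1)
        loglik = float((m + np.log(np.exp(lj - m[:, None]).sum(axis=1))).sum())
        best = max(best, loglik)
    return best


def lrt(y, rng, n_starts=5):
    """LRT = 2 * [logL(k classes) - logL(k-1 classes)], floored at zero."""
    return max(0.0, 2.0 * (loglik_two_class(y, rng, n_starts) - loglik_one_class(y)))


def bootstrap_lrt(y, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    lrt_data = lrt(y, rng)                        # step 1: LRT for the data
    mu0, sd0 = y.mean(), y.std()                  # estimates from the k-1 model
    lrt_boot = np.array([lrt(rng.normal(mu0, sd0, y.size), rng, n_starts=3)
                         for _ in range(n_boot)])  # step 2: generate under k-1
    p_value = float(np.mean(lrt_boot >= lrt_data))  # step 3: compare
    return lrt_data, p_value
```

Calling `bootstrap_lrt(y)` on a one-dimensional NumPy array returns the data LRT and its bootstrap p value. Note that the generated samples come from the k-1 model, which is why the sketch, like the explanation above, spends fewer random starts on the bootstrap fits than on the real data.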

Having only 2 replicated best LLs out of 4000 is a sign of a problem - it typically indicates that the model tries to read too much out of the data. This happens when using too many parameters such as too many classes, particularly when the sample size is not large and the data signal is not strong.
 Bruce A. Cooper posted on Tuesday, July 15, 2008 - 11:26 am
Thanks very much for the information, Bengt, and for confirming that getting only 2 maximum LL out of 4000 does indicate a problem with this model! I *have* been worried that I was trying to squeeze water from this stone!

I saw an earlier announcement about the November short course and I have been planning to attend. Meanwhile, I see that your handouts are available to view online and will look through the ones for Topics 5 & 6.

You used to sell the short course handouts through the Mplus website, but I couldn't find the page for ordering them. It's very generous of you to offer them for download now! I like to read as much as I can to find answers to my questions before bothering you folks, so I really appreciate the references you provide and the short-course handouts. Thank you.

 Bruce A. Cooper posted on Tuesday, July 15, 2008 - 5:04 pm
I've run a 3-class GMM to test a 3 vs 2-class model with BLRT. I first ensured that I had a maximum LL that had many replications, then selected two seeds to check the solution. OPTSEED with both seeds produced identical solutions. I ran the model with one of the seeds using OPTSEED, first with no specifications for the TECH14 BLRT. I got a p = 0.3333 with 9 successful BS draws. Then, I ran the model with the same OPTSEED, but this time specifying

K-1STARTS = 100 20 ;
LRTSTARTS = 5 2 20 10;

to increase the reliability of the BLRT and the K-1 solution. I get identical LL for the K model (same seed), and identical LL for the K-1 model (and it is identical to the LL from the previous 2-class model I ran without BLRT).

However, the p-value for the BLRT is now 0.0000 with 50 successful BS draws. This is the same type of result I referred to earlier, where the BLRT produces p = .0000 no matter how many classes are in the model, when specifying LRTBOOTSTRAP at some number, usually 50, 100, or 150 for the models I've run.

Could you direct me to a reference that will help me understand this inconsistency in BLRT p-values for the same LL and -2LL diff?

 Linda K. Muthen posted on Wednesday, July 16, 2008 - 3:14 pm
We don't have any further references. Please send your input, data, output, and license number to
 Bruce A. Cooper posted on Monday, July 28, 2008 - 11:12 am
Erratum: My posting on July 14, 2008 - 3:53 pm was wrong about the recommendation of Tofighi & Enders (2008) regarding the indices they recommended for choosing the number of classes in a GMM. I have found my own error in a Google(TM) search using their names, and hope this correction will also show up in future Google searches so folks won't be misled by my error.

I wrote that Tofighi & Enders "recommended the BLRT and sample-size adjusted BIC as the most useful indices for GMM solutions" but my memory failed me, and I should have looked at the paper again before citing them. In fact, Tofighi & Enders (2006; 2008) did not evaluate the BLRT at all in their Monte Carlo study of GMM indices for choosing the number of classes, because it had just been added to Mplus when they did their study. Instead, their recommendation was that the sample-size adjusted BIC was the best overall index in its performance across a number of conditions, and the Lo-Mendell-Rubin LRT was next best in several situations.

In fact, Nylund, Asparouhov, & Muthen (2006 revised draft; Structural Equation Modeling, 2007) recommended the BLRT for choosing the number of classes in GMM under some circumstances, but their Monte Carlo study did not evaluate very many conditions for GMM.

Sorry for the error!
 Bruce A. Cooper posted on Monday, July 28, 2008 - 11:38 am
Linda -

Thanks for your note on July 16th and the follow-up tech spt via email. My question is "where-to-go-from-here-for-now" in using the BLRT to help choose the number of classes for GMM.

I understand that it is best to set the largest class as the last class when using the BLRT, but (from another posting) that the order of the other classes is not essential. However, I have been unsuccessful in making the largest class the last class, whether I use class intercepts from the solution being tested as starting values, or whether I use the categorical latent variable means. With either method, the program still re-orders the classes so that the largest class is not last in the TECH14 run, defeating the purpose of the BLRT to some extent, and taking a lot of time to run repeated bootstrap tests trying to make the last class the largest. Using the categorical latent variable means, for example, I got this from the prior solution for the 4-class model being tested, in which the last class WAS largest:

Categorical Latent Variables
C#1 -1.146
C#2 -2.576
C#3 -2.860

So, I specified in the Model %OVERALL% statement for the TECH14 run:

[ c#1*-1.146 c#2*-2.576 c#3*-2.860 ];

But no matter how I ordered those three values, the subsequent runs put the largest class as number 2 or 3. Any thoughts about what I'm doing wrong?

 Linda K. Muthen posted on Monday, July 28, 2008 - 11:41 am
Please send your files and license number to
 Mogens Fenger posted on Sunday, August 17, 2008 - 9:28 am
Dear Linda and Bengt,

I’m running a lot of LCA and SEM analyses with and without factors and covariates. A few questions (hope they are not too "simple"):
1) I suppose that the dot in the scale correction factor is a decimal identifier.
2) In the simplified Satorra-Bentler use of scale correction factors in difference testing, I suppose that the number of parameters is the number of free parameters.
3) When I incorporate covariates in an analysis the number of subjects may differ considerably because of exclusion of missing values for the covariates. Concomitantly, the LL and the statistics (AIC etc.) change considerably: say with 2100 subjects the adjusted BIC may be 31,000, and when incorporating a covariate only 1700 subjects are included, with the adjusted BIC being 25,000. The correction for sample size may not seem appropriate. Is it possible to compare the two models by calculating F in chi-square (n-1)*F? Or alternatively correcting e.g. BIC by correcting for sample size (e.g. using the above numbers BICcorr = (25,000/1700)*2100 = 30,882)?
An alternative way to do comparisons is by using the USEOBSERVATION option to only include a full data set. It's easy to do, but becomes tedious as more covariates are included.
4) How do you calculate DF in a mixture model?
5) Are there any rules for using the entropy measure in deciding on the best fit?

 Linda K. Muthen posted on Monday, August 18, 2008 - 8:33 am
1. Yes.
2. Yes.
3. Only models with the same set of observed variables and the same set of observations can be compared.
4. Degrees of freedom are relevant only for models where means, variances, and covariances are sufficient statistics for model estimation. In other cases, the number of free parameters is used.
5. Entropy is not a fit statistic.
 Mogens Fenger posted on Tuesday, August 19, 2008 - 1:50 am
Thanks Linda, This cleared up a few things for me. A few follow up questions:

If we in a model with the same set of observations replace one covariate with another so the number of parameters and subjects are the same, shouldn't it be possible to compare the two models? If the first model gives a BIC of, say, 31000 and the second 30000, wouldn't you conclude that the second model is a better model and should be preferred?

Entropy: although entropy is not a fit statistic, is there any formal way to conclude that an entropy of 0.7 is worse than 0.8 (e.g. in the example above), and would you be able to include such a result in your decision of which model to choose?

 Linda K. Muthen posted on Tuesday, August 19, 2008 - 7:25 am
You cannot change the set of observed variables if you want to compare models.

Entropy ranges from zero to one with the higher value being the better value as far as classification is concerned.
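For readers who want to see where a zero-to-one entropy value comes from, here is a minimal sketch of a commonly used normalized (relative) entropy computed from posterior class probabilities. That this formula matches the value reported by the program is an assumption, and the posterior matrices are hypothetical.

```python
import numpy as np


def relative_entropy(posteriors):
    """1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K); 1 means perfectly separated classes."""
    post = np.asarray(posteriors, dtype=float)
    n, k = post.shape
    safe = np.clip(post, 1e-12, 1.0)  # avoid log(0); 0 * log(.) contributes 0
    return 1.0 - float(-(post * np.log(safe)).sum()) / (n * np.log(k))


# Hypothetical posterior class probabilities (rows = persons, columns = classes).
sharp = [[0.98, 0.02], [0.03, 0.97], [0.99, 0.01]]
fuzzy = [[0.60, 0.40], [0.45, 0.55], [0.70, 0.30]]
print(round(relative_entropy(sharp), 3), round(relative_entropy(fuzzy), 3))
```

Sharper posterior probabilities give a value nearer one, which is the sense in which a higher value is better for classification, without it being a measure of model fit.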
 Michael Spaeth posted on Wednesday, October 29, 2008 - 2:57 am
I have a question concerning tech14. My LRT value in the real data is dramatically different from the LRT values in the simulated data sets, which I monitored in the tech8 window (e.g. 124 vs. 66). Should I alter the settings of lrtstarts, or is this not really a problem? In addition, the p-values of VLMR and tech14 differ dramatically too (p = .95 vs. p < .001). The real data and generated data H0-LL's (also H1-LLs) in tech14 are the same, as described in the manual. So I think it's a problem with the generated data sets and their H1- and H0-LLs (lrtstarts)!?
 Linda K. Muthen posted on Wednesday, October 29, 2008 - 8:02 am
I would need to see your outputs to answer this question. Please send them and your license number to
 Michael Spaeth posted on Wednesday, October 29, 2008 - 9:12 am
Ok, it takes some days to finally compute the tech14s then I would send it. But I guess it's rather hard to find a hint on only the outputs because BLRT LRT-values of generated data sets are available only on the tech8 window (disappearing after computation).
The following sentence in my last post was nonsense:
'The real data and generated data H0-LL's (also H1-LLs) in tech14 are the same, as described in the manual.'--> I only wanted to say that I reproduced the H0 and H1 LLs of the 'real data k-1 run' in my tech14 run, probably pointing to the fact that something is wrong with the bootstrapping of the generated data sets. But this is a guess based on the phenomena I described regarding the LLs in the tech8 window.

thank you and so long, michael
 Mogens Fenger posted on Sunday, November 16, 2008 - 1:32 am
Dear Linda and Bengt,

In a SEM mixture model can you compare (e.g. using BIC) two models in which one model treats an indicator as continuous and the second model treats the indicator as ordinal?


 Bengt O. Muthen posted on Sunday, November 16, 2008 - 10:29 am
No, that gives different likelihood scales.
 Alexandre Morin posted on Friday, January 23, 2009 - 12:25 pm

In mixture models (especially with large samples) it sometimes happens that the examined fit indices (CAIC, BIC, aBIC, etc.) keep on decreasing while additional classes are added, potentially because of their sensitivity to sample size. In most cases when this happens, the additional classes don't necessarily make sense (substantively or statistically: very small classes, classes that only represent a meaningless division of preceding classes, etc.).

In those cases, to choose the number of classes one is left with theory and subjectivity.

It seems to me that in such cases the fit indices (CAIC, BIC, aBIC, etc) associated with varying number of classes might be depicted graphically and interpreted as an EFA scree test to help in the determination of the correct number of classes.

1) Do you have any misgivings about this method?
2) Do you know of any references to a paper either suggesting the use of this method (scree test) or using this method to choose the correct number of classes?

Thank you very much.
 Bengt O. Muthen posted on Friday, January 23, 2009 - 5:53 pm
1) No
2) No

However, you need to always be open to the possibility that you are fishing in the wrong pond - the model family you are in may not be the best for the data and if you switch model family you might find a minimum BIC. For example switching from LCGA to GMM.
 Matthew Cole posted on Wednesday, June 10, 2009 - 4:56 pm
Hi Linda and Bengt,

Can you tell me what the difference is in Tech11 between the VLMR and the adjusted LMR?

Also, you recommend that the last class is the largest class. However, you also recommended that model identifying restrictions not be included in the first class. Would you consider starting values for the first class to be model identifying restrictions? I am currently using starting values for the first class to make it the smallest class, thereby making sure that the first class for Tech11 is not the largest class.

Thanks, Matt
 Linda K. Muthen posted on Thursday, June 11, 2009 - 11:15 am
The authors provided a post-hoc adjustment. You can see the original article for the details. We do not use the adjusted LMR.

Starting values are not model identifying restrictions. These would be some restrictions on model parameters.

I would use starting values to make the last class the largest class not the first class the smallest class.
 Keng-Han Lin posted on Thursday, February 18, 2010 - 1:28 pm
Hi Linda and Bengt,

I'm using LCA on complex survey data with 48 indicators (13 of them continuous). The BIC suggests a 6-class model: 46213.84 (2-class model), 45634.95 (3), 45419.72 (4), 45322.00 (5), 45195.95 (6), 45290.84 (7). But we are wondering if there are other rules we could follow to better determine the number of classes.
The BLRT (TECH14) is not supported for complex data. The p-values of the LMR test are as follows: 0.047 (2-class model), 0.373 (3), 0.582 (4), 0.591 (5), 1.000 (6), 1.000 (7). For the 3-class model, which has an LMR p-value of 0.373, does this suggest that the 2-class model (H0) is good enough in our case?

If so, which statistics should I depend on? Or are there other criteria I should take into account?

Thank you so much for your help.
 Linda K. Muthen posted on Thursday, February 18, 2010 - 2:03 pm
Perhaps the following paper, which is available on the website, can help:

Nylund, K.L., Asparouhov, T., & Muthen, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.

You should also consider whether the classes have any substantive or theoretical basis.
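As a small illustration of reading the BIC series quoted in the question, the sketch below picks the class count with the lowest BIC and lists the change between successive models. The BIC values are copied from the post; the variable names are only illustrative, and nothing here replaces the substantive check recommended in the reply.

```python
# BIC values copied from the post above, keyed by number of classes.
bic_by_classes = {
    2: 46213.84,
    3: 45634.95,
    4: 45419.72,
    5: 45322.00,
    6: 45195.95,
    7: 45290.84,
}

# Under the BIC criterion the preferred model has the smallest value.
best_k = min(bic_by_classes, key=bic_by_classes.get)

# Change in BIC when moving from k-1 to k classes (negative = improvement).
bic_change = {
    k: round(bic_by_classes[k] - bic_by_classes[k - 1], 2)
    for k in sorted(bic_by_classes)
    if k - 1 in bic_by_classes
}

print("lowest BIC at", best_k, "classes")
for k, delta in bic_change.items():
    print(f"{k - 1} -> {k} classes: {delta:+.2f}")
```

The BIC improves at every step up to 6 classes and then worsens at 7, which is why the poster reads it as favoring 6 classes.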
 Ingrid Holsen posted on Tuesday, December 07, 2010 - 5:14 am
I am investigating trajectories of body image over time, from 13 to 30 years. The sample in the GMM is 1082. I have problems deciding on the number of classes (several of us here are puzzled).
With 3 classes the BIC is the lowest, but the LMR-LRT is 0.079; with 4 classes it is 0.038. Entropy is 0.727 for 3 classes and 0.734 for 4 classes (not very high!). However, the 4-class solution has one class with only 1.5% of the sample, that's 15 persons, which doesn't make much sense. I have learned to trust the LMR-LRT, but find it hard to proceed with 4 classes due to the sample size in each class. What is your opinion, also regarding the relatively low entropy?
Best regards, Ingrid
 Ingrid Holsen posted on Tuesday, December 07, 2010 - 6:32 am
Adding to my questions above... The bootstrap test is significant for both the 3- and 4-class solutions. The LMR-LRT value above is of course the p-value.
A colleague suggested adding q@0 to my MODEL command; with that, the 3-class solution performed somewhat better: the LMR-LRT p-value is 0.024 (4 classes: p = 0.21). However, the entropy is around 0.70 (a bit lower for both the 3- and 4-class solutions).
Thanks for your help.
 Linda K. Muthen posted on Tuesday, December 07, 2010 - 10:34 am
The LMR-LRT p-value should be interpreted only the first time it is greater than .05, not after that. Your first post suggests two or three classes. The meaningfulness of the classes should determine your choice.
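One reading of this stopping rule can be sketched as below. This is a minimal illustration, not Mplus output: the function name classes_from_lmr is hypothetical, the .05 cutoff is the one mentioned in the reply, and the p-values are those reported earlier in this thread for the complex-survey LCA. The k-class LMR test compares k-1 against k classes, so the first non-significant test at k points to k-1 classes.

```python
def classes_from_lmr(p_values, alpha=0.05):
    """Return k-1 for the first k-class model whose LMR p-value exceeds alpha.

    p_values maps the number of classes k to the p-value of the
    (k-1)-class versus k-class test. Later p-values are ignored.
    Returns None if every test is significant.
    """
    for k in sorted(p_values):
        if p_values[k] > alpha:
            return k - 1
    return None


# LMR p-values quoted earlier in the thread (2- through 7-class models).
lmr_p = {2: 0.047, 3: 0.373, 4: 0.582, 5: 0.591, 6: 1.000, 7: 1.000}
suggested = classes_from_lmr(lmr_p)
print("LMR stopping rule suggests", suggested, "classes")
```

In that earlier example the rule and the BIC disagree, which is one reason the replies in this thread keep returning to interpretability.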
 tomas dvorak posted on Thursday, January 03, 2013 - 6:54 am
I have a question about selecting the number of classes in LCA. For any reasonable number of classes, TECH11 and TECH14 show zero p-values. Also, BIC decreases (for 3 or more classes) rather slowly. Does this mean LCA does not fit the data and should not be used? On the other hand, the 4- and 5-class solutions have high entropy (around 0.9) and (given my research goals) seem to make sense. Is it OK to use LCA and choose a solution that makes sense with respect to my research goals?
Thanks for your help,
 Linda K. Muthen posted on Thursday, January 03, 2013 - 11:36 am
This may point to the need for a different type of model, for example, a factor mixture model. If your indicators are categorical, TECH10 might help you see what the problem is.
 Alysia Blandon posted on Tuesday, June 11, 2013 - 9:31 am
I am running a Latent Profile Analysis with 5 continuous variables. My sample size is 430. I have been able to replicate the loglikelihood values for the 1 and 2 class solutions. I have also been using the steps outlined by Asparouhov & Muthén webnote 14 to test the number of latent classes using the BLRT from TECH 14 using the OPTSEED option. Based on those recommendations I get the warning THE BEST LOGLIKELIHOOD WAS NOT REPLICATED. I continue to get the message even after increasing LRTSTARTS to 50 20 100 20 and using LRTBOOTSTRAP from 100 through 500. In all of these, the loglikelihood for the k-1 class is the correct one from step 1.
 Bengt O. Muthen posted on Tuesday, June 11, 2013 - 10:31 am
Please send input, output, data and license number to Send the outputs from all the steps recommended in Web Note 14.
 Miriam Forbes posted on Monday, March 03, 2014 - 7:16 pm
Hi Linda and Bengt,

In my LPA and FMA analyses, the BLRT remains significant (at .0000) for every analysis. My samples are n = 533 and n = 181, so I'm not sure it's a result of too much power. Shaunna Clark thought she had read cases of this, and suggested relying on ICs and substantive interpretation of the models (and the other LRTs, which do reach non-significance). I was wondering whether you know of any citations to justify this kind of decision?

Thanks and all the best,

 Bengt O. Muthen posted on Tuesday, March 04, 2014 - 6:50 pm
I would use BIC and substantive interpretation. The Nylund et al paper also shows that BIC is one of the top indices (second best?). Notwithstanding the Nylund et al article, my experience is that BLRT is less dependable in practice for some reason.
 Miriam Forbes posted on Tuesday, March 04, 2014 - 9:43 pm
Thanks, Bengt! I'll emphasise the Nylund et al. paper.

All the best,

 hazel liao posted on Wednesday, June 10, 2015 - 5:08 am
As far as I know, BIC is lower when the model has fewer free parameters, which means the model is more parsimonious, and we should choose the model with the lowest BIC after comparing models.

However, when I compared models (2-group with 16 free parameters, 3-group with 22 free parameters, 4-group with 28 free parameters), the BIC and adjusted BIC decreased as the number of groups increased. Is this output reasonable?
 Linda K. Muthen posted on Wednesday, June 10, 2015 - 1:30 pm
BIC depends not only on the number of parameters but also on the loglikelihood. If the loglikelihood increases faster than the penalty for the number of parameters, BIC will decrease.
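The trade-off can be made concrete with the usual definition BIC = -2 logL + p ln(n), where p is the number of free parameters and n the sample size. In the sketch below the parameter counts (16, 22, 28) come from the question, while the sample size and the loglikelihood values are hypothetical numbers chosen only to show the pattern.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC = -2 * loglikelihood + (number of free parameters) * ln(sample size)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


n_obs = 500  # hypothetical sample size

# (classes, free parameters from the post, hypothetical loglikelihood)
models = [(2, 16, -3200.0), (3, 22, -3150.0), (4, 28, -3110.0)]

bics = {k: bic(ll, p, n_obs) for k, p, ll in models}

# Each step adds 6 parameters, so the penalty grows by 6 * ln(n_obs).
penalty_step = 6 * math.log(n_obs)

for k in sorted(bics):
    print(f"{k} classes: BIC = {bics[k]:.2f}")
print(f"penalty added per step = {penalty_step:.2f}")
```

With these made-up values, -2 logL falls by 100 and then 80 while the penalty rises by only about 37 per step, so BIC keeps decreasing even though the model has more parameters.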
 hazel liao posted on Wednesday, June 10, 2015 - 7:59 pm
You are right, the loglikelihood increases faster in my example. Thank you very much!!

By the way, I know we should minimize the fit function, but what is the relation between the loglikelihood and the fit function?
 Sabrina Thornton posted on Tuesday, October 27, 2015 - 1:52 am

I read 'Using Mplus TECH 11 and TECH14 to test the number of latent class' by Asparouhov & Muthen. I presume that it is appropriate to use the AUXILIARY(bch) command in these tests, since I do have a distal outcome variable that is predicted by the resulting classes. It makes sense to me to include this command, as done in the automatic 3-step approach.

Thanks in advance for your help.

 Jon Heron posted on Tuesday, October 27, 2015 - 3:13 am
It's pretty much standard practice to determine the number of classes using an unconditional mixture model without either covariates or outcomes.

In theory you might, in a GMM, have a covariate which explains some intercept variance and hence reduces the number of classes but I've only ever seen this done once in a publication.
 Christy Denckla posted on Friday, October 28, 2016 - 1:39 am
TECH 14 is indicating that with 1000 random starts I am only getting 2 successful bootstrap draws for 3 vs 4 classes. While the LRT values make sense at 4 classes (p <= .01), the BLRT p-value is 1.0. I checked for a local solution by testing whether the 4-class model parameter estimates were replicated across two different models with seed values from the two best loglikelihood values, and I got replicated parameters.

We have specified freed time parameters given the likelihood of non-linear time (looking at depression before and after childbirth among a bereaved sample). Our sample size is >10,000.

Based on an earlier post, it seems that only 2 successful bootstrap draws with 1000 random starts could be an issue. What could be causing the failure to replicate more than 2 LLs? Is the BLRT pointing to a problem in our model specification?
 Bengt O. Muthen posted on Friday, October 28, 2016 - 12:24 pm
Regarding your second sentence, there is no LRT for mixtures unless you are looking at a frequency table.

Regarding BLRT problems, see our web note 14.
 samah Zakaria Ahmed posted on Monday, February 13, 2017 - 12:04 pm
Based on AIC and BIC, I found that 4 classes are suitable for my latent class variable, whereas based on the entropy, I found that 2 classes are suitable. How can I make the correct decision?
 Bengt O. Muthen posted on Monday, February 13, 2017 - 5:54 pm
Q1: Yes.

1. See Topic 5's treatment of FMM.

2. See the paper on our website:

Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files.

For these general analysis strategy questions you want to use SEMNET.
 Katie Gelman posted on Saturday, June 17, 2017 - 7:29 am
Dear Drs. Muthen,
I have a question about assessing the stability of classes, specifically stability of individuals within class in a GMM. Is the bootstrap ever used to replicate and compare resulting class membership once the best fitting number of classes has been established? If so, how would I do this?

Many thanks.
 Virginia Rangel posted on Friday, June 30, 2017 - 12:42 pm
I am trying to create a typology (the literature suggests three types), but all of the analyses, regardless of the model, suggest a one-class model. To arrive at this conclusion, I focused on interpreting the LMR test, which for all numbers of classes (i.e., 2, 3, etc.) has a p-value much greater than 0.05 (e.g., 0.2, 0.5, etc.). Is this the correct interpretation, or should I look to other indicators (e.g., BIC) or to the final class counts (or to something else), which might suggest a different number of classes?
 Bengt O. Muthen posted on Friday, June 30, 2017 - 5:06 pm
Just use BIC.
 Kathy Xiao posted on Monday, August 21, 2017 - 2:55 pm
I have around 14 binary and categorical variables for LCA, but starting from 2 classes, the p-value of LMR is non-significant, and it never becomes significant. May I know any explanation of this situation? Thanks very much!
 Bengt O. Muthen posted on Monday, August 21, 2017 - 4:59 pm
You may have only 1 class. Check BIC.
 Kathy Xiao posted on Monday, August 21, 2017 - 5:02 pm
Thanks for your reply. The BIC and adjusted BIC kept decreasing from 1 class onward, so I am not sure which number of classes to select.

The entropy increases till 4-class model and then decreases.
 Bengt O. Muthen posted on Monday, August 21, 2017 - 5:17 pm
I would go with BIC and when, as here, it doesn't help, I would change the model, e.g. introduce residual correlations or a factor. See our handouts and video for Topic 5, plus the paper on our website:

Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files.
 Kathy Xiao posted on Monday, August 21, 2017 - 5:30 pm
What if the BIC keeps decreasing to 8 classes and it is substantively not meaningful?

If I don't change the model, can I select the model by (1) largest entropy, (2) relatively small BIC, and (3) substantive interpretation, when the LMR p-values are all non-significant?

Thanks for answering and sharing the article!
 Bengt O. Muthen posted on Tuesday, August 22, 2017 - 5:24 pm
I would not use entropy as the major factor in deciding on the number of classes. That's like choosing a SEM model based on R-square instead of model fit.

Read what I suggested instead.
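To make the contrast with model fit concrete, the sketch below computes the commonly used relative entropy, E = 1 - [sum over cases and classes of -p ln p] / (n ln K), from posterior class probabilities. It assumes that the entropy discussed in this thread is this quantity; the function name and the tiny posterior matrices are hypothetical. The statistic only summarizes how sharply cases are assigned to classes and uses no information about how well the model reproduces the data.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means every case is classified with certainty.

    posteriors is a list of rows, one per case, each holding that case's
    posterior probabilities for the K classes (rows sum to 1).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))


sharp = [[1.0, 0.0], [0.0, 1.0]]   # every case assigned with certainty
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # assignments no better than a coin flip
mixed = [[0.9, 0.1], [0.2, 0.8]]   # somewhere in between

for name, post in [("sharp", sharp), ("fuzzy", fuzzy), ("mixed", mixed)]:
    print(name, round(relative_entropy(post), 3))
```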
 Daniel Lee posted on Sunday, December 24, 2017 - 9:26 pm
Hi Dr. Muthen,
In my Latent Profile Analysis, the AIC, BIC and adjusted BIC were lower (by more than 100) in the 3-class model, but the LRT tests were not significant.

I am not sure what to make of these results. Does it mean I should proceed to examine a 4-class LPA model? Or, should I just accept the 2-class LPA model (even though the BIC, AIC, and adjusted BIC decrease by over 100 units)?

Thank you for your help.
 Bengt O. Muthen posted on Tuesday, December 26, 2017 - 4:06 pm
Go by BIC.
 Emily Gagen posted on Saturday, January 20, 2018 - 12:31 pm
Hi Dr. Muthen,

In my latent class analysis, the AIC and ssaBIC decreased with each increase in number of classes, but the BIC increased at each step. Entropy increased up until the 4 class model, and decreased with 5 classes. The 3 class model is most consistent with the literature and is the most interpretable. I'm mostly wondering whether I should be concerned that the BIC value was increasing with each successive model.

Thank you for your help!
 Bengt O. Muthen posted on Saturday, January 20, 2018 - 4:41 pm
You should indeed be concerned that BIC doesn't decrease with an increasing number of classes. If BIC is worse for 2 classes compared to 1 class (non-mixture), I would stop and not use a mixture. Perhaps your sample size is extreme.
 Emily Gagen posted on Saturday, January 20, 2018 - 6:00 pm
I apologize, I wasn't clear - the BIC decreased (improved) for 2 classes compared to 1 class, but then increased again for each subsequent model (3-5). Separately, the AIC and ssa BIC decreased each time. Is this an indication that I should choose a 2 class model rather than 3? For reference, my sample size is 134 - I'm not sure what you mean by extreme. Thank you for your help.
 Bengt O. Muthen posted on Sunday, January 21, 2018 - 5:21 pm
I would choose 2 classes.
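Why AIC and BIC can move in opposite directions at this sample size follows from their penalties: AIC adds 2 per free parameter, while BIC adds ln(n), which is about 4.9 for n = 134. The sketch below uses the standard definitions AIC = -2 logL + 2p and BIC = -2 logL + p ln(n); the sample size is the one stated in the post, but the loglikelihood gain and the number of added parameters are hypothetical.

```python
import math

n_obs = 134  # sample size stated in the post


def aic_bic_change(delta_loglik, delta_params, n):
    """Change in (AIC, BIC) when a larger model gains delta_loglik in
    loglikelihood at the cost of delta_params extra free parameters.
    Negative values mean the criterion improves."""
    d_aic = -2.0 * delta_loglik + 2.0 * delta_params
    d_bic = -2.0 * delta_loglik + math.log(n) * delta_params
    return d_aic, d_bic


# Hypothetical step: one extra class adds 6 parameters and raises logL by 10.
d_aic, d_bic = aic_bic_change(delta_loglik=10.0, delta_params=6, n=n_obs)

print(f"BIC penalty per parameter: {math.log(n_obs):.3f}")
print(f"change in AIC: {d_aic:+.2f}")
print(f"change in BIC: {d_bic:+.2f}")
```

Any loglikelihood gain that is large enough to beat the AIC penalty but too small to beat the BIC penalty produces the pattern described in the question.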
 Larissa Gaias posted on Monday, January 22, 2018 - 12:16 pm
Hello, I am having a problem related to Kathy's, but it is specifically about trying to account for nesting in my data.

I have a Latent Profile Analysis with 6 indicators. I have 103 teachers who are in 9 schools. I am trying to account for non-independence using type = complex, cluster is schoolid.

My BIC and AIC continuously decrease as I add a profile. When I don't account for clustering, the LMR LRT suggests a 3-profile model (I include one covariance between two indicators, constrained to be equal across profiles). When I do account for clustering, I get non-significant LRTs for all profiles. However, the estimates of the indicators for each profile do not change regardless of whether clustering is accounted for or not.

I am wondering about the appropriateness of the clustering in this situation (are 9 clusters too few?) and why we might be seeing this issue with fit even though the results don't change. Can I trust the results of the non-clustered LPA if the results are identical when we account for clustering? Thank you!
 Larissa Gaias posted on Monday, January 22, 2018 - 2:26 pm
I should add that I tried adding covariances and a factor to the model with clustering and it did not help with distinguishing between the classes using the LMR LRT.
 Bengt O. Muthen posted on Monday, January 22, 2018 - 5:09 pm
9 schools is too few for Complex (and Twolevel) - you need at least 20. Instead, use 8 dummy variables as covariates to represent the schools.
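The dummy-variable suggestion can be sketched as a data-preparation step done before the analysis. The example below is plain Python, not Mplus syntax; the function name dummy_code, the column names, and the school id values are hypothetical. With 9 schools, one school serves as the reference category and the remaining 8 each get a 0/1 indicator.

```python
def dummy_code(labels, reference=None):
    """Return (column_names, rows) with one 0/1 column per non-reference category."""
    categories = sorted(set(labels))
    if reference is None:
        reference = categories[0]  # first category is the reference by default
    kept = [c for c in categories if c != reference]
    names = [f"school_{c}" for c in kept]
    rows = [[1 if label == c else 0 for c in kept] for label in labels]
    return names, rows


# Hypothetical schoolid values for a handful of teachers in 9 schools.
school_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 9, 1]
names, rows = dummy_code(school_ids)

print(names)
for school, row in zip(school_ids, rows):
    print(school, row)
```

Teachers in the reference school have zeros on all 8 indicators, so the indicators carry the same information as the original 9-category school id.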
 Larissa Gaias posted on Monday, January 22, 2018 - 6:33 pm
Thank you! Is this the case only for LPA/LCA or for all models?
 Bengt O. Muthen posted on Tuesday, January 23, 2018 - 5:46 pm
For all models.
 JuliaSchmid posted on Wednesday, March 14, 2018 - 3:08 am
I'm running an LPA with 6 indicators and a sample of n > 2000. I'm comparing different latent class solutions. The BLRT is significant for every solution. The BIC (and SABIC) decreases with each additional class. However, there is a "kink" (a comparatively big decrease) between the 5-class and the 6-class solution:

classes BIC
1 33468
2 30722
3 30083
4 29615
5 29375
6 28561
7 28401
8 28253
9 28133

Is this a sign that I should choose the 6-class-solution?

Thanks for a reply!
 Bengt O. Muthen posted on Wednesday, March 14, 2018 - 3:27 pm
Perhaps. But the fact that BIC doesn't show a minimum may reflect the need to include residual correlations for certain pairs of variables; when such correlations are included as needed, BIC may show a minimum. A factor mixture model may be helpful in finding such residual correlations.

But if you don't feel comfortable with such an analysis excursion, you might want to simply pick 6 classes and report the BIC decline over classes as an argument.
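For reporting the BIC decline over classes, the drops between successive solutions can be tabulated as in the sketch below. The BIC values are copied from the table in the question; treating a "kink" as a drop that is larger than the drop before it is only one simple, hypothetical way to flag it.

```python
# BIC values copied from the table in the post, keyed by number of classes.
bic = {1: 33468, 2: 30722, 3: 30083, 4: 29615, 5: 29375,
       6: 28561, 7: 28401, 8: 28253, 9: 28133}

# Drop in BIC when moving from k-1 to k classes (positive = improvement).
drops = {k: bic[k - 1] - bic[k] for k in sorted(bic) if k - 1 in bic}

# Flag class counts where the drop is larger than the preceding drop,
# i.e. where the otherwise shrinking decline speeds up again.
kinks = [k for k in sorted(drops) if k - 1 in drops and drops[k] > drops[k - 1]]

for k in sorted(drops):
    print(f"{k - 1} -> {k} classes: BIC drops by {drops[k]}")
print("kink at:", kinks)
```

The declines shrink steadily except for the step from 5 to 6 classes, which matches the kink described in the question.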