Message/Author 

Daniel posted on Tuesday, August 24, 2004  12:08 pm



I read your recent chapter in the Kaplan text on latent variable modeling and have a question about a paper I'm revising. I have a continuous outcome that is not normally distributed. I modeled it with freely estimated class variances and found three trajectories. Because the outcome is not normally distributed, I gather that I cannot use the LMR LRT for testing the number of trajectories. The BIC supports four classes, but the change in BIC from three to four classes is minimal, and the fourth class is very superficial. Can I use the BIC criterion to select 3 classes, since I cannot use the LMR LRT? 

bmuthen posted on Tuesday, August 24, 2004  12:23 pm



Note that the assumption of within-class normality does not imply that the mixture is normal; the mixture can be very non-normal. Your observed variables correspond to the mixture, so these mixture models handle non-normal observed variables. Therefore, you can use the LMR. The performance of the LMR, and of the BIC, is, however, not sufficiently well known in most cases. I would not decide on the number of classes based only on statistical measures but also on interpretability. Given what you say about the superficial 4th class, it sounds like the way to go is 3 classes. 

Anonymous posted on Wednesday, August 17, 2005  2:51 pm



I am currently working on a model of behavior change about which there is some controversy in the literature. Specifically, there is some debate regarding the nature of the latent "readiness to change" construct. Some of the debate about the measurement model for the readiness to change construct essentially centers on whether individuals occupy discrete/discontinuous states of readiness or whether the construct is better modeled as continuous. My question is whether evaluation using mixture modeling might be able to shed some light on this, on statistical grounds. That is, would a finding that one class best explained the data argue against the notion of different discrete classes and suggest a continuous latent construct? Or is the distinction between continuous and categorical latent variables largely a matter of heuristic concern? Vermunt, for instance, notes that the distinction between latent classes and latent traits is largely a matter of the number of points across which one integrates. So I suppose the question is: does mixture modeling provide statistical evidence regarding the categorical/continuous nature of a latent construct? 

bmuthen posted on Thursday, August 18, 2005  9:47 am



That is a good question that we still know too little about. I was less hopeful about this earlier. But I am looking at this currently for categorical items, contrasting latent class modeling with factor analysis modeling and with hybrids of the two, and I am gradually getting the impression that in several cases these models are distinguishable in terms of statistical fit. A one-class model with a continuous factor might fit considerably better or worse than a 2-class model without continuous factors. And often, hybrids fit considerably better than either. True, the classes can be seen as discrete points on a continuum, taking a nonparametric view of the factor distribution (which I think your Vermunt reference is about), and this matter can only be resolved by relating the classes to other variables (antecedents and consequences) to see if the classes are significantly different on those variables. Given that we can now do these analyses conveniently in a general modeling framework, it would be interesting to see more investigations of this type. 

Chuck Green posted on Friday, August 19, 2005  8:26 am



Yes, that is the reference to Vermunt to which I was referring. When you discuss hybrids, are you referring to latent categorical variables derived from a combination of continuous latent factors and observed variables? 

bmuthen posted on Friday, August 19, 2005  9:23 am



By hybrid I mean a model that has both continuous and categorical latent variables (the outcomes can be of any kind). For example, a latent class model that has factor variation within classes, making the items correlate within class. 


This is a really simple question, the answer to which has the potential to make me feel a complete idiot. The general reference to using BIC, ABIC, etc. to select the number of classes is to choose the "lowest" BIC; in fact, that is the terminology used in the Nylund et al. draft on the website. But something Bengt said in passing in either Maryland or San Antonio in May stuck with me: I *think* he said, "the BIC closest to zero." So do we want the lowest value of BIC or the lowest absolute value of BIC? I couldn't find guidance in the manual, the technical appendices, or the discussions. Thanks. 


I should have said lowest BIC. Perhaps I was talking about the possibility of a negative BIC, which could happen when logL is positive so that the first BIC term is negative; if the sample size and/or the number of parameters is small, the negative first term will dominate the positive second term and BIC will be negative. But even so, we want the smallest BIC (not in an absolute sense). 
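As a quick numeric check (with made-up numbers, just to illustrate the sign behavior described above):

```python
import math

def bic(logL, r, n):
    """BIC in the Mplus convention: -2*logL + r*log(n); smaller is better."""
    return -2.0 * logL + r * math.log(n)

# Typical case: logL is negative, so the first term (-2*logL) is positive.
print(bic(-100.0, 5, 200))  # 200 + 5*log(200), clearly positive

# Rare case: logL > 0 (possible with continuous outcomes, since the
# likelihood is a product of densities and can exceed 1); with small n
# and few parameters the negative first term dominates.
print(bic(10.0, 2, 5))      # -20 + 2*log(5), about -16.78
```

Either way, the model with the smaller BIC on the real line is preferred; -16.78 would beat -5, and 100 would beat 200.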


Thank you  you just saved a paper! 

Justin Jager posted on Tuesday, September 26, 2006  9:00 am



I'm using the tech14 option in conjunction with Type=mixture to request a bootstrapped LR test. I am having a difficult time manipulating which class is the first class identified (and therefore the class deleted to obtain the c-1 comparison model). I ran a 3-class solution and a 4-class solution. Comparing the mean estimates across the 3-class solution and the 4-class solution, it is clear which class in the 4-class solution is the "new" class. Intuitively, it seems to me that I want the "new" class to be the first class identified, so that when comparing the c model to the c-1 model the deleted class will be the "new" class. To accomplish this, I reran the 4-class solution, but this time used the tech14 option, and listed the start values for the "new" class first, and then listed the start values for the three remaining classes (listing the start values for the largest class last). However, the first class identified is not the "new" class whose start values are listed first. Given the above I have two questions: (1) Am I being too stringent in my use of the ratio test by identifying the "new" class and manipulating the start values so that it is the first class identified? (2) If the answer to #1 is No, then do you have any suggestions for manipulating the model so that the "new" class is the first class identified? 


(1) You don't have to do this. As long as the best loglikelihood values are obtained for a given number of classes, Tech14 is fine. 


Two questions re: use of Tech14 with LCA: 1) How do you identify the optimal OPTSEED number in cases where several appear to have been replicated (sample output below)?

Loglikelihood values at local maxima, seeds, and initial stage start numbers:
-2571.979 181293 212
-2571.979 688839 273
-2571.979 830570 369
-2571.979 unperturbed 0
[snip]
-2694.316 350608 334
-2694.316 478421 311
-2696.832 967237 48
-2696.832 247224 94
-2696.832 471398 74
THE MODEL ESTIMATION TERMINATED NORMALLY

2) In using tech14 with the following values: lrtstarts = 40 10 2500 20; I get a warning indicating that 4 of 5 bootstraps failed to replicate. The p-value for the BLRT is significant, but I assume the lack of replication is a significant problem. How should the lrtstarts be increased, and at what point would you determine that the solution cannot replicate? 


You can use any one of the optseeds that result in the best loglikelihood. Version 5 will have improvements for TECH14. I would wait until it is out and use TECH11 in the meantime. 


Dear Dr. Muthen, I would like to respond to one of the messages above (July 20, 2006) about the interpretation of the BIC. Dr. Nagin describes in his book (Group-based modeling of development, 2005) that the lowest absolute BIC value is preferred, so not the closest to zero. So, what should be the right approach? I look forward to your reaction. Sincerely, Sylvana Robbers 


I think the post after that clarified the statement and said the lowest BIC. 


Maybe my question was not clear. I mean that Dr. Nagin states that the BIC closest to zero is the best, and you propose the lowest BIC, and these contradict each other. So, is it the lowest BIC or the one closest to zero? 


By the way, the sentence in my first message should be: Dr. Nagin describes in his book (Group-based modeling of development, 2005) that the lowest absolute BIC value is preferred, so the closest to zero (without the 'not'). Thanks in advance for your time. 


Let me see if I get this right. The Mplus BIC is BIC(M) = -2 logL + r log n, where L is the likelihood, r is the number of parameters, and n is the sample size. In Mplus we want the smallest BIC(M). Nagin in (4.1) of his book uses the alternative BIC(N) = logL - 0.5 r log n, so that BIC(N) = -2 BIC(M). Nagin wants the largest BIC(N). With BIC(M) the second term is always positive (or non-negative). The first term is typically positive as well because logL is typically negative. That term decreases as the likelihood increases (gets better). So here we want small positive BIC(M) values. In the rare cases when logL is positive (L > 1) the first term is negative and gets more negative as the likelihood increases. So here too we want the smaller BIC(M), where, say, -10 is smaller than -5 (-10 is further to the left on the real line). I think this is in line with my earlier post. 


Finally I understand it :) Thank you very much for your clear answer! 


Greetings, Could you have made a mistake? I've been playing with the BIC formulas and something does not add up right... -2 * BIC(M) = -2(-2 logL + r log n) = 4 logL - 2 r log n, which is not BIC(N). I believe it is the reverse: -2 BIC(N) = BIC(M). Indeed, -2(logL - 0.5 r log n) = -2 logL + r log n. This does not change the conclusion for the positive versus negative BIC issues, however. 
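A quick numeric check of the corrected relationship, with arbitrary numbers:

```python
import math

def bic_m(logL, r, n):
    """Mplus convention: BIC(M) = -2*logL + r*log(n); smaller is better."""
    return -2.0 * logL + r * math.log(n)

def bic_n(logL, r, n):
    """Nagin's convention: BIC(N) = logL - 0.5*r*log(n); larger is better."""
    return logL - 0.5 * r * math.log(n)

logL, r, n = -1234.5, 7, 500
assert math.isclose(-2.0 * bic_n(logL, r, n), bic_m(logL, r, n))

# The two conventions always rank models the same way: the model with the
# larger BIC(N) is the model with the smaller BIC(M).
better, worse = -1200.0, -1234.5  # logL for two models with the same r and n
assert bic_n(better, r, n) > bic_n(worse, r, n)
assert bic_m(better, r, n) < bic_m(worse, r, n)
print("conventions agree on the ranking")
```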


Is there a way to test the difference between two LCAs with the same number of classes, but in which one model includes a forced zero-behavior class (i.e., a forced no-substance-use class)? Specifically, I've determined a 4-class solution fits the data best relative to 1-, 2-, 3-, and 5-class models. However, the fit indices for a restricted and unrestricted 4-class model are nearly equivalent:

Adj BIC: 8138.76 (unrest), 8140.35 (rest)
LMR-LRT p-value: .03 (unrest), <.001 (rest)
Entropy: .83 (unrest), .83 (rest)

My sense is that the BIC discrepancy is negligible (?) and that the restricted model should be selected based upon parsimony (i.e., fewer parameters estimated). The actual makeup of the 4 classes is comparable (basically, one of the unrestricted model's classes has a high number of non-use individuals), but the prevalence of the classes differs across the models, and predictors vary slightly. Any guidance, or a specific test that I might run to determine the "best" model? Christian 


I think I would go with the unrestricted model if it has the lower BIC, unless you have a specific theory for the existence of a zero class. I am not much for "model trimming", preferring to just report the results even if some parameters may not be needed. But if you want to test for one of the classes being at zero probability for all items, perhaps you can use the Wald test of MODEL TEST. For instance, you can define item j's probability in MODEL CONSTRAINT as pj = 1/(1+exp(tj)); where tj is the label for the threshold of item j. Then you use MODEL TEST: pj = 0; You do this for all items at the same time. I haven't tried it, but I think it should work. 


Thank you for the response. A couple of follow-ups: Is there any indication as to the "sensitivity" of the adjusted BIC, given these differences are less than 2 points? Would these two 4-class models be considered nested, since the same predictors and number of classes are specified and only one model includes constraints that force one class to include youth with no use? If so, is it appropriate to conduct a difference-in-chi-square (or equivalent) test to see whether the constraints significantly worsen fit? Also, is there any utility in examining the quality of the classification table? Even though entropy is identical, there is some variation in both the diagonal and off-diagonal elements. 


There is a literature on how differences between BIC values should be viewed. See for instance: Kass, R.E. & Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795. The models are nested, but the assumptions of the likelihood-ratio chi-square test are not fulfilled because the zero-class model specifies parameters that are on the border of their admissible space, namely zero item probabilities conditional on class. Come to think of it, the Wald test would be negatively affected by this too. The classification table might tell you about differences across models in being able to tell certain classes apart. 
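For readers without the paper at hand: a BIC difference maps onto an approximate Bayes factor, BF ≈ exp(ΔBIC/2), and Kass & Raftery give rough evidence categories for differences on the 2·ln(BF) scale (roughly: 0-2 barely worth mentioning, 2-6 positive, 6-10 strong, >10 very strong). A small sketch, with the category labels paraphrased from memory:

```python
import math

def approx_bayes_factor(bic_a, bic_b):
    """Approximate Bayes factor in favor of model A over model B, given
    BIC values in the smaller-is-better convention:
    BF ~ exp((BIC_B - BIC_A) / 2)."""
    return math.exp((bic_b - bic_a) / 2.0)

def evidence_category(delta_bic):
    """Rough Kass-Raftery evidence category, treating the BIC difference
    as an approximation to 2*ln(BF)."""
    if delta_bic < 2:
        return "not worth more than a bare mention"
    if delta_bic < 6:
        return "positive"
    if delta_bic < 10:
        return "strong"
    return "very strong"

# The ~1.6-point adjusted-BIC gap discussed above falls in the weakest band:
print(approx_bayes_factor(8138.76, 8140.35))  # about 2.2
print(evidence_category(8140.35 - 8138.76))
```

This supports the poster's sense that a sub-2-point BIC gap is negligible on its own.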


I'm evaluating GMM solutions to identify the "best" number of classes for an outcome (total sleep time) measured over 14 occasions. There is clearly one large class, and some number of smaller classes, probably 2 or 3. I take it from Bengt's response in this thread on October 01, 2006 - 12:14 pm that specifying starting values is not necessary to get a valid BLRT test of the K vs. K-1 solutions with TECH14. I assume that the way the classes differ would therefore be based on the profile means for the two solutions? So, how does one identify the K and K-1 mean profiles that were identified in the BLRT? (I've found that the "K" BLRT solution is not identical to the model solution it follows.) 


Next question on selecting the "best" number of classes. I'm puzzled about the interpretation of the VLMR and BLRT tests for the K vs. K-1 solutions. I've obtained identical H0 LL values (and -2LL diff values) from the TECH11 and TECH14 outputs in the same run, and the VLMR will have a p-value WAY larger than .05, while the BLRT will have a p-value of .0000. Further, after obtaining the BLRT for a sequence of solutions with increasing classes (even specifying LRTBOOTSTRAP=100), I have found that it is always significant at .0000 even when the number of classes is getting silly. I've also found that the BIC and SABIC also get smaller with additional classes, even for a large number of classes (5 or more, with most being very small), so not much help there. Tofighi & Enders (2008) recommended the BLRT and sample-size adjusted BIC as the most useful indices for GMM solutions, and Nylund et al. (2006 revised draft) also like the BLRT for GMM. Do you have further suggestions on the use of these indices vis-à-vis the number of classes and substantive interpretations in trying to find the "real" solution? Thanks! Bruce 


Still another question re: the number of classes (my ignorance seems bottomless!). How does one decide what values to specify for the LRTSTARTS command? I've read a couple of posts here about it, and the V5 manual (pp. 500-501). The defaults are 0 0 20 5, and you suggest perhaps 2 1 50 15 as an example of a different specification. Why so few for the K-1 class solution? Why that number for the K-class solution? (The question arises partly because of the problematic data set you've helped me with, for which I needed to specify 1500 random starts to get a replicated maximum before going to the BLRT. So, I can use the OPTSEED option from one or more of the replicated solutions when I run the TECH14 analysis, but what issues dictate how to choose the number of draws for the bootstrapped K-1 and K solution analysis?) Any references would be great, so I don't have to keep bothering you! Thanks! Bruce 


Belay that second message re: the use of the BLRT resulting in p-values of .0000. That happened with specified starting values and LRTBOOTSTRAP >= 100, but after running more models last night, and another TECH14 BLRT this morning using the default for holding random effects constant, and allowing TECH14 to run on its own (no specs for LRTBOOTSTRAP or LRTSTARTS), I got a BLRT that made sense (in this case, not significant)! This was after specifying OPTSEED from an analysis I ran overnight, where I used 4000 random starts and got two (2!?!) replicated maximums for the LL. I also checked the K-1 LL in the TECH14 output, and it was the same as the maximum from the 3-class solution I had gotten previously with the same ANALYSIS settings. So there's some consistency, and a way to know what the K-1 solution would look like from the TECH14 K-1 model. I'm still troubled, though, that out of 4000 random starts, I would get only two replications of the largest LL (-3880.753), then just one LL that I had gotten multiple replications of in prior runs of this 4-class analysis (-3882.103), then 25 identical LLs (-3883.747), with the difference between the largest and smallest LL from these three solutions being only 2.994. Having only two out of 4000 replicated LLs still seems pretty chancy, making me wonder about a local maximum that I just happened to hit by chance, twice. Thanks! Bruce 


Your 4 last posts touch on topics we teach at our 2-day Mplus Short Courses. One just took place at the psychometric meeting and another one is coming up in November in Ann Arbor. This is in the area of Topics 5 and 6 (see our web site for topics and handouts). It is too large a topic to teach on Mplus Discussion, so I will just give some brief comments. Topic 5 has on slide 197:

"More On LCA Testing Of K - 1 Versus K Classes
Bootstrap Likelihood Ratio Test (LRT): TECH14
- LRT = 2*[logL(model 1) - logL(model 2)], where model 2 is nested within model 1
- When testing a k-1-class model against a k-class model, the LRT does not have a chi-square distribution due to boundary conditions, but its distribution can be determined empirically by bootstrapping
Bootstrap steps:
1. In the k-class run, estimate both the k-class and the k-1-class model to get the LRT value for the data
2. Generate (at most) 100 samples using the parameter estimates from the k-1-class model, and for each generated sample get the loglikelihood value for both the k-1-class and the k-class model to compute the LRT values for all generated samples
3. Get the p-value for the data LRT by comparing its value to the distribution in 2."

Because step 2 generates data according to the k-1-class model, the k-1-class model is easier to fit than the k-class model and therefore requires fewer starts. Having only 2 replicated best LLs out of 4000 is a sign of a problem. It typically indicates that the model tries to read too much out of the data. This happens when using too many parameters, such as too many classes, particularly when the sample size is not large and the data signal is not strong. 
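The three bootstrap steps on the slide generalize beyond mixtures. The sketch below runs the same logic on a deliberately simple nested pair (H0: normal with mean fixed at 0 vs. H1: normal with free mean), where the MLEs are analytic, using only the Python standard library. It is not a mixture model and not Mplus's implementation — just the bootstrap principle from the slide:

```python
import math
import random

def fit(x, mean_free):
    """ML fit of a normal; returns (logL, mu_hat, sigma2_hat)."""
    n = len(x)
    mu = sum(x) / n if mean_free else 0.0
    s2 = sum((xi - mu) ** 2 for xi in x) / n
    logL = -0.5 * n * (math.log(2 * math.pi * s2) + 1)
    return logL, mu, s2

def lrt(x):
    """LRT = 2*[logL(larger model) - logL(nested model)]; always >= 0 here."""
    return 2.0 * (fit(x, True)[0] - fit(x, False)[0])

def bootstrap_p(x, n_boot=200, seed=1):
    rng = random.Random(seed)
    obs = lrt(x)                        # step 1: LRT value for the data
    _, _, s2_null = fit(x, False)       # parameter estimates under H0
    sd = math.sqrt(s2_null)
    exceed = 0
    for _ in range(n_boot):             # step 2: generate samples under H0
        xb = [rng.gauss(0.0, sd) for _ in x]
        if lrt(xb) >= obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)  # step 3: empirical p-value

rng = random.Random(7)
null_data = [rng.gauss(0.0, 1.0) for _ in range(100)]  # H0 is true
alt_data = [xi + 2.0 for xi in null_data]              # H0 is clearly false
print(bootstrap_p(null_data))  # typically not small
print(bootstrap_p(alt_data))   # essentially the smallest attainable p here
```

The boundary problem the slide mentions is exactly why a bootstrap (rather than a chi-square reference) is needed for k-1 vs. k classes; this toy example has no boundary issue, so it only illustrates the mechanics of steps 1-3.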


Thanks very much for the information, Bengt, and for confirming that getting only 2 maximum LL out of 4000 does indicate a problem with this model! I *have* been worried that I was trying to squeeze water from this stone! I saw an earlier announcement about the November short course and I have been planning to attend. Meanwhile, I see that your handouts are available to view online and will look through the ones for Topics 5 & 6. You used to sell the short course handouts through the Mplus website, but I couldn't find the page for ordering them. It's very generous of you to offer them for download now! I like to read as much as I can to find answers to my questions before bothering you folks, so I really appreciate the references you provide and the shortcourse handouts. Thank you. Best, Bruce 


I've run a 3-class GMM to test a 3- vs. 2-class model with the BLRT. I first ensured that I had a maximum LL that had many replications, then selected two seeds to check the solution. OPTSEED with both seeds produced identical solutions. I ran the model with one of the seeds using OPTSEED, first with no specifications for the TECH14 BLRT. I got p = 0.3333 with 9 successful bootstrap draws. Then, I ran the model with the same OPTSEED, but this time specifying K-1STARTS = 100 20; LRTBOOTSTRAP = 50; LRTSTARTS = 5 2 20 10; to increase the reliability of the BLRT and the K-1 solution. I get an identical LL for the K model (same seed), and an identical LL for the K-1 model (and it is identical to the LL from the previous 2-class model I ran without the BLRT). However, the p-value for the BLRT is now 0.0000 with 50 successful bootstrap draws. This is the same type of result I referred to earlier, where the BLRT produces p = .0000 no matter how many classes are in the model, when specifying LRTBOOTSTRAP at some number, usually 50, 100, or 150 for the models I've run. Could you direct me to a reference that will help me understand this inconsistency in BLRT p-values for the same LL and -2LL diff? Thanks, Bruce 


We don't have any further references. Please send your input, data, output, and license number to support@statmodel.com. 


Erratum: My posting on July 14, 2008 - 3:53 pm was wrong about the recommendation of Tofighi & Enders (2008) regarding the indices they recommended for choosing the number of classes in a GMM. I found my own error in a Google(TM) search using their names, and hope this correction will also show up in future Google searches so folks won't be misled by my error. I wrote that Tofighi & Enders "recommended the BLRT and sample-size adjusted BIC as the most useful indices for GMM solutions," but my memory failed me, and I should have looked at the paper again before citing them. In fact, Tofighi & Enders (2006; 2008) did not evaluate the BLRT at all in their Monte Carlo study of GMM indices for choosing the number of classes, because it had just been added to Mplus when they did their study. Instead, their recommendation was that the sample-size adjusted BIC was the best overall index in its performance across a number of conditions, and the Lo-Mendell-Rubin LRT was next best in several situations. In fact, Nylund, Asparouhov, & Muthen (2006 revised draft; Structural Equation Modeling, 2007) recommended the BLRT for choosing the number of classes in GMM under some circumstances, but their Monte Carlo study did not evaluate very many conditions for GMM. Sorry for the error! Bruce 


Linda, Thanks for your note on July 16th and the follow-up tech support via email. My question is "where-to-go-from-here-for-now" in using the BLRT to help choose the number of classes for GMM. I understand that it is best to set the largest class as the last class when using the BLRT, but (from another posting) that the order of the other classes is not essential. However, I have been unsuccessful in making the largest class the last class, whether I use class intercepts from the solution being tested as starting values, or whether I use the categorical latent variable means. With either method, the program still reorders the classes so that the largest class is not last in the TECH14 run, defeating the purpose of the BLRT to some extent, and taking a lot of time to run repeated bootstrap tests trying to make the last class the largest. Using the categorical latent variable means, for example, I got this from the prior solution for the 4-class model being tested, in which the last class WAS largest:

Categorical Latent Variables Means
C#1 -1.146
C#2 -2.576
C#3 -2.860

So, I specified in the MODEL %OVERALL% statement for the TECH14 run: [ c#1*-1.146 c#2*-2.576 c#3*-2.860 ]; But no matter how I ordered those three values, the subsequent runs put the largest class as number 2 or 3. Any thoughts about what I'm doing wrong? Thanks! Bruce 


Please send your files and license number to support@statmodel.com. 


Dear Linda and Bengt, I'm running a lot of LCA and SEM analyses with and without factors and covariates. A few questions (I hope they are not too "simple"):
1) I suppose that the dot in the scaling correction factor is a decimal point?
2) In the simplified Satorra-Bentler difference testing using scaling correction factors, I suppose that the number of parameters is the number of free parameters?
3) When I incorporate covariates in an analysis, the number of subjects may differ considerably because of exclusion of missing values on the covariates. Concomitantly, the LL and the statistics (AIC, etc.) change considerably; say, with 2100 subjects the adjusted BIC may be 31,000, while when incorporating a covariate only 1700 subjects are included, with an adjusted BIC of 25,000. The correction for sample size may then not seem appropriate. Is it possible to compare the two models by calculating F in chi-square (n-1)*F? Or alternatively by correcting e.g. the BIC for sample size (e.g., using the above numbers, BICcorr = (25,000/1700)*2100 = 30,882)? An alternative way to do comparisons is by using the USEOBSERVATIONS option to only include a complete data set. It's easy to do, but becomes tedious as more covariates are included.
4) How do you calculate DF in a mixture model?
5) Are there any rules for using the entropy measure in deciding on best fit? Best, 


1. Yes.
2. Yes.
3. Only models with the same set of observed variables and the same set of observations can be compared.
4. Degrees of freedom are relevant only for models where means, variances, and covariances are sufficient statistics for model estimation. In other cases, the number of free parameters is used.
5. Entropy is not a fit statistic. 


Thanks Linda, this cleared up a few things for me. A few follow-up questions: If, in a model with the same set of observations, we replace one covariate with another so that the number of parameters and subjects are the same, shouldn't it be possible to compare the two models? If the first model gives a BIC of, say, 31000 and the second 30000, wouldn't you conclude that the second model is better and should be preferred? Entropy: although entropy is not a fit statistic, is there any formal way to conclude that an entropy of 0.7 is worse than 0.8 (e.g., in the example above), and would you include such a result in your decision about which model to choose? Best. 


You cannot change the set of observed variables if you want to compare models. Entropy ranges from zero to one with the higher value being the better value as far as classification is concerned. 


Hello! I have a question concerning TECH14. My LRT value in the real data is dramatically different from the LRT values in the simulated data sets, which I monitored in the TECH8 window (e.g., 124 vs. 66). Should I alter the settings of LRTSTARTS, or is this not really a problem? In addition, the p-values of the VLMR and TECH14 differ dramatically too (p = .95 vs. p < .001). The real-data and generated-data H0 LLs (also the H1 LLs) in TECH14 are the same, as described in the manual. So I think it's a problem with the generated data sets and their H1 and H0 LLs (LRTSTARTS)!? 


I would need to see your outputs to answer this question. Please send them and your license number to support@statmodel.com. 


OK, it will take some days to finally compute the TECH14s; then I will send them. But I guess it's rather hard to find a hint from only the outputs, because the BLRT LRT values of the generated data sets are available only in the TECH8 window (disappearing after computation). The following sentence in my last post was nonsense: 'The real-data and generated-data H0 LLs (also H1 LLs) in TECH14 are the same, as described in the manual.' I only wanted to say that I reproduced the H0 and H1 LLs of the real-data k-1 run in my TECH14 run, probably pointing to the fact that something is wrong with the bootstrapping of the generated data sets. But this is a guess based on the phenomena I described regarding the LLs in the TECH8 window. Thank you and so long, Michael 


Dear Linda and Bengt, In an SEM mixture model, can you compare (e.g., using BIC) two models in which one model treats an indicator as continuous and the second model treats the indicator as ordinal? Best, Mogens 


No, that gives different likelihood scales. 


Greetings, In mixture models (especially with large samples) it sometimes happens that the examined fit indices (CAIC, BIC, aBIC, etc.) keep on decreasing as additional classes are added, potentially because of their sensitivity to sample size. In most cases when this happens, the additional classes don't necessarily make sense (substantively or statistically: very small classes, classes that only represent a meaningless division of preceding classes, etc.). In those cases, to choose the number of classes one is left with theory and subjectivity. It seems to me that in such cases the fit indices (CAIC, BIC, aBIC, etc.) associated with varying numbers of classes might be depicted graphically and interpreted like an EFA scree test to help determine the correct number of classes. So, 1) Do you have any misgivings about this method? 2) Do you know of any references to a paper either suggesting the use of this method (scree test) or using this method to choose the correct number of classes? Thank you very much. 


1) No. 2) No. However, you need to always be open to the possibility that you are fishing in the wrong pond: the model family you are in may not be the best for the data, and if you switch model family you might find a minimum BIC. For example, switching from LCGA to GMM. 
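One way to operationalize the scree idea from the question above is to look at where successive BIC improvements slow down the most. A crude heuristic sketch (the function name and the numbers are made up for illustration; this is not a formal test):

```python
def scree_pick(bics, k_min=1):
    """Given BIC values for k = k_min, k_min+1, ... classes, return the
    class count after which adding classes stops helping much, i.e. the
    point of sharpest slowdown in BIC improvement (an 'elbow')."""
    # improvement going from k classes to k+1 classes
    drops = [bics[i] - bics[i + 1] for i in range(len(bics) - 1)]
    # slowdowns[i]: how much smaller the (i+1)-th improvement is than the i-th
    slowdowns = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    return k_min + 1 + slowdowns.index(max(slowdowns))

# A steadily decreasing BIC series where the big gains stop after 3 classes:
bics = [5200.0, 4900.0, 4620.0, 4580.0, 4550.0]  # k = 1..5
print(scree_pick(bics))  # -> 3
```

As with an EFA scree test, something like this can only support, never replace, substantive interpretation of the classes.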


Hi Linda and Bengt, Can you tell me what the difference is in TECH11 between the VLMR and the adjusted LMR? Also, you recommend that the last class be the largest class. However, you also recommended that model-identifying restrictions not be included in the first class. Would you consider starting values for the first class to be model-identifying restrictions? I am currently using starting values for the first class to make it the smallest class, thereby making sure that the first class for TECH11 is not the largest class. Thanks, Matt 


The authors provided a post-hoc adjustment. You can see the original article for the details. We do not use the adjusted LMR. Starting values are not model-identifying restrictions; those would be restrictions on model parameters. I would use starting values to make the last class the largest class, not the first class the smallest class. 

KengHan Lin posted on Thursday, February 18, 2010  1:28 pm



Hi Linda and Bengt, I'm using LCA on complex survey data with 48 indicators (13 of them continuous). The BIC suggests a 6-class model: 46213.84 (2-class model), 45634.95 (3), 45419.72 (4), 45322.00 (5), 45195.95 (6), 45290.84 (7), but we are wondering if there are other rules we could follow to better determine the number of classes. The BLRT (TECH14) is not supported for complex data. The results (p-values) of the LMR test are as follows: 0.047 (2-class model), 0.373 (3), 0.582 (4), 0.591 (5), 1.000 (6), 1.000 (7). For the 3-class model, which has an LMR p-value of 0.373, does this suggest that 2 classes (H0) are good enough in our case? If so, which statistics should I depend on? Or what other criteria should I take into account? Thank you so much for your help. 


Perhaps the following paper, which is available on the website, can help: Nylund, K.L., Asparouhov, T., & Muthen, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569. You should also consider whether the classes have any substantive or theoretical basis. 


Hello, I am investigating trajectories of body image over time, from 13 to 30 years. The sample in the GMM is 1082. I have problems deciding on the number of classes (several of us here are puzzled). With 3 classes, the BIC is lowest, but the LMR-LRT p-value is 0.079; with 4 classes it is 0.038. Entropy for 3 classes is 0.727 and for 4 classes 0.734 (not very high!). However, the 4-class solution has one class which is only 1.5%, that is, 15 persons, which doesn't make much sense. I have learned to trust the LMR-LRT, but find it hard to proceed with 4 classes due to the sample size in each class. What is your opinion, also regarding the relatively low entropy? Best regards, Ingrid 


Adding to my questions above... The bootstrap test is significant for both the 3- and 4-class solutions. The LMR-LRT value above is of course the p-value. A colleague suggested adding q@0 to my MODEL command; then the 3-class solution performed somewhat better: LMR-LRT p = 0.024 (4-class: p = 0.21). However, the entropy is around 0.70 (a bit lower for both the 3- and 4-class solutions). Thanks for your help. Ingrid 


The significance of the LMR-LRT should be interpreted only up to the first time it is greater than .05, not after that. Your first post suggests two or three classes. The meaningfulness of the classes should determine your choice. 
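[Editor's note: the stopping rule described here can be sketched in Python as follows. The rule: the first non-significant LMR p-value retains the k-1 class model, and later p-values are not interpreted. The p-value dictionaries below are taken from the first post or are hypothetical.]

```python
def classes_by_lmr(p_values, alpha=0.05):
    """Number of classes implied by the LMR-LRT stopping rule.

    p_values maps k (classes in the larger model) to the LMR p-value
    for the k-1 vs. k comparison. The first non-significant p-value
    ends the search; later p-values are ignored.
    """
    for k in sorted(p_values):
        if p_values[k] > alpha:
            return k - 1          # H0 (k-1 classes) is retained
    return max(p_values)          # every test rejected H0

# p-values from the first post above: only the 2-class test is
# significant, so the rule points to 2 classes.
print(classes_by_lmr({2: 0.047, 3: 0.373, 4: 0.582,
                      5: 0.591, 6: 1.0, 7: 1.0}))
```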


Hi, I have a question about selecting the number of classes in LCA. For any reasonable number of classes, TECH11 and TECH14 show zero p-values. Also, BIC decreases (for 3 or more classes) rather slowly. Does this mean LCA does not fit the data and should not be used? On the other hand, the 4- and 5-class solutions have high entropy (around 0.9) and (given my research goals) seem to make sense. Is it OK to use LCA and choose a solution that makes sense with respect to my research goals? Thanks for your help, Tomas 


This may point to the need for a different type of model, for example, a factor mixture model. If your indicators are categorical, TECH10 might help you see what the problem is. 


I am running a Latent Profile Analysis with 5 continuous variables. My sample size is 430. I have been able to replicate the loglikelihood values for the 1- and 2-class solutions. I have also been using the steps outlined in Asparouhov & Muthén's Web Note 14 to test the number of latent classes using the BLRT from TECH14 with the OPTSEED option. Based on those recommendations, I get the warning THE BEST LOGLIKELIHOOD WAS NOT REPLICATED. I continue to get the message even after increasing LRTSTARTS to 50 20 100 20 and varying LRTBOOTSTRAP from 100 through 500. In all of these, the loglikelihood for the k-1 class model is the correct one from step 1. 


Please send input, output, data and license number to support@statmodel.com. Send the outputs from all the steps recommended in Web Note 14. 


Hi Linda and Bengt, In my LPA and FMA analyses, the BLRT remains significant (at .0000) for every analysis. My samples are n = 533 and n = 181, so I'm not sure it's a result of too much power. Shaunna Clark thought she had read cases of this, and suggested relying on ICs and substantive interpretation of the models (and the other LRTs, which do reach nonsignificance). I was wondering whether you know of any citations to justify this kind of decision? Thanks and all the best, Miriam 


I would use BIC and substantive interpretation. The Nylund et al. paper also shows that BIC is one of the top indices (second best?). Notwithstanding the Nylund et al. article, my experience is that the BLRT is less dependable in practice for some reason. 


Thanks, Bengt! I'll emphasise the Nylund et al. paper. All the best, Miriam 

hazel liao posted on Wednesday, June 10, 2015  5:08 am



As far as I know, BIC decreases when the model has fewer free parameters, which means the model is more parsimonious, and we should choose the lowest BIC after comparing models. However, when I compared models (2-class with 16 free parameters, 3-class with 22 free parameters, 4-class with 28 free parameters), BIC/adjusted BIC decreased as the number of classes increased. Is this output reasonable? 


BIC depends not only on the number of parameters but also on the loglikelihood. If the loglikelihood increases faster than the penalty for the number of parameters, BIC will decrease. 
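[Editor's note: the standard BIC formula makes this trade-off explicit. A minimal Python sketch; the loglikelihood values and sample size are hypothetical, while the parameter counts (16 and 22) match the poster's 2- and 3-class models.]

```python
from math import log

def bic(loglik, n_params, n):
    """BIC = -2*logL + p*ln(n); lower is better."""
    return -2.0 * loglik + n_params * log(n)

# Hypothetical example: going from 2 to 3 classes adds 6 parameters,
# a penalty of 6*ln(500). BIC improves only if twice the gain in
# loglikelihood exceeds that penalty.
n = 500
print(bic(-2300.0, 16, n))   # 2-class model
print(bic(-2270.0, 22, n))   # 3-class model: logL gain of 30 beats the penalty
```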

hazel liao posted on Wednesday, June 10, 2015  7:59 pm



You are right, the loglikelihood increases faster in my example. Thank you very much!! By the way, I know we should minimize the fit function, but what is the relation between the loglikelihood and the fit function? 


Hi, I read 'Using Mplus TECH11 and TECH14 to test the number of latent classes' by Asparouhov & Muthen. I presume that it is appropriate to use the AUXILIARY(bch) command in these tests, since I do have a distal outcome variable that is predicted by the resulting classes. It makes sense to me to include this command as is done in the automatic 3-step approach. Thanks in advance for your help. Sabrina 

Jon Heron posted on Tuesday, October 27, 2015  3:13 am



It's pretty much standard practice to determine the number of classes using an unconditional mixture model without either covariates or outcomes. In theory you might, in a GMM, have a covariate which explains some intercept variance and hence reduces the number of classes, but I've only ever seen this done once in a publication. 


TECH14 is indicating that with 1000 random starts I am only getting 2 successful bootstrap draws for 3 vs. 4 classes. While the LRT values make sense at 4 classes (p < .01), the BLRT p-value is 1.0. I checked for a local solution by testing whether the 4-class model parameter estimates were replicated across two different models with seed values from the two best loglikelihood values, and I got replicated parameters. We have specified freely estimated time parameters given the likelihood of nonlinear time (looking at depression before and after childbirth among a bereaved sample). Our sample size is >10,000. Based on an earlier post, it seems that only 2 successful bootstrap draws with 1000 random starts could be an issue. What could be causing the failure to replicate more than 2 loglikelihoods? Is the BLRT pointing to a problem in our model specification? 


Regarding your second sentence, there is no LRT for mixtures unless you are looking at a frequency table. Regarding BLRT problems, see our Web Note 14. 


Depending on AIC and BIC, I found that 4 classes are suitable for my latent class variable, whereas depending on entropy, I found that 2 classes are suitable. How can I make the correct decision? 


Q1: Yes. 1. See Topic 5's treatment of FMM. 2. See the paper on our website: Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files. For these general analysis strategy questions you want to use SEMNET. 


Dear Drs. Muthen, I have a question about assessing the stability of classes, specifically the stability of individuals within classes in a GMM. Is the bootstrap ever used to replicate and compare resulting class membership once the best-fitting number of classes has been established? If so, how would I do this? Many thanks. 


I am working on trying to create a typology (the literature suggests three types) but all of the analyses, regardless of the model, suggest a one-class model. To arrive at this conclusion, I focused on interpreting the LMR test, which for all numbers of classes (i.e., 2, 3, etc.) has a p-value much greater than 0.05 (e.g., 0.2, 0.5, etc.). Is this the correct interpretation, or should I look to other indicators (e.g., BIC) or to the final class counts (or to something else), which might suggest a different number of classes? 


Just use BIC. 

Kathy Xiao posted on Monday, August 21, 2017  2:55 pm



I have around 14 binary and categorical variables for LCA, but starting from 2 classes, the p-value of the LMR is non-significant, and it never becomes significant. Is there any explanation for this situation? Thanks very much! 


You may have only 1 class. Check BIC. 

Kathy Xiao posted on Monday, August 21, 2017  5:02 pm



Thanks for your reply. The BIC and adjusted BIC keep decreasing from 1 class onward, so I am not sure which number of classes to select. The entropy increases up to the 4-class model and then decreases. 


I would go with BIC and when, as here, it doesn't help, I would change the model, e.g., introduce residual correlations or a factor. See our handouts and video for Topic 5, plus the paper on our website: Asparouhov, T. & Muthen, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22:2, 169-177, DOI: 10.1080/10705511.2014.935844. Download Mplus files. 

Kathy Xiao posted on Monday, August 21, 2017  5:30 pm



What if the BIC keeps decreasing up to 8 classes and that is substantively not meaningful? If I don't change the model, can I select the model by (1) largest entropy, (2) relatively small BIC, and (3) substantive interpretation, when the p-values of the LMR are all non-significant? Thanks for answering and sharing the article! 


I would not use entropy as the major factor in deciding on the number of classes. That's like choosing an SEM model based on R-square instead of model fit. Read what I suggested instead. 

Daniel Lee posted on Sunday, December 24, 2017  9:26 pm



Hi Dr. Muthen, In my Latent Profile Analysis, the AIC, BIC, and adjusted BIC were lower (by >100) in the 3-class model, but the LRT tests were not significant. I am not sure what to make of these results. Does it mean I should proceed to examine a 4-class LPA model? Or should I just accept the 2-class LPA model (even though the BIC, AIC, and adjusted BIC decrease by over 100 units)? Thank you for your help. 


Go by BIC. 

Emily Gagen posted on Saturday, January 20, 2018  12:31 pm



Hi Dr. Muthen, In my latent class analysis, the AIC and ssaBIC decreased with each increase in the number of classes, but the BIC increased at each step. Entropy increased up until the 4-class model, and decreased with 5 classes. The 3-class model is most consistent with the literature and is the most interpretable. I'm mostly wondering whether I should be concerned that the BIC value was increasing with each successive model. Thank you for your help! 


You should indeed be concerned that BIC doesn't decrease with an increasing number of classes. If BIC is worse for 2 classes compared to 1 class (non-mixture), I would stop and not use a mixture. Perhaps your sample size is extreme. 

Emily Gagen posted on Saturday, January 20, 2018  6:00 pm



I apologize, I wasn't clear: the BIC decreased (improved) for 2 classes compared to 1 class, but then increased again for each subsequent model (3-5). Separately, the AIC and ssaBIC decreased each time. Is this an indication that I should choose a 2-class model rather than 3? For reference, my sample size is 134; I'm not sure what you mean by extreme. Thank you for your help. 


I would choose 2 classes. 


Hello, I am having a problem related to Kathy's, but it is particularly an issue with trying to account for nesting in my data. I have a Latent Profile Analysis with 6 indicators. I have 103 teachers who are in 9 schools. I am trying to account for non-independence using TYPE = COMPLEX with CLUSTER = schoolid. My BIC and AIC continuously decrease as I add a profile. When I don't account for clustering, the LMR-LRT suggests a 3-profile model (I include one covariance between two indicators, constrained to be equal across profiles). When I do account for clustering, I get non-significant LRTs for all numbers of profiles. However, the estimates of the indicators for each profile do not change regardless of whether clustering is accounted for. I am wondering about the appropriateness of the clustering in this situation (are 9 clusters too few?) and why we might be seeing this issue with fit even if the results don't change. Can I trust the results of the non-clustered LPA if the results are identical when we account for clustering? Thank you! 


I should add that I tried adding covariances and a factor to the model with clustering, and it did not help with distinguishing between the classes using the LMR-LRT. 


9 schools is too few for COMPLEX (and TWOLEVEL); you need at least 20. Instead, use 8 dummy variables as covariates to represent the schools. 
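[Editor's note: one way to construct the dummy variables outside Mplus, sketched in plain Python; the function name and the toy school IDs are hypothetical. With 9 schools present, one serves as the reference category and the remaining 8 become the dummy covariates suggested above.]

```python
def school_dummies(school_ids, reference=None):
    """Turn a list of school IDs into k-1 dummy (0/1) columns.

    The reference school gets all zeros in every column; with 9 schools
    this yields the 8 dummy covariates suggested above.
    """
    schools = sorted(set(school_ids))
    if reference is None:
        reference = schools[-1]   # last school is the reference by default
    keep = [s for s in schools if s != reference]
    return {s: [1 if sid == s else 0 for sid in school_ids] for s in keep}

ids = [1, 1, 2, 3, 3, 3, 9]        # toy data: school ID per teacher
dummies = school_dummies(ids)      # school 9 is the reference here
print(sorted(dummies))             # one column per non-reference school
```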


Thank you! Is this the case only for LPA/LCA or for all models? 


For all models. 
