

I am carrying out an LCA with 6 categorical variables (n ~ 1,300). I am using BIC, entropy, LMR (TECH11), and BLRT (TECH14). I've read the Nylund et al. (2007) simulation paper and reviewed the TECH11 and TECH14 sections in the User's Guide. As others have noted in various Mplus discussion threads, the p-values for the LMR and BLRT diverge dramatically:

3 classes: LMR .000, BLRT .000
4 classes: LMR .8835, BLRT .000
5 classes: LMR 1.00, BLRT .000
...
7 classes: LMR 1.00, BLRT .000

Also, the LLs for the k- and k-1-class models are replicated in the BLRTs. For all baseline models the lowest LLs were replicated 5+ times. By manipulating starting values, the largest class is identified last in the BLRT and LMR tests. Also, for the BLRT, bootstrap draws were set at 100 and LRTSTARTS was set at 0 0 500 75 so that the best LL was replicated in each of the 100 bootstrap draws. So these tests have been carried out carefully and follow the guidelines set out in the User's Guide. What might be accounting for the diverging p-values for the LMR and BLRT tests? Do you have suggestions for moving forward?
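For reference, the TECH14 settings described above correspond to an ANALYSIS command along these lines (a sketch, not the poster's actual input; the STARTS value shown is illustrative, since it is not given in the post):

```
ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 500 100;         ! illustrative; the poster's STARTS value is not stated
  LRTBOOTSTRAP = 100;       ! 100 bootstrap draws for the BLRT (TECH14)
  LRTSTARTS = 0 0 500 75;   ! 0 0 starts for the k-1-class model,
                            ! 500 75 for the k-class model in each draw
OUTPUT:
  TECH11 TECH14;            ! LMR (TECH11) and BLRT (TECH14)
```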


I assume that the 2-class model has 0 p-values for both as well. BLRT and LMR agree well in the Nylund et al. simulations, but we have now seen many real-data examples where they don't and where the BLRT seems to often reject even for a high number of classes. More research into this phenomenon is needed. In these instances of disagreement between BLRT and LMR, I would rely more on BIC, which also performed well in the Nylund et al. study, unless the sample size is small. I would also rely on interpretability. I usually work with BIC as a first step and then use BLRT and LMR to help choose among the key numbers of classes. I wonder if BIC shows a distinct minimum in your case. If not, this could be an indication that the model type is not suitable for the data; for example, a factor model or a factor mixture model may be more suitable than LCA.

Randy Mowes posted on Tuesday, February 21, 2017 - 2:19 am



Hello, I have a similar problem. I am performing LPA with 27 continuous variables with data from ~1,000 participants. The LMR suggests a 3-cluster solution:

2 classes: p = .0000
3 classes: p = .0000
4 classes: p = .7395
5 classes: p = .3557
6 classes: p = .5026
7 classes: p = .2826

The BLRT is .0000 in all cases. Additionally, the BIC and adjusted-BIC values continue to decrease with up to 15 clusters. So the LMR indicates a 3-class solution, but both the BLRT and the information-criterion-based fit indices suggest a much higher number of clusters. What might cause these differences among the model fit indices, and how should I deal with this?


I would go by BIC in these cases. The problem of BIC not showing a minimum can be solved by allowing within-class covariance between some pairs of variables. Sometimes adding a single factor to soak up some of that covariance can give an indication of which pairs of items are the culprits.
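As a sketch of the single-factor idea (the indicator and factor names here are placeholders, not taken from any poster's model), one factor can be added to the overall part of the mixture model; the variables with significant loadings point to the pairs needing residual covariances:

```
MODEL:
  %OVERALL%
  f BY y1-y6;   ! one factor to soak up within-class covariance;
                ! the first loading is fixed at 1 by default
```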

Randy Mowes posted on Tuesday, February 28, 2017 - 6:55 am



Dear Mr. Muthen, thank you for your fast response! If I understand you correctly, you are saying that the default LPA in Mplus does not allow within-class covariance, and that to address this I should look at the correlations between variables to see which ones might covary strongly and allow within-class covariance for those variables. Did I understand that correctly? And if so, how do I adapt my syntax to do so? This is an excerpt of my syntax:

TITLE: LPA met studenten
Data: File = zscores studenten.dat;
Variable:
  Names = idcode ZS_P_cons ZS_P_extr ZS_P_open ZS_P_emo ZS_P_hon ZS_P_agr;
  USEVAR = ZS_P_cons ZS_P_extr ZS_P_open ZS_P_emo ZS_P_hon ZS_P_agr;
  classes = c(6);
Analysis: Type = mixture;
Output: TECH1 TECH8 TECH14;

Thank you for your help!
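(Following the advice given elsewhere in this thread, a within-class covariance between continuous indicators can be freed with a WITH statement in the MODEL command. A minimal illustrative addition to the syntax above; the particular pair of variables is only an example, not a recommendation:)

```
Model:
  %OVERALL%
  ZS_P_extr WITH ZS_P_agr;   ! free this within-class covariance
                             ! (held equal across classes unless repeated
                             ! in class-specific %c#k% sections)
```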


Hello all, I am doing an LCA with 7 continuous indicators and comparing BIC, LMR, and BLRT. I find that the BIC keeps decreasing continuously, and the BLRT stays significant for models with up to 10 classes. The LMR, however, first becomes nonsignificant for a model with 7 classes. Should I trust the LMR? The 6-class model is substantively meaningful, but not more or less so than a 5- or 7-class model. Thank you very much in advance.


It's tough when the BIC doesn't agree with the LMR and the other tests. You could try adding some residual covariances among your latent class indicators; that may give the BIC a minimum.


Thank you! I have tried adding one latent factor in order to see which residual covariances are large. However, I'm not sure how to interpret the output. Where can I see which variables have large loadings on the factor? And how do I translate these into residual correlations? I have noticed that the BIC has a minimum for a 3-class/1-factor solution. Would you recommend keeping the factor mixture model instead of the LCA in this case?


Q1: Look at the significance of the factor loadings. For which variables are they significant? Those variables are likely to be involved in significant residual covariances (in a model without the factor). Q2: If all of the factor loadings are significant and the BIC value is better than for any other model, that is one way to proceed.


Would I then compare the BIC from the 3-class/1-factor FMA with the BIC from the 3-class LCA? Or would I compare it with the LCA model where the BIC is minimal (supposing there is one)?


Compare the BIC from the 3-class/1-factor FMA with the BICs from all other models.

fred posted on Thursday, April 04, 2019 - 2:12 am



Hi, I am running into a similar situation where an interpretable class solution (3 classes) with the lowest BIC value displays a nonsignificant Vuong-Lo-Mendell-Rubin LRT compared to a 2-class solution (the BLRTs for all solutions are significant). I have tried the previous recommendation of adding residual covariances (which makes sense theoretically) for two of the indicators (they are all binary, labeled G1 to G8). However, adding the line "G2 WITH G3" stops the software's run. How do I allow the covariance in my model (please see below)?

ANALYSIS:
  TYPE = MIXTURE;
  STARTS = 600 120;
  LRTSTARTS = 0 0 500 200;
  OPTSEED = 991329;
OUTPUT: TECH1 TECH11 TECH14;


If your outcomes are categorical, you can use PARAMETERIZATION = RESCOV to be able to use WITH. If this doesn't help, send your output to Support along with your license number.
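A sketch of how that might look with the binary indicators above (the pair G2 with G3 is the one mentioned in the question; the other settings are carried over from the posted excerpt, with OPTSEED dropped since the model changes):

```
ANALYSIS:
  TYPE = MIXTURE;
  PARAMETERIZATION = RESCOV;   ! allows WITH for categorical indicators
  STARTS = 600 120;
  LRTSTARTS = 0 0 500 200;
MODEL:
  %OVERALL%
  G2 WITH G3;                  ! residual covariance between the two indicators
OUTPUT: TECH1 TECH11 TECH14;
```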

fred posted on Friday, April 05, 2019 - 12:14 am



Thank you, PARAMETERIZATION = RESCOV allowed covarying two of the binary indicators that are likely to covary content-wise. Still, the non-bootstrap LRTs remain nonsignificant between C(3), with the lowest BIC, and C(2), with a larger BIC. As mentioned, the parametric bootstrap LRT rejects the H0 and can be seen as supporting the BIC, but the bootstrap LRTs remain significant all through increasing class numbers (4 and 5, with worse BIC). Would you say that looking at BIC and substantive interpretation first (selecting C3), discarding the non-bootstrap LRT, and relying on the bootstrap LRT would make sense in this case? Or is there something fishy about the ever-significant bootstrap LRT? Thanks


Q1: Yes. Q2: No. 


I have trouble deciding which model to choose between a 6-class LCA model and a 3-class/1-factor FMA.

BICs for the LCA:
2c: 19052.63
3c: 18362.43
4c: 18169.57
5c: 17975.71
6c: 17848.04
7c: 17763.71
8c: 17698.38
9c: 17683.41

BICs for the FMA:
2c1f: 18497.87
3c1f: 18492.43
4c1f: 18494.78

For the FMA, the BIC becomes minimal for the 3-class solution. For the LCA, the BIC keeps decreasing. However, the BIC for, for example, the 6-class LCA (which makes substantive sense) is smaller than the BICs for the FMAs. Which should I prioritize: the BIC finding any kind of minimum (as in the FMA), or the BIC being better in general for the LCA? Thank you!


You don't want to use a model which has a worse BIC than another model. If the outcomes are categorical, you may want to consider the RESCOV option in order to find a BIC minimum. 


Thank you. Unfortunately, the outcomes are continuous. When comparing different kinds of FMMs, I am also having the problem that many models fail to replicate a single best log-likelihood, even with STARTS = 2000 200. Could this be a sign that these models are too complex for my data? Or should I try increasing STARTS even more? Thank you very much.


Instead of an FMM, you can try freeing all WITHs as in UG ex 7.22. In general, I think difficulty replicating the best logL is a sign that the model is trying to extract too much information from the data, so yes, the model is too complex.
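The all-WITHs alternative mentioned above can be sketched like this for continuous indicators (the variable names are placeholders; in Mplus, a list WITH the same list frees all pairwise covariances among those variables):

```
MODEL:
  %OVERALL%
  y1-y7 WITH y1-y7;   ! free all pairwise within-class covariances
                      ! (held equal across classes unless repeated
                      ! in class-specific %c#k% sections)
```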
