I am carrying out an LCA with 6 categorical variables (n ~ 1,300). I am using BIC, entropy, LMR (TECH11), and BLRT (TECH14). I've read the Nylund et al. (2007) simulation paper and reviewed the TECH11 and TECH14 sections in the User's manual.
As others have noted in various Mplus discussion threads, the p-values for the LMR and BLRT diverge dramatically...
3 class - LMR .000, BLRT .000
4 class - LMR .8835, BLRT .000
5 class - LMR 1.00, BLRT .000
...
7 class - LMR 1.00, BLRT .000
Also, the LLs for the k and k-1 class models are replicated in the BLRTs. For all baseline models, the lowest LLs were replicated 5+ times. By manipulating starting values, the largest class was made the last class in the BLRT and LMR tests. For the BLRT, the number of bootstrap draws was set to 100 and LRTSTARTS was set to 0 0 500 75 so that the best LL was replicated in each of the 100 bootstrap draws. So these tests have been carried out carefully and follow the guidelines set out in the user's manual.
What might be accounting for the diverging p-values for the LMR and BLRT tests? Do you have suggestions for moving forward?
I assume that the 2-class model has p-values of .000 for both tests as well.
BLRT and LMR agree well in the Nylund et al. simulations, but we have now seen many real-data examples where they don't and where BLRT often seems to reject even for a high number of classes. More research into this phenomenon is needed.
In these instances of disagreement between BLRT and LMR, I would rely more on BIC, which also performed well in the Nylund et al. study unless the sample size is small. I would also rely on interpretability. I usually work with BIC as a first step and may then use BLRT and LMR to help choose among the key candidate numbers of classes. I wonder if BIC shows a distinct minimum in your case. If not, this could be an indication that the model type is not suitable for the data - for example, a factor model or a factor mixture model may be more suitable than LCA.
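To make the BIC advice above concrete, here is a minimal Python sketch of the standard formula BIC = -2*LL + p*ln(n) together with the usual free-parameter count for an unrestricted LCA. The function names are made up for illustration, and the indicators are assumed to be binary; the original post only says "categorical", so the counts change if the items have more categories.

```python
import math

def lca_free_params(n_classes, categories_per_item):
    """Free parameters in an unrestricted LCA (hypothetical helper).

    Each class has (c - 1) free response probabilities per item,
    plus (n_classes - 1) free class proportions.
    """
    item_params = n_classes * sum(c - 1 for c in categories_per_item)
    class_params = n_classes - 1
    return item_params + class_params

def bic(loglik, n_params, n_obs):
    """Standard BIC: -2*LL + p*ln(n). Lower is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

n_obs = 1300          # approximate sample size from the post
items = [2] * 6       # ASSUMPTION: six binary indicators

# Penalty term p*ln(n) for each number of classes
for k in range(2, 6):
    p = lca_free_params(k, items)
    print(k, "classes:", p, "parameters, penalty", round(p * math.log(n_obs), 2))

# Extra penalty paid for going from 3 to 4 classes
extra_penalty = (lca_free_params(4, items) - lca_free_params(3, items)) * math.log(n_obs)
print("extra penalty per added class:", round(extra_penalty, 2))
```

Under these assumptions each added class costs 7 parameters, so -2*LL has to improve by roughly 50.2 before BIC goes down, whereas the likelihood ratio tests only ask whether the improvement is larger than expected under the k-1 class model.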
Randy Mowes posted on Tuesday, February 21, 2017 - 2:19 am
I have a similar problem. I am performing LPA with 27 continuous variables with data from ~1000 participants. The LMR suggests a 3 cluster solution with:
2 class p= .0000
3 class p= .0000
4 class p= .7395
5 class p= .3557
6 class p= .5026
7 class p= .2826
The BLRT p-value is .0000 in all cases. Additionally, the BIC and adjusted BIC values continue to decrease with up to 15 clusters. So the LMR indicates a 3 class solution, but both the BLRT and the information-criterion-based fit indices suggest a much higher number of clusters.
What might cause these differences between the model fit indices, and how should I deal with them?
I would go by BIC in these cases. The problem of BIC not showing a minimum can be solved by allowing within-class covariance between some pairs of variables. Sometimes adding a single factor to soak up some such covariance can give an indication of which pairs of items are the culprits.
Randy Mowes posted on Tuesday, February 28, 2017 - 6:55 am
Dear Mr. Muthen,
Thank you for your fast response! If I understand you correctly, you are saying that the default LPA in Mplus does not allow within-class covariance, and that to solve this I should look at the correlations between variables to see which ones might have high levels of covariance and allow within-class covariance for these variables. Did I understand that correctly? And if so, how do I adapt my syntax to do so? This is an excerpt of my syntax:
I am doing an LCA with 7 continuous indicators and comparing BIC, LMR and BLRT. I find that the BIC keeps decreasing, and the BLRT stays significant for models with up to 10 classes. The LMR, however, becomes nonsignificant for the first time in a model with 7 classes. Should I trust the LMR? The 6 class model is substantively meaningful, but no more or less so than a 5 or 7 class model.
I have tried to add one latent factor in order to see which residual covariances are large. However, I'm not sure how to interpret the output. Where can I see which variables have large loadings on the factor? And how do I translate these into residual correlations?
I have noticed that the BIC has a minimum for a 3-class, 1-factor solution. Would you recommend keeping the factor mixture model instead of the LCA in this case?
Hi, I am running into a similar situation where an interpretable class solution (3 classes) with the lowest BIC value displays a nonsignificant VUONG-LO-MENDELL-RUBIN LRT compared with a 2 class solution (the BLRTs for all solutions are significant). I have tried the previous recommendation of adding residual covariances (which makes sense theoretically) for two of the indicators (they are all binary, labeled G1 to G8). However, adding the line "G2 with G3" stops the software's run. How do I allow the covariance in my model (please see below)?
Thank you. PARAMETERIZATION = RESCOV allowed me to covary two of the binary indicators that are likely to covary on content grounds. Still, the non-bootstrap LRT remains nonsignificant between C(3), which has the lowest BIC, and C(2), which has a larger BIC. As mentioned, the parametric bootstrap LRT rejects H0 and can be seen as supporting BIC, but the bootstrap LRTs remain significant as the number of classes increases (4 and 5, with worse BIC).
Would you say that it makes sense in this case to look at BIC and substantive interpretation first (selecting C3), discard the non-bootstrap LRT, and rely on the bootstrap LRT? Or is there something fishy about the ever-significant bootstrap LRT?
BICs for FMA:
2c1f 18497.87
3c1f 18492.43
4c1f 18494.78
For the FMA, the BIC reaches its minimum at the 3c solution. For the LCA, the BIC keeps decreasing. However, the BIC for, say, the 6c LCA (which makes substantive sense) is smaller than the BICs for the FMAs. Which should I prioritize: the BIC reaching a minimum at all (as in the FMA models) or the BIC being lower overall for the LCA?
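For reference, the FMA BIC values reported above can be compared with a few lines of Python. This is only a sketch of the arithmetic: it finds the lowest BIC, the differences from it, and whether the minimum lies strictly inside the range of models tried. The helper names are made up for illustration, and the LCA BICs are not included because they were not posted.

```python
# BIC values for the factor mixture models, copied from the post above
fma_bic = {"2c1f": 18497.87, "3c1f": 18492.43, "4c1f": 18494.78}

def bic_summary(bics):
    """Return the label with the lowest BIC and each model's difference from it."""
    best = min(bics, key=bics.get)
    deltas = {model: round(value - bics[best], 2) for model, value in bics.items()}
    return best, deltas

def has_interior_minimum(values):
    """True if the smallest value is neither the first nor the last in the sequence."""
    i = values.index(min(values))
    return 0 < i < len(values) - 1

best, deltas = bic_summary(fma_bic)
interior = has_interior_minimum(list(fma_bic.values()))

print("lowest BIC:", best)
print("differences from lowest:", deltas)
print("interior minimum:", interior)
```

The minimum at 3c1f is an interior one, but the gaps to the neighboring models are only 5.44 and 2.35 BIC points.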