I have been finding lately, across a number of large-n data sets, that the LMR LRT tends to be more conservative than BIC (and/or sample-size-adjusted BIC) when it comes to selecting the number of classes for mixture modeling (the bootstrapped LRT is usually not available to me because I typically work with complex sampling). Most recently, the LMR LRT supported a 3-class solution, whereas BIC kept decreasing all the way up to a 7-class model (an 8-class model wasn't compared simply because it wouldn't estimate).
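One mechanical reason BIC can keep falling in large samples: moving from k to k + 1 classes lowers BIC exactly when the likelihood-ratio statistic 2(LL_{k+1} - LL_k) exceeds Δp·ln(n), where Δp is the number of added parameters. The per-parameter penalty grows only with ln(n). The sketch below (plain Python; the helper names are hypothetical and not part of Mplus) computes BIC, the sample-size-adjusted BIC with n replaced by (n + 2)/24, and that decision rule.

```python
import math


def bic(loglik: float, n_params: int, n: int) -> float:
    """BIC = -2*LL + p*ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n)


def sabic(loglik: float, n_params: int, n: int) -> float:
    """Sample-size-adjusted BIC: same form, with n replaced by (n + 2) / 24."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24)


def bic_prefers_more_classes(lr_stat: float, delta_params: int, n: int) -> bool:
    """True when BIC(k+1) < BIC(k).

    lr_stat is 2*(LL_{k+1} - LL_k); delta_params is p_{k+1} - p_k.
    BIC(k+1) < BIC(k)  <=>  lr_stat > delta_params * ln(n).
    """
    return lr_stat > delta_params * math.log(n)


# Per-parameter penalty for BIC and sample-size-adjusted BIC at several n.
# At n = 3000: ln(3000) is about 8.01, ln(3002 / 24) is about 4.83.
for n in (300, 3000, 30000):
    print(n, round(math.log(n), 2), round(math.log((n + 2) / 24), 2))
```

Because the loglikelihood is a sum over observations, a fit improvement of fixed size per observation grows roughly in proportion to n, while the penalty grows only with ln(n). This is one mechanical contributor to the pattern described, not a full explanation.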
I was wondering whether you are aware of any literature and/or intuitions that could explain these findings. That is, is there support for BIC "overextracting" classes relative to the LMR LRT in large samples (e.g., n = 3,000)? I'm seeing this pattern all over the place!
See the following paper which is available on the website:
Nylund, K.L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.
Lewina Lee posted on Monday, September 17, 2012 - 9:36 am
Dear Drs. Muthen,
I am conducting an LPA of 21 continuous outcomes with N = 1189. In class enumeration, the LMR and VLMR tests (TECH11) suggested that a 4-class solution fits the data best, but BIC keeps decreasing and the p-value from the bootstrap LRT (TECH14) keeps increasing. My questions are:
1. Nylund et al. (2007) reported that BIC, LMR, and BLRT performed equally well in LPAs (all identifying the correct model more than 94% of the time). Given my discrepant results across the indices, what is the best-practice approach?
2. At k=8, the best LL was only replicated 3 times with STARTS = 1600 100. Is this a sign of poor model fit (and therefore a good place to stop increasing k)?
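Counting how often the best LL was replicated amounts to counting the final-stage loglikelihood values that match the best one. A minimal sketch, assuming you have copied the ranked final-stage loglikelihoods from the Mplus output into a Python list; the helper name and the 0.001 tolerance (chosen because Mplus prints loglikelihoods to three decimals) are assumptions.

```python
from typing import Sequence


def best_ll_replications(final_stage_lls: Sequence[float], tol: float = 1e-3) -> tuple[float, int]:
    """Return the best (highest) loglikelihood and how many final-stage
    optimizations reached it within tol."""
    if not final_stage_lls:
        raise ValueError("need at least one final-stage loglikelihood")
    best = max(final_stage_lls)
    count = sum(1 for ll in final_stage_lls if abs(ll - best) <= tol)
    return best, count


# Hypothetical values, not taken from any run in this thread.
lls = [-30512.118, -30512.118, -30512.118, -30519.402, -30520.877]
best, count = best_ll_replications(lls)
print(f"best LL {best:.3f} replicated {count} times")
```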
3. I followed the steps in Mplus Web Note #14 to request TECH11 and TECH14. When I freed the variances across classes and requested OUTPUT: TECH14 with the addition of LRTBOOTSTRAP = 120, the best LL was not replicated in 88 of the 120 draws. How many times does the best LL need to be replicated for the results to be trustworthy?
Once TECH11 or TECH13 obtains a p-value greater than .05, you should not look further. It sounds like you are trying to extract too many classes. You should use theory to help you decide on the number of classes. The loglikelihood should be replicated several times when you are extracting more than two classes.
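The stopping rule in this reply can be written as a small function. A sketch, assuming you have collected, for each k, the p-value of the test of k-1 (H0) versus k classes; the function name and the dictionary layout are hypothetical. Following the advice, it stops at the first non-significant test and ignores any results beyond it.

```python
def classes_by_sequential_lrt(p_values: dict[int, float], alpha: float = 0.05) -> int:
    """p_values[k] is the p-value for k-1 (H0) versus k classes.

    Walk k upward; the first p-value above alpha means k classes do not
    improve on k-1, so k-1 is retained and larger k are not examined.
    """
    for k in sorted(p_values):
        if p_values[k] > alpha:
            return k - 1
    # No non-significant test yet: the largest k fitted is retained,
    # and enumeration would need to continue.
    return max(p_values)


# Hypothetical p-values: the test at k = 5 is the first non-significant one,
# so 4 classes are retained and the k = 6 result is never consulted.
print(classes_by_sequential_lrt({2: 0.0001, 3: 0.002, 4: 0.01, 5: 0.31, 6: 0.03}))
```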
Lewina Lee posted on Tuesday, September 18, 2012 - 1:52 pm
Thanks, Linda. I assume you meant TECH14 (not TECH13)? Please correct me if I'm wrong.
In general, when running TECH14, at what threshold (percentage or number of bootstrap draws) does Mplus output a warning that the best LL was not replicated? Does the best LL need to be replicated in every bootstrap draw for the resulting p-value to be trustworthy?
The warning is printed if more than 50% of the draws have the problem. The best LL should be replicated for the p-value to be trustworthy. If you need to, you can use the LRTBOOTSTRAP command to increase the number of random starts used in TECH14.
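The 50% rule stated here is a simple proportion check. A sketch with a hypothetical helper; applied to the 88 of 120 draws reported earlier in the thread, the share of non-replicated draws is about 73%, above the threshold.

```python
def replication_warning(n_not_replicated: int, n_draws: int, threshold: float = 0.5) -> bool:
    """True when more than `threshold` of the bootstrap draws failed to
    replicate the best loglikelihood (the condition described for the warning)."""
    if n_draws <= 0:
        raise ValueError("n_draws must be positive")
    return n_not_replicated / n_draws > threshold


share = 88 / 120
print(f"{share:.1%} of draws not replicated; warning: {replication_warning(88, 120)}")
```

Note that exactly 50% does not trigger the check as written, since the rule says "more than".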
Lewina Lee posted on Saturday, September 22, 2012 - 4:53 pm
Thank you for the information, Tihomir!
I have one more question about the TECH14 output -- what does "Successful Bootstrap Draws" mean?
In general, these are the bootstrap draws used to compute the p-value, and their number is not constant: if the p-value is near 5%, more draws are used. If convergence problems occur for some bootstrap draws, you will also get a warning about that. If you don't get any convergence warnings, the Successful Bootstrap Draws are all of the bootstrap draws.
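The idea behind the bootstrap p-value can be sketched as the share of successful bootstrap likelihood-ratio statistics that are at least as large as the observed one, with non-converged draws excluded from the denominator. This is a generic parametric-bootstrap sketch under those assumptions, not Mplus's implementation: it uses a fixed set of draws rather than the adaptive number of draws described above, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def bootstrap_lrt_pvalue(observed_lr: float,
                         boot_lrs: Sequence[Optional[float]]) -> tuple[float, int]:
    """Return (p-value, number of successful draws).

    observed_lr: 2*(LL_k - LL_{k-1}) from the real data.
    boot_lrs: the same statistic computed on data simulated from the fitted
    k-1 class model; None marks a draw that failed to converge.
    """
    successful = [lr for lr in boot_lrs if lr is not None]
    if not successful:
        raise ValueError("no successful bootstrap draws")
    exceed = sum(1 for lr in successful if lr >= observed_lr)
    return exceed / len(successful), len(successful)


# Hypothetical statistics for illustration only.
p, n_ok = bootstrap_lrt_pvalue(25.0, [12.1, 30.4, None, 8.7, 19.9])
print(p, n_ok)
```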
Lewina Lee posted on Tuesday, September 25, 2012 - 1:15 pm
Thank you for your explanation, Tihomir!
sandy z posted on Thursday, August 18, 2016 - 5:29 am
Dear Dr. Muthén,
I have a question about the number of initial-stage random starts and the LMR test (TECH11).
I am conducting an LCA of 11 polytomous items with N = 3489. I requested 500 sets of random starts (STARTS = 500 100; STITERATIONS = 10) for a 4-class model; the loglikelihood value was successfully replicated (100 times), and the LMR p-value was 0.0000, suggesting that a 4-class model fits better than a 3-class model. However, when the number of random starts was increased to 3000, the LMR p-value was no longer significant, even though the same loglikelihood and the same aBIC were obtained.
Do you know why this discrepancy in the LMR test occurs when different numbers of random starts are requested? Thank you!
sandy z posted on Thursday, August 18, 2016 - 5:34 am
TECH11 OUTPUT (STARTS = 500 100)

Random Starts Specifications for the k-1 Class Analysis Model
    Number of initial stage random starts          500
    Number of final stage optimizations            100

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 3 (H0) VERSUS 4 CLASSES
    H0 Loglikelihood Value                  -42312.733
    2 Times the Loglikelihood Difference      2570.721
    Difference in the Number of Parameters          34
    Mean                                       160.729
    Standard Deviation                         331.756
    P-Value                                     0.0000

----------------------

Random Starts Specifications for the k-1 Class Analysis Model
    Number of initial stage random starts         3000
    Number of final stage optimizations            100

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 3 (H0) VERSUS 4 CLASSES
    H0 Loglikelihood Value                  -42312.733
    2 Times the Loglikelihood Difference      2570.721
    Difference in the Number of Parameters          33
    Mean                                     13961.774
    Standard Deviation                       19437.308
    P-Value                                     0.7211
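The TECH11 lines above can be cross-checked by hand. Both runs report the same H0 loglikelihood and the same 2 × difference, so they imply the same 4-class loglikelihood; what differs is the reported parameter difference (34 vs 33) and the reported mean and standard deviation. The sketch below recovers the implied 4-class loglikelihood and counts free parameters for an unrestricted LCA. It assumes, purely for illustration, that each of the 11 items has 4 categories (the post says only "polytomous"); under that assumption, going from 3 to 4 classes adds 1 class proportion plus 11 × 3 thresholds, i.e. 34 parameters. The helper names are hypothetical.

```python
from typing import Sequence


def implied_k_class_loglik(h0_loglik: float, two_times_diff: float) -> float:
    """LL of the k-class model = LL of the (k-1)-class H0 model + (2*diff)/2."""
    return h0_loglik + two_times_diff / 2.0


def lca_free_params(n_classes: int, categories_per_item: Sequence[int]) -> int:
    """Unrestricted LCA with categorical items:
    (K - 1) class proportions + K * sum_j (C_j - 1) class-specific thresholds."""
    thresholds_per_class = sum(c - 1 for c in categories_per_item)
    return (n_classes - 1) + n_classes * thresholds_per_class


ll4 = implied_k_class_loglik(-42312.733, 2570.721)
items = [4] * 11  # ASSUMPTION: 11 items with 4 categories each (not stated in the post)
delta_p = lca_free_params(4, items) - lca_free_params(3, items)
print(round(ll4, 4), delta_p)
```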