I have been finding lately, across a number of large-n data sets, that the LMR LRT tends to be more conservative than the BIC (and/or the sample-size-adjusted BIC) when selecting the number of classes in mixture modeling (the bootstrapped LRT is typically not available to me because I usually work with complex sampling). Most recently, the LMR LRT supported a 3-class solution, whereas the BIC kept returning lower and lower values up through a 7-class model (an 8-class model was not compared simply because it would not estimate).
I was wondering if you are aware of any literature and/or intuitions that could explain these findings? That is, is there support for the BIC "overextracting" classes relative to the LMR LRT in large samples (e.g., n = 3,000)? I am seeing this pattern everywhere.
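For intuition on how the two BIC variants penalize extra classes at large n, the criteria can be computed directly from the loglikelihood: BIC = -2LL + p·ln(n), while the sample-size-adjusted BIC replaces n with (n + 2)/24, yielding a much smaller penalty. A minimal sketch, using hypothetical loglikelihood and parameter-count values (not from any real model):

```python
import math

def bic(ll, n_params, n):
    """Bayesian information criterion: -2*LL + p*ln(n)."""
    return -2 * ll + n_params * math.log(n)

def adj_bic(ll, n_params, n):
    """Sample-size-adjusted BIC: same form, with n replaced by (n + 2) / 24."""
    return -2 * ll + n_params * math.log((n + 2) / 24)

# Hypothetical loglikelihoods and parameter counts for 2- to 4-class models
n = 3000
models = {2: (-41250.0, 13), 3: (-41110.0, 20), 4: (-41060.0, 27)}
for k, (ll, p) in models.items():
    print(k, round(bic(ll, p, n), 1), round(adj_bic(ll, p, n), 1))
```

Because the per-parameter penalty ln(n) grows only logarithmically, even modest loglikelihood gains from an extra class can keep lowering the BIC in samples this large, which is consistent with the pattern described above.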
See the following paper which is available on the website:
Nylund, K. L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study. Structural Equation Modeling, 14, 535-569.
Lewina Lee posted on Monday, September 17, 2012 - 9:36 am
Dear Drs. Muthen,
I am conducting an LPA of 21 continuous outcomes with N = 1189. In class enumeration, the LMR and VLMR tests (TECH11) suggest that a 4-class solution best fits the data, but the BIC keeps decreasing and the p-value from the bootstrap LRT (TECH14) keeps increasing. My questions are:
1. Nylund et al. (2007) reported that the BIC, LMR, and BLRT performed equally well in LPAs (each identifying the correct model more than 94% of the time). Given my discrepant results across these indices, what is the best-practice approach?
2. At k = 8, the best LL was replicated only 3 times with STARTS = 1600 100. Is this a sign of poor model fit (and therefore a good place to stop increasing k)?
3. I followed the steps in Mplus Web Notes No. 14 in requesting TECH11 and TECH14. When I freed the variances across classes and requested TECH14 output with "LRTBOOTSTRAP = 120" added, the best LL was not replicated in 88 of 120 draws. How many times does the best LL need to be replicated for the results to be trustworthy?
Once TECH11 or TECH13 obtains a p-value greater than .05, you should not look further. It sounds like you are trying to extract too many classes. You should use theory to help you decide on the number of classes. The loglikelihood should be replicated several times when you are extracting more than two classes.
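The stopping rule described here can be sketched as a simple loop over candidate class counts: each TECH11/TECH14 test compares k classes against k - 1, so enumeration stops at the first nonsignificant result. A minimal sketch with purely hypothetical p-values:

```python
def select_classes(p_values, alpha=0.05):
    """p_values: dict mapping k -> p-value for the k vs. k-1 class test.

    Stop at the first k whose p-value exceeds alpha; a nonsignificant
    test means k classes are not supported over k - 1, so do not look
    further even if a later test happens to be significant.
    """
    chosen = 1
    for k in sorted(p_values):
        if p_values[k] > alpha:
            break          # k classes not supported over k - 1: stop here
        chosen = k         # k classes significantly improve on k - 1
    return chosen

# Hypothetical TECH11 p-values for the 2- through 6-class models
print(select_classes({2: 0.001, 3: 0.004, 4: 0.30, 5: 0.02, 6: 0.60}))  # → 3
```

Note that the significant 5-class p-value is deliberately ignored: once the 4-versus-3 test is nonsignificant, the advice above is to stop rather than keep extracting.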
Lewina Lee posted on Tuesday, September 18, 2012 - 1:52 pm
Thanks, Linda. I assume you meant TECH14 (not TECH13)? Please correct me if I'm wrong.
In general, when running TECH14, at what threshold (percentage or number of bootstrap draws) does Mplus output a warning about the best LL not being replicated? Does the best LL need to be replicated across all bootstrap draws for the resulting p-value to be trustworthy?
The warning comes out if more than 50% of the draws have the problem. The best LL should be replicated for a trustworthy p-value. If you need to, you can use the LRTSTARTS option to increase the number of random starts used in TECH14.
Lewina Lee posted on Saturday, September 22, 2012 - 4:53 pm
Thank you for the information, Tihomir!
I have one more question about the TECH14 output -- what does "Successful Bootstrap Draws" mean?
In general, this is the number of bootstrap draws used to compute the p-value, and that number is not constant. If the p-value is near 5%, more draws are used. If convergence problems occur for some bootstrap draws, you will also get a warning about that. If you do not get any convergence warnings, then the successful bootstrap draws are all of the bootstrap draws.
Lewina Lee posted on Tuesday, September 25, 2012 - 1:15 pm