I am carrying out an LCA with 6 categorical variables (n ≈ 1,300). I am using BIC, entropy, the LMR test (TECH11), and the BLRT (TECH14). I've read the Nylund et al. (2007) simulation paper and reviewed the TECH11 and TECH14 sections in the User's Guide.
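For readers less familiar with the entropy criterion, here is a minimal sketch of the relative entropy measure commonly reported for LCA solutions (as far as I know, the normalized form Mplus prints): E = 1 - sum_i sum_k (-p_ik ln p_ik) / (n ln K), where p_ik is person i's posterior probability of class k. The function name and input layout are my own illustrative choices, not Mplus output.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy (0..1) of a K-class solution.

    posteriors: one row per person, each row holding that person's K
    posterior class probabilities (each row should sum to 1).
    Values near 1 indicate clear classification; near 0, none at all.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("relative entropy needs K >= 2 classes")
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * ln(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

print(relative_entropy([[1.0, 0.0], [0.0, 1.0]]))  # 1.0: perfect separation
print(relative_entropy([[0.5, 0.5], [0.5, 0.5]]))  # near 0.0: no separation
```

Note that entropy summarizes classification quality, not fit, so it cannot by itself settle the number of classes.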
As others have noted in various Mplus discussion threads, the p-values for the LMR and BLRT diverge dramatically:
3 classes: LMR .000, BLRT .000
4 classes: LMR .8835, BLRT .000
5 classes: LMR 1.00, BLRT .000
...
7 classes: LMR 1.00, BLRT .000
Also, the LLs for the k and k-1 class models are replicated in the BLRTs. For all baseline models, the best LL was replicated 5+ times. By manipulating starting values, I made the largest class the last class in the BLRT and LMR runs. For the BLRT, the number of bootstrap draws was set to 100 and LRTSTARTS was set to 0 0 500 75, so that the best LL was replicated in each of the 100 bootstrap draws. So these tests have been carried out carefully and follow the guidelines set out in the User's Guide.
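To make the BLRT logic concrete, here is a sketch of the final step only, assuming the bootstrap LRT values have already been obtained (data generated under the k-1 class model, then refit with k-1 and k classes). The p-value here is the plain empirical proportion; I am not certain this is Mplus's exact formula, and some texts use (count + 1) / (B + 1) instead. All function names and numbers are hypothetical.

```python
def lrt_statistic(ll_k_minus_1, ll_k):
    """Likelihood ratio statistic comparing a (k-1)-class to a k-class model."""
    return -2.0 * (ll_k_minus_1 - ll_k)

def blrt_p_value(observed_lrt, bootstrap_lrts):
    """Empirical p-value: share of bootstrap LRT values at least as large
    as the observed statistic."""
    if not bootstrap_lrts:
        raise ValueError("need at least one bootstrap draw")
    exceed = sum(1 for t in bootstrap_lrts if t >= observed_lrt)
    return exceed / len(bootstrap_lrts)

# Hypothetical loglikelihoods and bootstrap statistics, for illustration only
obs = lrt_statistic(-10250.4, -10180.9)
boot = [12.1, 15.3, 9.8, 20.0]
print(blrt_p_value(obs, boot))  # 0.0
```

One consequence of this construction: with 100 draws the smallest nonzero p-value is .01, so a reported .000 means that none of the 100 bootstrap statistics reached the observed one.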
What might account for the diverging p-values for the LMR and BLRT tests? Do you have suggestions for how to proceed?
I assume that the 2-class model also has p-values of 0 for both tests.
BLRT and LMR agree well in the Nylund et al. simulations, but we have now seen many real-data examples where they don't, and where BLRT often seems to reject even for a large number of classes. More research into this phenomenon is needed.
When BLRT and LMR disagree like this, I would rely more on BIC, which also performed well in the Nylund et al. study unless the sample size was small. I would also rely on interpretability. I usually use BIC as a first step and then may use BLRT and LMR to help choose among the key candidate numbers of classes. I wonder whether BIC shows a distinct minimum in your case. If not, this could indicate that the model type is not suitable for the data; for example, a factor model or a factor mixture model may be more suitable than LCA.
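As a rough way to check for a BIC minimum across fitted solutions, here is a sketch assuming an unrestricted LCA with categorical indicators, where the free parameters are (K - 1) class proportions plus K times the sum of (c_j - 1) thresholds, and BIC = -2 logL + p ln(n). The helper names and the loglikelihood values are hypothetical, and "interior minimum" is only a crude stand-in for judging whether a minimum is distinct.

```python
import math

def lca_free_parameters(n_classes, categories_per_item):
    """(K - 1) class proportions plus K * sum(c_j - 1) item thresholds."""
    per_class = sum(c - 1 for c in categories_per_item)
    return (n_classes - 1) + n_classes * per_class

def bic(loglik, n_params, n_obs):
    """BIC = -2 logL + p ln(n); smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def bic_by_classes(logliks, categories_per_item, n_obs):
    """Map number of classes K -> BIC, given {K: best loglikelihood}."""
    return {
        k: bic(ll, lca_free_parameters(k, categories_per_item), n_obs)
        for k, ll in logliks.items()
    }

def has_interior_minimum(bics):
    """True if the smallest BIC falls strictly between the smallest and
    largest K examined, i.e. BIC turns back up within the range fitted."""
    ks = sorted(bics)
    best = min(ks, key=bics.get)
    return ks[0] < best < ks[-1]

# Hypothetical: six binary items, n = 1,300, made-up best loglikelihoods
items = [2] * 6
lls = {2: -4900.0, 3: -4800.0, 4: -4780.0, 5: -4770.0}
b = bic_by_classes(lls, items, 1300)
print(min(b, key=b.get), has_interior_minimum(b))  # 3 True
```

If BIC instead keeps decreasing up to the largest K fitted, that is the pattern the paragraph above suggests may point toward a different model type.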