Eric Knowles posted on Wednesday, February 19, 2020 - 4:10 pm
I'm trying to replicate an LPA analysis in Mplus that I previously conducted in Stata, and I've encountered some puzzling inconsistencies.
I have a data set with 6 latent class indicators (4 binary, 2 continuous). The binary indicators are coded 0, 1 and the continuous indicators are rescaled to range from 0 to 1. In both packages, my models assume uncorrelated indicators within class and equal variances across classes. I'm using ML estimation. All models run normally in both programs.
For 2-class solutions, Mplus and Stata produce identical results in terms of posterior class probabilities and frequencies. However, any time I specify 3 or more classes, the results diverge in both respects. For example, Mplus produces many more posterior probabilities close to 0 and 1, whereas Stata produces many more that fall strictly in between. I've tried many different 2- and 3-class models (e.g., using only the binary indicators or only the continuous indicators), and I always find that Mplus and Stata agree exactly for 2 classes and disagree substantially for more classes.
I'd like to see at least similar class probabilities and frequencies regardless of the program I'm using, as this would give me more confidence in my models.
Any advice is appreciated, and I'm happy to provide any additional information if it would help.
Eric Knowles posted on Wednesday, February 19, 2020 - 7:28 pm
Addendum to my previous post ... I was finally able to reproduce my Stata results in Mplus by doing the following:
1. Using as starting values in Mplus the means (for continuous indicators) and thresholds (for binary indicators) from the Stata results.
2. Specifying STARTS = 0 in Mplus.
I'm not sure what this reflects about the difference in how the two programs operate. But I'm wondering if Stata (at least by default) picks a single set of starting values and then proceeds with estimation, whereas Mplus tries many perturbations around the specified starting values. (Consistent with this, I noticed earlier that the exact log-likelihood from my Stata results was listed in the Mplus output, but only as a "worse" random start in the best-to-worst ranking.)
I'm wondering whether this implies that I should trust the Mplus results over the Stata results, since Mplus seems to have found a better solution (i.e., when I'm not forcing it to reproduce the Stata results).
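To convince myself this was a local-maximum issue rather than a software bug, I found it helpful to watch the multi-start behavior in a toy mixture. Below is a minimal numpy sketch (not Mplus or Stata code; the simulated data, starting values, and EM details are all illustrative assumptions) comparing one fixed set of starting values against many perturbed starts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from a 3-class univariate normal mixture with a shared
# variance -- a toy stand-in for one continuous LPA indicator.
data = np.concatenate([
    rng.normal(-3.0, 1.0, 200),
    rng.normal(0.0, 1.0, 200),
    rng.normal(3.0, 1.0, 200),
])

def loglik(x, w, mu, var):
    """Mixture log-likelihood under a class-invariant variance."""
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return float(np.log((dens * w).sum(axis=1)).sum())

def em_fit(x, start_means, n_iter=200):
    """Plain EM from a given set of starting means; returns the final logL."""
    k = len(start_means)
    w = np.full(k, 1.0 / k)
    mu = np.asarray(start_means, dtype=float).copy()
    var = float(x.var())
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        post = dens * w
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights, class means, and the shared variance.
        nk = post.sum(axis=0)
        w = nk / len(x)
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = float((post * (x[:, None] - mu) ** 2).sum() / len(x))
    return loglik(x, w, mu, var)

# One fixed, user-supplied start -- analogous to estimating from a single
# set of starting values (my guess at the Stata default).
single_logl = em_fit(data, [-1.0, 0.0, 1.0])

# Many perturbed starts, keeping the best -- analogous to Mplus's STARTS.
start_logls = [em_fit(data, rng.normal(0.0, 3.0, 3)) for _ in range(20)]
best_logl = max(start_logls)
```

In this toy, most starts reach the same maximum because the classes are well separated; the point is simply that the best logL across many starts is the quantity worth comparing across programs, since any single start may stop at a local maximum.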
You should focus on the log-likelihood. So first check whether the two programs give the same log-likelihood (and the same number of parameters, of course), and if not, see which program gives the better logL. If it is not Mplus, that means you have to increase the number of STARTS so that you replicate the best logL many times. Only then are you ready to compare estimates.
Eric Knowles posted on Saturday, February 22, 2020 - 11:49 am
Thank you, Dr. Muthen.
I verified that the Mplus and Stata models have the same degrees of freedom, and Mplus gives the better log-likelihood. So I'll go with those results.
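For anyone following along: the replication check can be done by eye, or scripted against the final-stage log-likelihood values Mplus lists from best to worst in the random-starts section of the output. A small Python sketch (the logL values below are made up purely for illustration):

```python
# Hypothetical final-stage log-likelihoods, copied in best-to-worst order
# from the random-starts section of a (made-up) Mplus output.
final_logls = [-2456.731, -2456.731, -2456.731, -2461.088, -2461.088, -2473.902]

best = max(final_logls)

# Count how many random starts replicated the best value, within the
# rounding of the printed output.
n_replications = sum(abs(logl - best) < 1e-3 for logl in final_logls)

# Heuristic from the advice above: only compare estimates once the best
# logL has been replicated several times; otherwise raise STARTS and rerun.
best_is_replicated = n_replications >= 3
```

Here three of the six starts reach the best logL, so the solution would pass this (toy) check; a best logL reached only once is the warning sign to increase STARTS.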