Krueger et al. (J. of Abnormal Psych., v. 111, p. 415) write: "BIC provides a quantitative index of the extent to which each model maximizes correspondence between the observed and model predicted variances and covariances while minimizing the number of parameters. Better fitting models have more negative values, and the difference in BIC values relates to the posterior odds—the odds ratio formed by taking the probability that the second model is correct, given the data, over the probability that the first model is correct given the data. When comparing models, a difference in BIC of 10 corresponds to the odds being 150:1 that the model with the more negative value is the better fitting model and is considered “very strong” evidence in favor of the model with the more negative BIC value (Raftery, 1995)."
Could you check this paper for some information on this, please? Spiegelhalter DJ, Best NG, Carlin BP and Van der Linde A, "Bayesian Measures of Model Complexity and Fit (with Discussion)", Journal of the Royal Statistical Society, Series B, 2002 64(4):583-616.
This discussion, combined with another one ("Selecting the number of classes"), raises a question. In the other discussion, B. Muthén showed that Nagin-BIC = -2 * Mplus-BIC.
In the 1999 paper, Nagin showed that to obtain the Bayes factor approximation when comparing two models, one should compute exp(difference between the two BICs) and then use Table 2, which Dr. B. Muthén referred to above (Jeffreys' scale of evidence as reported by Wasserman). As an approximation, this means that BIC differences of 2.3026 or higher indicate strong evidence in favour of the model with the higher BIC (with Nagin's BIC, which is negative), since ln(10) = 2.3026.
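A minimal Python sketch of the approximation described above, assuming Nagin's BIC is log L - (k/2) log n (so higher is better), which is what the later posts in this thread imply. The function name and the BIC values are hypothetical, chosen only for illustration.

```python
import math


def nagin_bayes_factor(bic_nagin_i: float, bic_nagin_j: float) -> float:
    """Approximate Bayes factor B_ij from two BIC values on Nagin's scale.

    Assumes Nagin's BIC is log L - (k/2) log n (higher is better), so
    B_ij ~= exp(BIC_i - BIC_j).
    """
    return math.exp(bic_nagin_i - bic_nagin_j)


# Jeffreys' "strong evidence" cut-off (B > 10), expressed on the Nagin scale
STRONG_THRESHOLD_NAGIN = math.log(10)

# Hypothetical Nagin BIC values for two competing models
bf = nagin_bayes_factor(-1000.0, -1002.5)
print(round(STRONG_THRESHOLD_NAGIN, 4))  # 2.3026
print(bf > 10)  # True: a difference of 2.5 exceeds ln(10)
```

Exponentiating the difference is what turns the Nagin-scale gap into odds; the ln(10) cut-off is simply the point at which those odds reach 10:1.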
Being no mathematician, I realize my question may appear naive. Given the difference between Nagin-BIC and Mplus-BIC, should we still use the formula proposed by Nagin, but first divide Mplus-BIC by 2 (or simply divide the difference between the two BICs by 2)? Or am I missing something obvious? If dividing by 2 is correct, a difference of 4.6052 or higher would indicate strong evidence in favour of the model with the lower Mplus BIC.
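If Mplus BIC equals -2 times Nagin's BIC (the relationship the thread settles on later), the halving asked about here works as in this sketch. The function name is hypothetical; the Mplus BIC is taken to be -2 log L + k log n, where lower is better.

```python
import math


def bayes_factor_from_mplus(bic_mplus_i: float, bic_mplus_j: float) -> float:
    """Approximate Bayes factor B_ji in favour of model j from Mplus BIC values.

    Assumes Mplus BIC = -2 * Nagin BIC, so the Mplus difference is halved
    (and its sign flipped) before exponentiating.
    """
    return math.exp((bic_mplus_i - bic_mplus_j) / 2.0)


# Mplus-scale difference needed for B > 10 ("strong" on Jeffreys' scale)
STRONG_THRESHOLD_MPLUS = 2 * math.log(10)
print(round(STRONG_THRESHOLD_MPLUS, 4))  # 4.6052

# Hypothetical Mplus BICs: model j (100.0) beats model i (105.0)
print(bayes_factor_from_mplus(105.0, 100.0) > 10)  # True
```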
Yes, he seems to use something else. I checked his work, and I just can't work out the relationship between his BIC and Mplus BIC needed for the conversion... My guess would be to multiply BIC(Mplus) by 4, since a BIC(Mplus) of 2.5 = a BIC(Nagin) of 5, and a BIC(Nagin) of 5 corresponds (according to Nagin's formula) to a Bayes factor of e^5 ≈ 150 (which Raftery equates to a BIC difference of 10 points).
But this assumes that Nagin's formula is right, and my deduction does not draw on the formulas reported in Raftery.
I will send you the paper in case you can easily work out the conversion (the relevant part is pp. 130-135, especially equations 21 and 23). I believe it may be useful to other Mplus users as well.
For those interested, see my other posting (Friday 25th) under "Selecting the number of classes".
If Mplus BIC = -2 times Nagin BIC rather than the reverse, Raftery's table can be applied directly to Mplus BIC differences.
Everybody but Nagin, including Raftery, appears to use the Schwarz BIC (which is also used in Mplus).
According to Raftery, a BIC difference of 10 corresponds to a Bayes factor of about 150. According to Nagin, a BIC difference of 5 corresponds to a Bayes factor of about 150 (e^5 ≈ 148).
This correspondence between a BIC of -2 log L + r log n (where r is the number of free parameters and n the sample size) and Raftery's tables is also supported in McLachlan and Peel's (2000) book, pp. 209-211.
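The two definitions can be compared directly in a short sketch, with r the number of free parameters and n the sample size. Nagin's form is written here as log L - (r/2) log n, which is the assumption implied by "Mplus BIC = -2 times Nagin BIC"; the fit values are hypothetical.

```python
import math


def schwarz_bic(log_lik: float, r: int, n: int) -> float:
    """Schwarz BIC as used by Mplus and Raftery: -2 log L + r log n (lower is better)."""
    return -2.0 * log_lik + r * math.log(n)


def nagin_bic(log_lik: float, r: int, n: int) -> float:
    """Nagin-style BIC (assumed form): log L - (r / 2) log n (higher is better)."""
    return log_lik - 0.5 * r * math.log(n)


# Hypothetical fit: log-likelihood -1234.5, 12 free parameters, 500 cases
ll, r, n = -1234.5, 12, 500

# The two values agree up to floating-point rounding
print(schwarz_bic(ll, r, n), -2 * nagin_bic(ll, r, n))
```

Because the scales differ by a constant factor of -2, any difference between two models also differs by that factor, which is why Raftery's table transfers to Mplus BIC differences unchanged.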
Jon Elhai posted on Monday, April 28, 2008 - 6:04 pm
To Alexandre Morin: This clarification on BIC is helpful. Your post the other day got me thinking about this; I'm sure many of us on the listserv feel less confused now.
Rob Dvorak posted on Saturday, July 03, 2010 - 12:54 am
Just to clarify (since it's been two years since anyone posted here): Mplus uses the Schwarz BIC (the same as Raftery), meaning that the tables from Raftery can be applied directly to Mplus BIC differences... correct? Therefore, M1 with a BIC of 130 would be rejected in favor of M2 with a BIC of 120 (i.e., a Bayes factor of ~150:1), implying a very high posterior probability that M2 is the preferred model.
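Under the usual assumption of equal prior probabilities for M1 and M2 (not stated in the post), the numbers in this example work out as follows: the ~150:1 odds is exp(5) ≈ 148, which gives M2 a posterior probability of about 0.993.

```python
import math

# BIC values from the example above (Mplus/Schwarz scale, lower is better)
bic_m1, bic_m2 = 130.0, 120.0

# Approximate Bayes factor in favour of M2: exp(difference / 2)
bayes_factor = math.exp((bic_m1 - bic_m2) / 2.0)

# Posterior probability of M2, assuming equal prior probabilities for M1 and M2
post_m2 = bayes_factor / (1.0 + bayes_factor)

print(round(bayes_factor, 1), round(post_m2, 4))  # 148.4 0.9933
```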
Here are some BIC citations of interest from Bengt:
Wasserman (2000) in J. of Math. Psych. gives a formula (27) which implies that a BIC-related difference between two models is log B_ij, where B_ij is the Bayes factor for choosing between models i and j. Wasserman's (27) says that log B_ij is approximately minus 1/2 times the difference in what Mplus calls BIC. This means that 2 log B_ij is on the Mplus BIC scale, apart from an ignorable sign difference.
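Written out with the Mplus definition, where L_m is the maximized likelihood and k_m the number of free parameters of model m, the approximation above reads as follows. This is a restatement of the standard Schwarz approximation, not a transcription of Wasserman's equation (27):

```latex
\mathrm{BIC}_m = -2\log L_m + k_m \log n \qquad \text{(Mplus/Schwarz scale)}

\log B_{ij} \approx -\tfrac{1}{2}\left(\mathrm{BIC}_i - \mathrm{BIC}_j\right)
\quad\Longrightarrow\quad
2\log B_{ij} \approx \mathrm{BIC}_j - \mathrm{BIC}_i
```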
Kass and Raftery (1995) in J. of the Am. Stat. Assoc. give rules of evidence on page 777 for 2 log_e B_ij, which say that a value > 10 is very strong evidence in favor of model i (the model in the numerator of B_ij).
So, to conclude, this says that an Mplus BIC difference > 10 is very strong evidence against the model with the higher Mplus BIC value (I hope I got that right).
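A small helper that maps an Mplus BIC difference onto the Kass and Raftery (1995) categories for 2 log_e B might look like this. The function name is hypothetical, and the treatment of exact boundary values (2, 6, 10) is an assumption, since the published table gives ranges only.

```python
def kass_raftery_evidence(bic_higher: float, bic_lower: float) -> str:
    """Label an Mplus BIC difference on Kass and Raftery's (1995) scale for 2 log_e B.

    The difference bic_higher - bic_lower approximates 2 log_e B in favour of
    the model with the lower Mplus BIC. Exact boundary values (2, 6, 10) are
    assigned to the stronger category here, which is an assumption: the
    published table only gives ranges.
    """
    diff = bic_higher - bic_lower
    if diff < 0:
        raise ValueError("bic_higher must not be smaller than bic_lower")
    if diff < 2:
        return "not worth more than a bare mention"
    if diff < 6:
        return "positive"
    if diff < 10:
        return "strong"
    return "very strong"


print(kass_raftery_evidence(130.0, 120.0))  # very strong
```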
Raftery also has a Sociological Methodology chapter:
Raftery, A. E. (1995). Bayesian Model Selection in Social Research. Sociological Methodology, 25, 111-163.