Mplus Discussion >> Differences in BIC values

Mplus Discussion > Categorical Data Modeling >

Elmar Schlueter posted on Wednesday, May 16, 2007 - 11:29 am

Dear all,

Is there any rule of thumb or literature concerning how large the difference between two BIC or AIC values needs to be in order to be considered substantial?

Thanks

Linda K. Muthen posted on Wednesday, May 16, 2007 - 1:24 pm

I have never heard of any.

Scott Weaver posted on Wednesday, May 16, 2007 - 2:49 pm

In Krueger et al. (J. of Abnormal Psych., v. 111, p. 415), it is written:
"BIC provides a quantitative index of the extent to which each model
maximizes correspondence between the observed and model predicted
variances and covariances while minimizing the number of parameters.
Better fitting models have more negative values, and the difference in BIC
values relates to the posterior odds—the odds ratio formed by taking the
probability that the second model is correct, given the data, over the
probability that the first model is correct given the data. When comparing
models, a difference in BIC of 10 corresponds to the odds being 150:1 that
the model with the more negative value is the better fitting model and is
considered “very strong” evidence in favor of the model with the more
negative BIC value (Raftery, 1995)."
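The arithmetic behind the quoted 150:1 figure can be checked with a short sketch. It assumes the Schwarz scale (BIC = -2 log L + p ln n, smaller is better), under which the Bayes factor for the lower-BIC model is approximately exp(difference / 2), and it assumes equal prior odds for the two models so that the Bayes factor equals the posterior odds. The function name is hypothetical.

```python
import math


def posterior_odds_from_bic_difference(delta_bic: float) -> tuple[float, float]:
    """Approximate posterior odds and probability favouring the lower-BIC model.

    Assumes Schwarz-scale BIC (-2 log L + p ln n) and equal prior odds,
    so posterior odds ~= Bayes factor ~= exp(delta_bic / 2).
    """
    odds = math.exp(delta_bic / 2.0)
    probability = odds / (1.0 + odds)
    return odds, probability


odds, prob = posterior_odds_from_bic_difference(10.0)
print(f"odds ~ {odds:.1f}:1, posterior probability ~ {prob:.4f}")
# prints: odds ~ 148.4:1, posterior probability ~ 0.9933
```

A difference of 10 gives odds of about 148:1 (exp(5)), which is the "150:1" in the quote after rounding.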

Bengt O. Muthen posted on Wednesday, May 16, 2007 - 2:49 pm

The Nagin (1999) Psych Methods article has a table of Wasserman's "Bayes factor" values which give guidance related to BIC differences between models.

Boliang Guo posted on Thursday, May 17, 2007 - 2:35 am

Please check this paper for some information: Spiegelhalter DJ, Best NG, Carlin BP and Van der Linde A, "Bayesian Measures of Model Complexity and Fit (with Discussion)", Journal of the Royal Statistical Society, Series B, 2002 64(4):583-616.

Alexandre Morin posted on Friday, April 18, 2008 - 12:49 pm

Greetings,

This discussion, combined with another one ("Selecting the number of classes"),
raises a question. In the other discussion, B. Muthén showed that Nagin-BIC = -2 *
Mplus-BIC.

In the 1999 paper, Nagin showed that, to obtain the Bayes factor approximation when
comparing two models, one should compute e^(difference between the two BICs) and then
use Table 2, which Dr. B. Muthén referred to above (Jeffreys' scale of evidence as
reported by Wasserman).
As an approximation, this means that BIC differences of 2.3026 or higher indicate
strong evidence in favour of the model with the higher BIC (with Nagin BIC,
which is negative), since ln(10) = 2.3026.
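Nagin's formula can be sketched as follows, assuming Nagin's definition BIC = log L - 0.5 k ln n (larger, i.e. less negative, is better). On that scale the approximate Bayes factor is exp(BIC_A - BIC_B), so the "strong evidence" cut-off of a Bayes factor of 10 corresponds to a difference of ln(10). The function names and the log-likelihoods, parameter counts and sample size in the usage lines are hypothetical illustration values, not output from any real model.

```python
import math


def nagin_bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Nagin-style BIC: log L - 0.5 * k * ln(n); larger (less negative) is better."""
    return log_likelihood - 0.5 * n_params * math.log(n_obs)


def nagin_bayes_factor(bic_a: float, bic_b: float) -> float:
    """Approximate Bayes factor for model A over model B on Nagin's scale."""
    return math.exp(bic_a - bic_b)


# A Bayes factor of 10 requires a Nagin-BIC difference of ln(10).
STRONG_THRESHOLD = math.log(10)
print(f"{STRONG_THRESHOLD:.4f}")  # 2.3026

# Hypothetical illustration values, not real model output.
bic_2class = nagin_bic(log_likelihood=-1200.0, n_params=9, n_obs=500)
bic_3class = nagin_bic(log_likelihood=-1185.0, n_params=13, n_obs=500)
bf = nagin_bayes_factor(bic_3class, bic_2class)
print(bic_3class - bic_2class >= STRONG_THRESHOLD, bf)
```

Here the difference is 15 - 2 ln(500), about 2.57, so the 3-class model clears the ln(10) cut-off.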

As I am no mathematician, my question may seem naive. Given the difference
between Nagin-BIC and Mplus-BIC, should we still use the formula proposed by Nagin,
but first divide the Mplus BIC by 2 (or simply divide the difference between the two BICs by 2)? Or am I missing something obvious?
If that is the case, a difference of 4.6052 or higher indicates strong evidence in favour of the model with the lowest BIC for Mplus.

Linda K. Muthen posted on Friday, April 18, 2008 - 6:59 pm

I think you would divide 2.3026 by 2. The Nagin BIC is -2 times the Mplus BIC.

Alexandre Morin posted on Saturday, April 19, 2008 - 6:20 am

Thank you, Linda,
I hate it when I miss something like that...

Alexandre Morin posted on Monday, April 21, 2008 - 11:40 am

Greetings,

Having recovered from my miscalculation, I still have a follow-up question.

Going back to the Krueger et al. citation: this paper states that a BIC difference of 10 corresponds to a Bayes factor of 150 (odds of 150:1). This is also what Raftery (1995) reports.

According to Nagin's method, e^(BIC difference), a BIC difference of 5 gives the 150:1 ratio. This would be equivalent to an Mplus BIC difference of 2.5.

So either Nagin's method does not work, or Raftery uses yet another BIC? Can anybody help with this one?

Linda K. Muthen posted on Monday, April 21, 2008 - 4:52 pm

It may be that Raftery uses another BIC. You should check his work.

Alexandre Morin posted on Tuesday, April 22, 2008 - 8:09 am

Yes, he seems to use something else. I checked his work, but I cannot work out the relationship between his BIC and the Mplus BIC needed to do the conversion. My guess would be to multiply BIC(Mplus) by 4, since a BIC(Mplus) of 2.5 = a BIC(Nagin) of 5,
and a BIC(Nagin) of 5 = (according to Nagin's formula) e^5 = a Bayes factor of 150
(which Raftery equates to a BIC difference of 10 points).

But this assumes that Nagin's formula is right, and my deduction is independent of the formulas reported in Raftery.

I will send you the paper in case you can easily work out the conversion (the relevant part is pp. 130-135, especially equations 21 and 23). I believe it may be useful to Mplus users other than me.

Alexandre Morin posted on Monday, April 28, 2008 - 8:38 am

Got it!

For those interested, see my other posting (Friday 25th) under "Selecting the number of classes".

If Mplus BIC = -2 times Nagin BIC, rather than the reverse, then Raftery's table can be applied directly to Mplus BIC differences.

Everybody but Nagin, including Raftery, appears to be using the Schwarz BIC (which is also used in Mplus).

According to Raftery, a BIC difference of 10 = a Bayes factor of 150. According to Nagin, a BIC difference of 5 = a Bayes factor of 150 (e^5 ≈ 150).

This correspondence between a BIC of -2 log L + r log n and Raftery's tables is also supported in McLachlan & Peel's (2000) book on pages 209-211.
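The resolution above can be checked numerically. Assuming Nagin's BIC is log L - 0.5 k ln n and the Schwarz BIC printed by Mplus is -2 log L + k ln n, the Mplus BIC is -2 times the Nagin BIC, so a Nagin difference of 5 and an Mplus difference of 10 imply the same Bayes factor, exp(5) ≈ 148. The function names and the two BIC values are hypothetical, chosen only to illustrate the conversion.

```python
import math


def mplus_to_nagin(mplus_bic: float) -> float:
    """Convert a Schwarz/Mplus BIC (-2 log L + k ln n) to Nagin's scale (log L - 0.5 k ln n)."""
    return -mplus_bic / 2.0


def bayes_factor_from_mplus(bic_worse: float, bic_better: float) -> float:
    """Approximate Bayes factor for the lower-BIC model: exp((BIC_worse - BIC_better) / 2)."""
    return math.exp((bic_worse - bic_better) / 2.0)


def bayes_factor_from_nagin(bic_better: float, bic_worse: float) -> float:
    """The same quantity on Nagin's scale, where larger is better: exp(BIC_better - BIC_worse)."""
    return math.exp(bic_better - bic_worse)


# Two illustrative Mplus BICs 10 points apart.
m1, m2 = 130.0, 120.0
bf_mplus = bayes_factor_from_mplus(m1, m2)
bf_nagin = bayes_factor_from_nagin(mplus_to_nagin(m2), mplus_to_nagin(m1))
print(mplus_to_nagin(m2) - mplus_to_nagin(m1))  # 5.0
print(math.isclose(bf_mplus, bf_nagin), round(bf_mplus, 1))  # True 148.4
```

Both routes give the same number, which is why Raftery's table (built on the Schwarz scale) can be read off Mplus BIC differences without rescaling.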

Jon Elhai posted on Monday, April 28, 2008 - 12:04 pm

To Alexandre Morin:
This clarification on BIC is helpful. Your post the other day got me thinking about this; I'm sure many of us on the listserv feel less confused now.

Rob Dvorak posted on Friday, July 02, 2010 - 6:54 pm

Just to clarify (since it's been two years since anyone posted here): Mplus uses the Schwarz BIC (the same as Raftery), meaning that the tables from Raftery can be applied directly to Mplus BIC differences... correct? Therefore, M1 with a BIC of 130 would be rejected in favor of M2 with a BIC of 120 (i.e., a Bayes factor of ~150:1), giving a very high posterior probability that M2 is the preferred model.

Linda K. Muthen posted on Saturday, July 03, 2010 - 9:45 am

I think this is correct. Can you give the Raftery reference?

Rob Dvorak posted on Saturday, July 03, 2010 - 1:39 pm

Hi Linda,

Here's the Raftery reference.

Raftery, A. E. (1995). Bayesian Model Selection in Social Research. Sociological Methodology, 25, 111-163.

Alex Walker posted on Sunday, December 19, 2010 - 4:02 pm

Are there specific guidelines regarding the significance of BIC differences? For example, in LR, is a difference of 4.4 between two models significant? (M1 BIC = 71.01, MC BIC = 66.61)

thanks
Alex

Linda K. Muthen posted on Monday, December 20, 2010 - 10:16 am

Here are some BIC citations of interest from Bengt:

Wasserman (2000) in J of Math Psych gives a formula (27) which implies
that a BIC-related difference between two models is log Bij, where Bij is
the Bayes factor for choosing between models i and j. Wasserman's (27)
says that log Bij is approximately minus 1/2 times the difference in what
Mplus calls BIC. This means that 2 log Bij is on the Mplus BIC scale, apart
from the ignorable sign difference.
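Written out, the chain of reasoning in this post is roughly the following, using the Mplus (Schwarz) definition BIC = -2 log L + p ln n, with L̂_i the maximized likelihood and p_i the number of parameters of model i, and n the sample size. This is the standard Schwarz approximation written in that notation, not a transcription of Wasserman's equation (27).

```latex
\log B_{ij} \approx \left(\log \hat L_i - \tfrac{p_i}{2}\ln n\right) - \left(\log \hat L_j - \tfrac{p_j}{2}\ln n\right)
            = -\tfrac{1}{2}\left(\mathrm{BIC}_i - \mathrm{BIC}_j\right),
\qquad\text{so}\qquad
2\log B_{ij} \approx \mathrm{BIC}_j - \mathrm{BIC}_i .
```

So 2 log Bij is positive, favouring model i, exactly when model i has the lower Mplus BIC.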

Kass and Raftery (1995) in J of the Am Stat Assoc give rules of
evidence on page 777 for 2 log_e Bij, which say that a value > 10 is very strong
evidence in favor of the model with the largest value.

So, to conclude, this says that an Mplus BIC difference > 10 is very strong
evidence against the model with the higher Mplus BIC value (I hope I
got that right).
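The rule of thumb can be sketched as a small helper that takes two Mplus BIC values and returns their difference (treated as approximately 2 log_e Bij, per the reasoning above), the implied Bayes factor, and the Kass and Raftery (1995) label. The cut points 2, 6 and 10 and the labels follow that paper's table; placing values that fall exactly on a cut point in the higher category is an assumption of this sketch, as is the function name.

```python
import math


def kass_raftery_evidence(bic_a: float, bic_b: float) -> tuple[float, float, str]:
    """Grade the evidence for the lower-BIC model from two Mplus (Schwarz) BICs.

    Returns (difference, approximate Bayes factor, label). The difference
    |BIC_a - BIC_b| is treated as 2 * ln(B) for the lower-BIC model.
    Values exactly on a cut point go to the higher category (an assumption).
    """
    diff = abs(bic_a - bic_b)
    bayes_factor = math.exp(diff / 2.0)
    if diff < 2:
        label = "not worth more than a bare mention"
    elif diff < 6:
        label = "positive"
    elif diff < 10:
        label = "strong"
    else:
        label = "very strong"
    return diff, bayes_factor, label


# BIC values quoted earlier in this thread.
for first, second in [(130.0, 120.0), (71.01, 66.61)]:
    diff, bf, label = kass_raftery_evidence(first, second)
    print(f"difference {diff:.2f}, Bayes factor ~ {bf:.1f}, {label}")
# prints:
# difference 10.00, Bayes factor ~ 148.4, very strong
# difference 4.40, Bayes factor ~ 9.0, positive
```

On this reading, the 4.4 difference asked about above is "positive" evidence (a Bayes factor of roughly 9), and the 130 vs. 120 example sits right at the "very strong" boundary.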

Raftery has a Soc Meth chapter:

Raftery, A. E. (1995). Bayesian Model Selection in Social Research.
Sociological Methodology, 25, 111-163.

that talks about Bij from a SEM perspective.

There's also a good discussion about this here:

http://www.statmodel.com/discussion/messages/23/2232.html?1209409498

Doug posted on Wednesday, October 05, 2011 - 1:13 pm

This sounds interesting. How do you output BIC and AIC values in Mplus? Can they be used with the WLSMV estimator?

Linda K. Muthen posted on Wednesday, October 05, 2011 - 1:58 pm

BIC and AIC are not available for weighted least squares estimation. They are available for maximum likelihood estimation. They are printed automatically when available.
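Both criteria are built from the maximized log-likelihood, which is why they come with maximum likelihood estimation. A sketch of the textbook definitions, AIC = -2 log L + 2p and BIC = -2 log L + p ln n, is below; the function name and the input values are hypothetical, and the values Mplus prints should be checked against its own documentation rather than assumed to match this sketch.

```python
import math


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> dict[str, float]:
    """Textbook AIC and Schwarz BIC from a maximized log-likelihood (smaller is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
    }


# Hypothetical values for illustration only.
print(information_criteria(log_likelihood=-1200.0, n_params=9, n_obs=500))
```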