I am validating a previously tested structural equation model on a new dataset. The outcome variable is binary (smoking or abstinent). I would like to determine whether alternate theory-based models are better than the previously developed and tested model. The models are not nested, but they include the same variables (only the configuration of the models differs). I ran the initial model using WLSMV. However, I am now using ML to estimate the model so that I can obtain the BICs for each model. I have not been able to find the appropriate procedure for determining whether one BIC is better than another.
Do I merely subtract one Mplus BIC from another? If the difference is greater than 10, is this strong evidence that the model with the lower BIC is superior?
Can you send me some references for using BIC to determine superiority of non-nested SEM models?
Wasserman (2000) in the Journal of Mathematical Psychology gives a formula (27) which implies that a BIC-related difference between two models is log B_ij, where B_ij is the Bayes factor for choosing between models i and j. Wasserman's (27) says that log B_ij is approximately what Mplus calls minus 1/2 BIC. This means that 2 log B_ij is on the Mplus BIC scale, apart from an ignorable sign difference.
Kass and Raftery (1995) in the Journal of the American Statistical Association give rules of evidence on page 777 for 2 log_e B_ij which say that > 10 is very strong evidence in favor of the model with the larger value.
So, to conclude, this says that an Mplus BIC difference > 10 is strong evidence against the model with the higher Mplus BIC value (I hope I got that right).
Raftery has a Sociological Methodology chapter from around 1995 (?) that talks about B_ij from a SEM perspective.
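To make the scale concrete, here is a minimal sketch (not Mplus output; the log-likelihoods, parameter counts, and sample size are made up for illustration) of how the Mplus-style BIC is computed and how the difference between two models lands on the 2 log B_ij scale discussed above:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC on the scale Mplus reports: -2*logL + k*ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical ML fits of two non-nested models to the same data (n = 500)
bic_a = bic(log_likelihood=-1450.2, n_params=12, n_obs=500)
bic_b = bic(log_likelihood=-1458.9, n_params=10, n_obs=500)

# Per Wasserman (2000, formula 27), log B_ij for each model is approximately
# minus 1/2 of its BIC, so BIC_b - BIC_a is approximately 2*log B_ab:
# the quantity Kass & Raftery's evidence categories apply to.
delta = bic_b - bic_a  # positive values favor model A
```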
Rob Dvorak posted on Wednesday, July 14, 2010 - 6:45 pm
Here's the Raftery cite:
Raftery, A. E. (1995). Bayesian Model Selection in Social Research. Sociological Methodology, 25, 111-163.
Yes, that's how I understand it - well, actually, Kass & Raftery (1995) in JASA use the term "Very Strong" for an Mplus BIC diff > 10. They view 6-10 as "Strong" evidence. They say that
"From our own experiences, these categories seem to furnish appropriate guidelines."
ri ri posted on Saturday, September 13, 2014 - 2:21 pm
I also need to compare two non-nested models with categorical outcomes. The difference in BIC is 6. Can I interpret it as strong evidence that the model with the lower BIC is a better model? Are these two models different?
I checked Kass & Raftery's paper; they mention 2-6 as positive. What is the lowest cutoff for rejecting H0?
ri ri posted on Sunday, September 14, 2014 - 12:08 am
Thank you Linda. I checked that one. In the meantime I became aware of the post by Bengt above. He said 6-10 means strong evidence, and > 10 is very strong. So can one interpret a BIC difference beyond 6 as strong evidence to reject H0?
That's what that article says. I personally would want a much larger difference. And I wouldn't characterize it as "rejecting H0" - instead you are getting support that one model is better than another.
deana desa posted on Thursday, June 18, 2015 - 8:15 am
May I know how/where to find BIC and DIC values from a BSEM analysis, or how I can calculate them from Mplus BSEM output?
If they don't show up they are not yet implemented for the case you consider. They can't be computed simply from the output.
JOEL WONG posted on Saturday, October 24, 2015 - 7:31 pm
In Mplus, can a comparison of BIC values be made between two non-nested models which have different variables?
In model 1, I have a one-item (single indicator) outcome. In model 2, this outcome is replaced by a latent variable with 3 indicators (of which one of the indicators is the one-item outcome used in Model 1). The path coefficients were about the same in both models.
Model 2 has a much larger BIC value - can I legitimately claim that model 1 is preferred and that there is no methodological advantage in modeling the outcome as a latent variable in Model 2?
Given that you need to use the scaling correction factor for LR chi-square tests when using MLR, do any adjustments need to be made to the AIC/BIC values when comparing across models? Or is it totally safe to compare AIC/BIC values across models, including models where both used MLR and models where one used MLR and another did not? Thanks!
AIC and BIC are not adjusted, i.e., they are the same as the ML values and can be compared across estimators in the usual way. The LRT scaling correction is a correction for the p-value. BIC comparison doesn't really provide significance levels; it evaluates the fit of the model, together with its parameter point estimates, to the data.
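Since both criteria are computed from the same ML log-likelihood point estimate, the answer above can be illustrated with a short sketch (illustrative values only): AIC and BIC differ only in their complexity penalty, and neither involves the MLR scaling correction factor, which rescales standard errors and the chi-square statistic rather than the log-likelihood itself.

```python
import math

def aic(log_likelihood, k):
    """AIC: -2*logL + 2*k."""
    return -2.0 * log_likelihood + 2.0 * k

def bic(log_likelihood, k, n):
    """BIC: -2*logL + k*ln(n).
    Penalizes parameters more heavily than AIC once n > e^2 (about 7.4)."""
    return -2.0 * log_likelihood + k * math.log(n)

# The same logL is obtained under ML and MLR, so AIC/BIC are identical
# for both estimators and can be compared across them.
aic_val = aic(log_likelihood=-100.0, k=5)
bic_val = bic(log_likelihood=-100.0, k=5, n=500)
```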