Comparing DICs
 Thomas Rodebaugh posted on Wednesday, January 23, 2019 - 2:30 pm
Thanks in advance for any pointers. We have some situations in N=1 models in which we would like to test whether a single factor or two factors makes more sense. We were expecting to be able to use DIC as evidence in this determination. However, we are finding that DIC prefers a two-factor model even when the correlation between those two factors is very high (above .9, including .99). We are starting to wonder whether there's something wrong with our assumption that DIC would be informative here. (Plenty of detail is available if any of it would help; we did our best to follow the procedures outlined in the PowerPoint slides from the DSEM presentations.)
 Tihomir Asparouhov posted on Thursday, January 24, 2019 - 10:19 am
DIC should work. I am attaching a little simulation that illustrates how it works. If this doesn't help, you can send your example to support@statmodel.com

----------------
MONTECARLO: NAMES ARE y1-y6;
NOBS = 500;
NREP = 10;
ANALYSIS: estimator=bayes;
MODEL MONTECARLO:
ETA1 BY Y1-y6@1 (&1);
ETA1*1;
ETA1 on ETA1&1*0.4; y1-y6*1;
MODEL:
ETA1 BY Y1@1 Y2-Y6*1 (&1);
ETA1*1;
ETA1 on ETA1&1*0.4;

----------------
MONTECARLO: NAMES ARE y1-y6;
NOBS = 500;
NREP = 10;
ANALYSIS: estimator=bayes;
MODEL MONTECARLO:
ETA1 BY Y1-y6@1 (&1);
ETA1*1;
ETA1 on ETA1&1*0.4; y1-y6*1;
MODEL:
ETA1 BY Y1@1 Y2-Y3*1 (&1);
ETA1*1;
ETA1 on ETA1&1*0.4;
ETA2 BY Y4@1 Y5-Y6*1 (&1);
ETA2*1;
ETA2 on ETA2&1*0.4;
-----------------------

If the hypothesis can be summarized as testing whether the factor correlation is less than 1, I would recommend the simpler Z-test for that parameter, treated as if it came from an ML estimation.
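A hypothetical sketch of what that test could look like in Mplus (the factor and indicator names are placeholders, not from the poster's model): fix the factor variances to 1 so that the WITH parameter is a correlation, label it, and test it against 1 with MODEL TEST.

----------------
MODEL:
ETA1 BY Y1* Y2-Y3;   ! free first loadings; identify
ETA2 BY Y4* Y5-Y6;   ! via the factor variances instead
ETA1@1; ETA2@1;      ! unit variances: WITH = correlation
ETA1 WITH ETA2 (r);
MODEL TEST:
0 = r - 1;           ! Wald/Z test of correlation = 1
----------------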
 Thomas Rodebaugh posted on Thursday, January 24, 2019 - 12:10 pm
Thanks for the suggestions, Tihomir. Does your answer change at all given that we are talking about N=1 models here?

Notably, we also expected DIC to tell us whether there should be one factor or two, yet we keep finding that DIC prefers two factors even when the correlation between them is very high. That might suggest we are doing something wrong, but I also wondered whether anyone has checked that DIC behaves as expected in the N = 1 case on this front.
 Tihomir Asparouhov posted on Thursday, January 24, 2019 - 12:55 pm
A correlation of 0.99 is not enough to replace two factors with one; the autocorrelations of the two factors must be the same as well.
 Thomas Rodebaugh posted on Monday, January 28, 2019 - 11:56 am
Thanks, Tihomir, we were thinking in terms of standard SEM/CFA and hadn't considered the autocorrelations. It looks like that may be what's at issue.

A related question from the same analyses. Some of our models, particularly some "dumb" baseline models that don't look like they should be very good, are returning negative pDs (often with smaller DICs than the "good" models!). We have found some indication in the literature that negative pDs are sometimes interpreted as a sign of a problem with the model. We are inclined to agree, both because a negative effective number of parameters is nonsensical and because we tend to get this for models we consider on the silly side. I'd appreciate any guidance on this issue (including pointers to good discussions or simulations on this point).

Thanks!
 Tihomir Asparouhov posted on Monday, January 28, 2019 - 1:38 pm
My experience is that negative pDs go away with many iterations, i.e., they mostly indicate an inadequate number of iterations (which may indirectly be a sign of a poor model or poor model estimation, for example a variance fixed to 0 or a poorly identified model). You can try fbiter=50000; as a first step. If the problem persists, send it to support@statmodel.com
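For reference, the option goes in the ANALYSIS command; a minimal fragment (the rest of the input is assumed unchanged):

----------------
ANALYSIS:
ESTIMATOR = BAYES;
FBITER = 50000;   ! fix the number of MCMC iterations
                  ! instead of relying on the PSR stopping rule
----------------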
 Thomas Rodebaugh posted on Tuesday, January 29, 2019 - 7:42 am
Thanks--the negative pD in at least one case persists beyond fbiter = 1000000 with a thin of 100, so apparently if the model gets silly enough even lots of iterations won't make it go away! I'll do some work with that model and will send it on if problems persist. Thanks!
 Mary M Mitchell posted on Thursday, May 23, 2019 - 8:25 am
Dear Drs. Muthen,

I am running a two-level DSEM model and am looking at the DIC to determine which model fits best. I tried linear, quadratic, cubic, and quartic models, and the DIC kept decreasing. However, the beta coefficients for the quadratic model were not significant. Does this mean that I should stop at the linear model?

Thanks,

Mary Mitchell
 Tihomir Asparouhov posted on Thursday, May 23, 2019 - 10:02 am
I would recommend several additional steps before you settle this.

1. Run a two-level analysis with the trend and see if this makes it easier to answer the above question.

2. Run the RDSEM model instead of the DSEM model, since RDSEM disentangles the trend from the dynamics. You might find Section 14 useful.
The RDSEM model would also be comparable to the two-level model.

3. Remove non-significant dynamic paths as some of these could compromise the power to detect significance in the trend.

4. If these steps do not help you should consider these two issues:

a. Polynomial trends may in fact accommodate non-polynomial trends well, and while individual coefficients in the polynomial trend lack significance, the overall DIC criterion may indeed be making a valid point. Here I would recommend looking at the Mplus time-series plots to see whether you can visually justify a non-linear trend.

b. Significance of individual coefficients is generally more reliable in DSEM than DIC comparison. This is particularly the case when pD is large, for example due to missing data. The DIC is then generally difficult to estimate well and will have some variability. You can study the variability of DIC by changing the random seed of the MCMC with the bseed option. If the DIC difference between two models is large enough to overcome that DIC variability, then the lower-DIC model should be preferred. If the DIC differences are small compared to the variability, then you should ignore the DIC.
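A sketch of the seed check described in point b (the seed values here are arbitrary): run the identical input twice, changing only BSEED, and compare the two resulting DICs.

----------------
ANALYSIS:
ESTIMATOR = BAYES;
BSEED = 12345;    ! rerun with, e.g., BSEED = 54321;
                  ! the DIC difference between the two runs
                  ! estimates the MCMC variability of DIC
----------------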
 Thomas Rodebaugh posted on Tuesday, March 03, 2020 - 10:18 am
We are running an ML-DSEM model in which we are not sure whether to add effects or not (Lag 2 and Lag 3 paths for one of the variables). Analysis using another method suggests the effects should be added, and they have significant paths when added in ML-DSEM. However, even running 70k iterations with thin=100 (which takes ~6 days) will not settle the DIC when comparing two random seeds for the same model (e.g., 70690 versus 72346), and the DICs between the models overlap more than two random seeds differ in the same model. We have no a priori grounds to prefer one model versus the other, but the one with the additional effects seems to clarify some things. Is there anything in addition we should consider here before selecting one versus the other?
 Tihomir Asparouhov posted on Tuesday, March 03, 2020 - 4:39 pm
The simple way to decide between the two models is MODEL TEST, where you can test the hypothesis that both effects are zero. This is usually a very stable test. DIC is probably so unstable due to missing data, many random slopes, or within-level factors. It is curious, though, that you have that much instability in DIC; most likely something else is also going on with the model. If you have many cluster-specific random effects, check that their variances are not too small. If Var/SE(Var) < 3, we would consider such effects marginally significant, and replacing them with non-random effects could improve the model and the stability of the model.
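A hypothetical fragment of what that MODEL TEST could look like for the lag-2 and lag-3 paths (the variable name Y is a placeholder): label the two paths and list both constraints so they are tested jointly.

----------------
VARIABLE:
LAGGED = Y(3);       ! make Y&1-Y&3 available
MODEL:
Y ON Y&1;
Y ON Y&2 (b2);
Y ON Y&3 (b3);
MODEL TEST:
0 = b2;
0 = b3;              ! joint Wald test: both effects zero
----------------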
 Thomas Rodebaugh posted on Tuesday, March 10, 2020 - 7:09 am
I can confirm that the model has many random slopes and within-level factors (not that much missing data). Setting the least variable random slopes to fixed improves things somewhat: the two random seeds are about 1000 apart instead of close to 2000 (and the run takes 5 days instead of 7). Just to confirm, the output gives the posterior SD for the variance, not an SE (unless I'm missing something). Are you saying that if Var/SD(Var) < 3 you would fix those effects? That's nearly all of the paths in this case. (I fixed paths with ratios < 2 as a first step.) Any clarification much appreciated.
 Tihomir Asparouhov posted on Wednesday, March 11, 2020 - 9:32 am
Note that 1000 might look large, but it might not be. Consider this: DIC is proportional to the sample size N, so a difference of 1000 may be in the third or fourth significant digit if the sample size is large. Even if you see that kind of difference, DIC can still be used for hypothesis testing of competing models where the DIC differences are even larger in magnitude. As an example, you might run a second model in which you pick a significant parameter and fix it to zero. This gives you a sort of baseline for how DIC behaves in these circumstances (the circumstances being that pD is very large, mostly because of the within-level factor).

Bayes posterior SD is asymptotically equivalent to ML-SE and we end up using the two interchangeably.

We generally tend to recommend fixing random effects for which Var/SD(Var) < 3 (unless you have some predictors for the random effect; those can still be eliminated by introducing the corresponding interactions, though I don't know that it would actually hurt the model much if you leave them random). The value 3 (rather than 1.96 or 2) is chosen because simulations tend to suggest it. If the value is between 2 and 3, we consider the effect only marginally significant at best, and removing such random effects would not result in gross model misspecification. You are right about one thing, though: as you run different models, the significance will change. Removing some random effects will make the remaining random effects more significant, so things have to be done stepwise. Generally, however, we tend to work in the opposite direction: start with the simplest models first and build up gradually, rather than starting with the most complex models and trimming insignificant effects.

I would also recommend spending time looking at the reason the model converges so slowly. Again, it could be model complexity, where all the SEs are too large, or it could be a particular parameter that is not well identified. This is not just about reducing the time you wait for the model; it could actually bear on the quality of the model. Perhaps some of the small-variance random effects are used as predictors (you would want to stay away from that).

This was the long answer. The short answer is to use MODEL TEST. It is the simplest and most reliable.
 Thomas Rodebaugh posted on Wednesday, March 11, 2020 - 2:59 pm
Thanks for both the short and long answer--in particular the long answer is very helpful in terms of improving my understanding. I'll post a follow-up if I discover something that turns out to have been specifically responsible for the long running time (other than the fact that there's lots of data and a complex model, but not that many participants).
 Thomas Rodebaugh posted on Friday, March 13, 2020 - 7:21 am
In case it's helpful for others: two of our variables are "day in study" and "survey of day." Both, in raw form, have very different distributions from the other items. Scaling these values so that their variances (and the variances of the associated slopes) are more in the range of the others seems to be helping the DIC settle down more quickly (now within 500 after 1/7th the number of iterations).
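For anyone trying the same rescaling, a hedged sketch using DEFINE (the variable names and divisors are made up for illustration):

----------------
DEFINE:
day = day / 10;         ! shrink "day in study" so its variance
survey = survey / 5;    ! and "survey of day" end up on roughly
                        ! the same scale as the other items
----------------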

I'm still not used to an analysis that's sensitive to this kind of thing. . .
 Hadi posted on Saturday, June 13, 2020 - 11:16 pm
Dear Drs. Muthen,
can we use DIC to find the best time series model in DSEM?
I want to find out whether AR models or ARMA models fit my data better in DSEM. How can I determine this?
 Tihomir Asparouhov posted on Monday, June 15, 2020 - 9:56 am
Generally, it is most efficient to use credibility intervals to compare models. ARMA has just one more parameter than AR, so simply testing that parameter is preferred. Since it is a variance parameter, though, we use a Z-score-based test with a cutoff value of 3. For more complex comparisons you can use MODEL TEST.

You can take a look at
which has a lot of information on using DIC.

Also this section
ARMA(1,1) and the Measurement Error AR(1) Model
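That measurement-error AR(1) parameterization of ARMA(1,1) can be sketched as follows (a minimal single-variable fragment; the variable name is a placeholder). The measurement-error variance is the one parameter ARMA(1,1) adds over a plain AR(1), so its Est./Posterior SD ratio, judged against the cutoff of 3 mentioned above, decides between the two models.

----------------
MODEL:
F BY Y@1;     ! latent true score behind Y
F ON F&1;     ! AR(1) on the latent part
Y*1;          ! measurement-error variance: the extra
              ! parameter relative to a plain AR(1)
----------------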