WARNING: PROBLEMS OCCURRED IN SEVERAL ITERATIONS IN THE COMPUTATION OF THE STANDARDIZED ESTIMATES FOR SEVERAL CLUSTERS.
At what point should I view the standardized results as untrustworthy? I'm running a bivariate RDSEM with a linear time effect (to account for non-stationarity), and all 95 of 95 clusters are listed as having draws removed.
Is there a way to identify how many draws were removed? If only 1 out of 5000 were removed, that seems like less of a concern than 4000.
Model code:

VARIABLE:   NAMES = ID TREAT TIME Y2 Y1;
            MISSING = ALL(-99);
            CLUSTER = ID;
            USEOBSERVATIONS = (TIME < 15);
            USEVARIABLES = Y2 Y1 TIME2;
            LAGGED = Y2(1) Y1(1);
            TINTERVAL = TIME(1);
            WITHIN = TIME2;

DEFINE:     TIME2 = TIME;

ANALYSIS:   PROCESSORS = 8;
            TYPE = TWOLEVEL RANDOM;
            ESTIMATOR = BAYES;
            BITERATIONS = (5000);
            THIN = 10;
            CHAINS = 4;
We don't yet provide that kind of additional information, but I think you could make some progress on the question. I would consider the standardized results untrustworthy if there is a large discrepancy between the observed and the estimated within-level variances, i.e., I would compare the variances given by OUTPUT: RESIDUAL(CLUSTER); for the above model vs. the corresponding quantities for a model like this:
%WITHIN%
v | Y1;
%BETWEEN%
Y1;
Those should be close to the observed values, but if you have missing data all bets are off, since a time-series model may actually provide unbiased estimates of the within-level variance while sample quantities (based on listwise deletion) could be biased.
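A minimal sketch of that comparison run, assuming the same data file and variable names as the model above (the random within-level variance v follows the snippet just given; repeat the run with Y2 for the second variable):

VARIABLE:   NAMES = ID TREAT TIME Y2 Y1;
            MISSING = ALL(-99);
            CLUSTER = ID;
            USEOBSERVATIONS = (TIME < 15);
            USEVARIABLES = Y1;
ANALYSIS:   TYPE = TWOLEVEL RANDOM;
            ESTIMATOR = BAYES;
MODEL:      %WITHIN%
            v | Y1;     ! random within-level variance of Y1
            %BETWEEN%
            Y1;         ! between-level variance of Y1
OUTPUT:     RESIDUAL(CLUSTER);

The cluster-specific variances from this run can then be compared against the RESIDUAL(CLUSTER) output of the full RDSEM model.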
The iterations that are deleted are those where the model becomes non-stationary. That usually happens a lot for very short time series, because the posterior distribution of the AR coefficients is very spread out, or for clusters where the AR coefficients are large.
It might also be useful to check with plots that Y1 and Y2 are not subject to non-linear trends in Time.
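If plots are inconvenient, a rough model-based check (a sketch, not part of the original advice; TIMESQ is a name invented here) is to add a quadratic time term on the within level and see whether it picks anything up:

DEFINE:     TIME2 = TIME;
            TIMESQ = TIME*TIME;
VARIABLE:   WITHIN = TIME2 TIMESQ;    ! add TIMESQ to USEVARIABLES as well
MODEL:      %WITHIN%
            Y1 ON TIME2 TIMESQ;       ! rest of the input as in the original model
            Y2 ON TIME2 TIMESQ;

If the TIMESQ slopes are clearly non-zero, the linear time effect alone is not capturing the trend.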
You won't be able to put priors on the random effects, just on their means and variances, and that won't be enough. But pretty much, yes, clearly T = 12 is the source of the issue; N is not very relevant.
I wouldn't worry about the message being printed at all. I would say that the standardized estimates we print are the best you can get with that data. Typically not many iterations are thrown away; I have actually never seen a case with even more than 10%. If the process were non-stationary in 50% of the iterations, a more severe problem would occur and the model wouldn't converge.
You can also do observed standardization: standardize the variables within each cluster. You can do that with a couple of auxiliary runs and the CLUSTER_MEAN option in the DEFINE command.
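A sketch of the centering half of that, assuming the variable names from the model above (CLUSTER_MEAN computes the cluster mean of a variable; the per-cluster SDs needed for the division would have to come from the auxiliary runs, e.g. saved with SAVEDATA and merged back into the data file):

DEFINE:     Y1M = CLUSTER_MEAN(Y1);
            Y1C = Y1 - Y1M;      ! cluster-mean-centered Y1
            ! to fully standardize, divide Y1C by the cluster SD
            ! obtained from an auxiliary run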