I am running a Monte Carlo simulation testing the impact of ignoring nesting, and I got a strange finding when I created data using the following code (as a kind of control condition):
MONTECARLO: NAMES ARE Y X; NOBSERVATIONS = 200; NREPS = 1000; SEED = 58459;
NCSIZES = 1;
CSIZES = 5 [10(4)];
Analysis: Type = threelevel;
MODEL POPULATION: %WITHIN% Y on X*.30; X*1; Y*.91;
%BETWEEN LEVEL2% X*1; Y*1;
%BETWEEN LEVEL3% X*0; Y*0;
Model: %WITHIN% Y on X*.30; X*1; Y*.91;
%BETWEEN LEVEL2% X*1; Y*1;
%BETWEEN LEVEL3% X*0; Y*0;
The results show fairly substantial differences in standard error bias depending on whether I set CSIZES = 5 [10(4)] or CSIZES = 10 [5(4)], despite the fact that Level-3 variances are zero. Is there something I am missing, I thought this shouldn't be possible?
Such results are as expected. You have zero variance at the third level and that is the source of the trouble/bias in SE. Zero variances cause really slow mixing (highly correlated components in the MCMC estimation) and to get unbiased SE you would have to run extremely long MCMC sequences.
If you have zero variance at the third level you should run two-level models. If you need to run three level models nevertheless you should use something like that %BETWEEN LEVEL3% X*0.01; Y*0.01; to allow the MCMC to move (for both generation and estimation). This would be essentially equivalent to the model you want but will numerically give you the room to estimate this with Bayes.
Thank you for your response! It is not the MCMC estimator, it is MLR (although I must have deleted the estimator to reduce space), though in the context of a Monte Carlo simulation study. What is most puzzling to me is the difference in standard error bias when shifting the number of clusters while holding the total N constant and the variance for Level-3 is zero. As an example, with 5 Level-3 units and N=100, I get an Monte Carlo SD = 0.0532, an average SE of 0.0453, thus a downward bias of (0.0453-0.0532)/0.0532 = -14.85%. Changing to 20 Level-3 units (still total N=100), I get an SD of 0.0521, average SE =0.0526, thus only 0.96% bias.
When the variance for Level-3 is zero, there is no nesting and thus the number of Level-3 clusters should have no impact, I thought?
When you have variance fixed at zero the likelihood has a division by zero - not something you want. It is particularly bad when the variance is zero and the mean is not since that mean must be estimated from the tiny shift the random effect has. ML/MLR has similar problems when you have zero variance but typically they will manifest differently (e.g. slow or incomplete convergence, saddle points - check tech9 for potential errors, switching between EMA(accelerated EM), to slow EM). Ultimately we don't recommend it. You have three option: 1. use two-level 2. use the BETWEEN option in the variable command to specify the variable as being present on the first two levels only and remove the zero variance random effect 3. use type=complex twolevel where you can use two levels of clustering but only the first two are modeled.
Of course now that you brought up the actual bias information I can tell you that none of what I said this far is relevant. The following is the actual reason. The standard errors are asymptotic, i.e., they will be unbiased only for large sample sizes, it is normal to have bias of 14% with a sample size of 100. Furthermore, the SE are asymptotic in terms of the number of independent units, which are the level 3 clusters. To get unbiased SE you must have a large number of level 3 units. If the sample size alone is large but the number of level 3 units is small you still can't count on unbiased SE. In your example settings when you switch to 10 level 3 units (more level 3 units) you are getting smaller bias because you are getting closer to the asymptotics than when you have 5 units.
My personal experience is that 14% bias in the standard error is completely ignorable. It is too small to have a real impact. You are generally talking about small number 0/0 - these percentages are not reliable (in the sense that if you change the rest of the parameters in the model the percentage will change) - in absolute distance this is completely ignorable (absolute bias, rather than relative bias). Furthermore this will have a very minor impact on coverage - dropping from 95% to 91%. If you have 50% bias you get down to 77% coverage which is of course problematic. These numbers are obtained from the standard normal distribution function.
Thank you, I get it! I'd like to ask another question: I generated 2-level data according to the following model:
MODEL POPULATION: %WITHIN% Y on X*.30; X*1; Y*.91;
%BETWEEN% X*1; Y*1;
I then estimated either using the exact same model, or using group mean centering for X on saved samples from the above model. Results were very similar. However, I tested four Level-2 sample sizes, N = 50, 100, 200 and 400 (all with 4 repeated measures on the within level, thus total N = 200, 400, 800 and 1600), and results show almost linearly increasing standard error bias with larger samples. The increase is there even with absolute bias, not just relative bias. With N = 50, bias was 1.74%, while with N = 400 bias was 9.6%. I tried to recreate this same analysis in R but results did not show this effect. I have tried to see if I have made some error but I can't find any?
Mplus does a latent variable decomposition of X here (see UG ex 9.1). Is your aim to compare this to the approach of using observed group-mean centered X on within and observed group-mean X on between?
Actually not, I just included this information to show that I got the same strange result regardless of method. My question is how standard error bias for the within effect of X on Y can increase with larger sample size? This seems counterintuitive to me, and also at odds with my results when trying to replicate this in R.
Maybe run more replications. The results I posted are the ones that come out from Mplus default settings. You must have some other settings. If you want the precision of third or fourth decimal point you might have to run many many more replications. You can send your analysis to email@example.com if you can't figure it out.
Thanks, yes the problem seems to disappear with more replications. I also re-created your results when I deleted the seed statement that I had used. Do you know if there are recommendations in the literature for how many replications are needed when running Monte Carlo simulations for multilevel models?
I think there are. Some people recommend 500 but I don't recall the reference for that. I gave you my advice already but I will summarize it again:
- I use 100 replications and sometimes even less. I use more than 100 in less than 1% of the time and only if I am uncertain about reading the results correctly. It is honestly about training your eyes to read the results correctly. Very very rarely will 1000 replications contribute anything more than 100 replications
- It is a terrible idea and a waste of time to look at relative bias for quantities near zero or quantities that converge to zero, such as the standard errors or standard deviation (when sample size increases). At least use absolute bias, but even better ... when we want to make an inference on the standard errors we first look at the bias of the point estimates and the coverage. If systematic bias exist of some kind it always shows up in the coverage and coverage is much easier to read since it must be near the nominal level of 95%.
Thanks! The reason I persist with standard error bias is that in our simulations so far we see a strong correlation between standard error bias and inflated alpha level, as estimated by % significant coefficients when we set the population coefficient (Y on X) to zero. With -15% standard error bias, we see an alpha level of around 15%. An alpha level three times higher than it should be doesn't look ignorable to me?
You recommend looking at the point estimate and coverage first, but isn't it possible that coefficients are relatively unbiased, on average, while standard errors are too small? And if so, that the risk for spurious findings increases?
You say that you see alpha level of 15%. If I am understanding you correctly you would be seeing in the Mplus output coverage of 85% which would be of interest and would not be quite ignorable. First this isn't what my computation shows: 1-2*(1-Phi(0.85*1.96))=.90 So I would say that the expect coverage is 90% when the standard errors are underestimated by 15% and 90% coverage difference from 95% coverage is not as important. To actually be able to reliably pinpoint that kind of difference you would have to run many replications. Also keep this in mind. For smaller sample sizes we don't expect full compliance with asymptotic results, so 90% isn't news. For larger samples we always expect that 90% coverage will slowly converge toward 95% coverage as we increase the sample size.
Ultimately you have to observe the theory. Since you are using ML theory grantees correct coverage asymptotically. So if you claim that there is SE bias there are four possible sources: the number of replications was not enough, the sample size is not large enough (you can increase the number of clusters to 1000 - that is usually enough for most asymptotic studies), there is some modeling difference between the estimated and generating model (which I don't think there is in the simulations you showed so far), there is a bug in the code (which I don't think there is since this estimation has been in use for nearly 20 years and the last bug found there was probably more than 10 years ago and I am sure was not for a simple model as the regression model). So whatever claims you make - you have to make sure it doesn't contradict directly theoretical results.
Yes, bias and coverage are generally enough. If the point estimates are unbiased but the standard errors are too small you will be able to see this in the coverage - the coverage will be too small. Unless that coverage is below 90% however I would be skeptical as it is mostly likely due to not enough replications being done or not enough sample size.
Covarage was around 85% for the models with -15% bias. However, the confusion is probably because the model that showed -15% bias was not the same as the one I initially showed in this thread. I had simplified the model because I thought I saw a pattern that did not conform to what I thought I knew about these models.
The model that showed -15% bias was a three-level model with only 5 clusters on the highest level. The research question that I am interested in is the impact of ignoring third-level clustering on within-level regression coefficients, compared to trying to estimate three-level models. Our results show no problems with two-level models (using group-mean centering of X) regardless of how much nesting there is, but problems with three-level models when the number of clusters at the highest level are very small. This is probably no news to method experts like yourself, but in the field of psychotherapy research where I work the standard is to more or less always use three-level models as a "better safe than sorry" strategy. Our aim is to challenge this strategy.
This is what is known and theoretically backed: if you ignore the third level, the two-level model will underestimate the SE (as it assumes that all two-level units are independent). That is both for observed centering and latent centering. That bias will disappear if the variances on the third level are small and it will be biggest when the third level variance are the biggest.
If the number of third level units is small we tend to recommend type=twolevel complex; where the third level clustering is used only to correct the bias in the standard errors.
What you write is true for the impact of ignoring Level-3 on Level-2 estimates, right? What we're interested in is the possible impact of ignoring Level-3 on Level-1 estimates. Assuming that X is centered on Level-1, I believe that a 2-level random intercept model should yield proper estimates even if Level-3 nesting is ignored? That is what our preliminary findings seem to support.
I think that also the level 1 estimates can be affected - after all - the 3 level model is a two-level model where the level 1 can be written in a wide format. In some special cases it might happen that the SE is not affected, or for some particular parameters the SE is not affected. At least the SE of the intercept parameters should be affected. If the regression parameter varies across level 3 units (but that variation is ignored) I would expect that parameter SE to also be affected. However, I wouldn't be surprised if in your simulations the SE appear to be unaffected. Simulation studies are somewhat very pure - for example the distribution of the covariates is identical across level 3 units. If you do a different simulation study where the covariates distribution is level 3 units specific then you might find different results.
Thanks, yes we've included a condition with regression slopes varying on Level-3, and it seems that ignoring this slope variance affects standard errors for the fixed Level-1 effect of Y on X - although with the sample sizes and variances in slopes that we've used the two-level model results still look better than the three-level ones.
Could you please explain what you meant by your last statement, about "unit-specific covariate distribution for Level-3"? Is this possible to test using Mplus?
It is not very surprising (but is important in the aspect of your study) that the three level results are worse than those for two-level. If I am not mistaken this is due to to the small number of level 3 units. I think similar findings were reported here http://www.statmodel.com/bmuthen/articles/Article_127.pdf in quite different context but similar flavor.
Since you are using centering the covariate means will be 0 on the within level. So you can't make the mean be cluster specific but you can make the variance be cluster specific along the lines of User's guide example 9.28. The random variance feature is available only with the Bayes estimation in Mplus and only for two-level models but potentially you can generate data like that to see what effect it might have on the regression effect SE.