Carson posted on Wednesday, February 08, 2006 - 7:16 am
I am using Mplus to analyze multilevel data (n=671, with 30 groups). The design effect for the dependent variable of interest is less than 2. Two of the independent variables in the model are measured at the group level. Will I get unbiased parameter estimates if I use OLS regression to test the model (even if it includes level 2 predictors)?
I was also checking whether multilevel analyses are really necessary for my regression model in Mplus and tried to calculate the design effect in addition to the ICC to make my decision. However, I did these calculations for the full model and for the unconditional model (without predictors) and there was a difference between these two:
model with predictors: Deff = 1.32, ICC = .02
model without predictors: Deff = 2.23, ICC = .06
Probably this can be explained by the fact that I have missing data and that the average cluster size differs across both models, but to which value would you attach most importance? Or which Deff is usually chosen? The one from the uncond model or the one from the full model?
My colleague just made me realize that the drop in ICC and Deff in the model with predictors might also be due to the addition of a group-level predictor that explains much of the between-group variance, so little is left for the ICC.
However, the effect of different cluster sizes is still possible (average size for uncond model = 21; average size for mod with pred = 18).
Then I thought that I should alter the syntax of my conditional model so that the same N is used for both models and then see if there still was a difference between this new model and the old model with predictors, but I do not know if this is correct?
New model: ANALYSIS: type = twolevel; estimator = MLR; iterations = 1000; MODEL: %between% zaca on zscho@0; %within% zaca on sex@0 zach@0;
This new model then resulted in the same ICC and Deff as the model with predictors, so maybe there was just a false effect caused by different cluster sizes?
Thank you for your answer, Linda. The model with or without type=complex made nearly no difference. To be sure that I understand you correctly: should I then just report the ICC from the model with predictors (ICC = .02), in combination with that Deff, as the justification for not using multilevel modeling in my paper?
And do I not need to mention the ICC from the "true" null model (ICC = .06), which has a different N (due to the use of missing data imputation, type=missing)?
Would you also report the .02 ICC when justifying the choice for not using multilevel modeling? Because many textbooks refer to the ICC from the unconditional model to decide whether you should use multilevel analyses or not and I'm a bit unsure whether the model command below is in fact the representation of the right null model...
This is because I cannot use the syntax for the actual empty model with only my dependent variable: my missing data cause a difference in N's, which is not a problem for the model command mentioned above.
You should not use the input you show because if the covariates are in fact related to the dependent variable the model is misspecified and any results from it are distorted.
The size of the intraclass correlation (ICC) should not be the deciding factor for using multilevel modeling or not - it depends on the combination (approximately the product) of cluster size and ICC. See our course handout for "Topic 5".
If you have clustered data and a regression model you can do multilevel modeling or only correct SEs (type = complex). You learn more doing the former because there are more parameters. But perhaps you are asking when do you not have to do either, but can assume simple random sampling?
I don't understand how you calculate DEFF. And I am not sure how you compute ICC when having covariates in the model (and their slopes not fixed at zero) - are you working off the residual variances on between and within? That ICC will typically be lower because the covariates account for some of the clustering.
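As a rough illustration of the point about residual variances: the ICC is the between-level variance divided by the sum of the between-level and within-level variances. The sketch below is in Python rather than Mplus. The function name and all variance values are hypothetical, chosen only so that the two ICCs land near the .06 and .02 discussed in this thread; they are not taken from any Mplus output.

```python
def icc(between_var: float, within_var: float) -> float:
    """Intraclass correlation: share of total variance lying between clusters."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total

# Hypothetical variances, not taken from any output in this thread.
# Unconditional model: raw between- and within-level variances.
unconditional = icc(between_var=0.06, within_var=0.94)
# Model with covariates: residual variances after the covariates
# have absorbed part of the between- and within-level variation.
conditional = icc(between_var=0.018, within_var=0.882)

print(round(unconditional, 3), round(conditional, 3))
```

With these made-up numbers the residual ICC is lower than the unconditional one, which is the pattern to expect when a group-level covariate accounts for part of the clustering.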
Thank you for your quick response. I was indeed thinking this was not the unconditional model I needed to check for ICC. However, how can I calculate the right ICC and Deff? I did formulate an empty model with only my dependent variable, as was suggested on the discussion forum, but because of missing values the N (and also the average cluster size reported in the output) of this analysis and of my final analysis differ, and I suppose this is problematic? The formula I used for the Deff was 1 + (average cluster size in output - 1) * ICC. In the model with predictors I relied on the intraclass correlation given in the Mplus output, but maybe this is wrong too and I should be calculating the ICC by hand from the residual variances? Also, I've been asking myself whether my cluster sizes (about 18-20 depending on missings), regardless of any differences across analyses, are in fact too low for multilevel modeling or for type=complex on its own. When comparing the type=complex analysis with a regular analysis that assumes simple random sampling, for example, I indeed find nearly identical models.
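The design-effect formula quoted in this post can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the inputs are the rounded ICCs and average cluster sizes reported earlier in the thread, so small deviations from the reported Deff values are to be expected.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect: 1 + (average cluster size - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

# Rounded values quoted in this thread.
deff_unconditional = design_effect(21, 0.06)  # model without predictors
deff_conditional = design_effect(18, 0.02)    # model with predictors
# Same ICC as the unconditional model, but the smaller average cluster
# size: this isolates the effect of cluster size alone.
deff_size_only = design_effect(18, 0.06)

print(round(deff_unconditional, 2))
print(round(deff_conditional, 2))
print(round(deff_size_only, 2))
```

With the rounded inputs, the first two results come out close to the reported 2.23 and 1.32. Holding the ICC at .06 and only lowering the average cluster size from 21 to 18 changes the result by about 0.18, so under this formula the change in cluster size alone does not account for the drop; most of it comes from the lower ICC.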
I would use the ICC you get from Type = Basic. If you include not only the dependent variable but also other variables you should get the sample size you want.
Note that the DEFF formula is only exact for estimating means in clusters of equal size - for anything else it is just an approximation.
18-20 clusters is very much on the low end for Type=Twolevel and Type=Complex.
Finding similar SEs with Type=Complex and regular analysis says that you don't have to worry about the non-independence when it comes to SEs. You can still try twolevel modeling and see if say the random intercept has a significant variation over clusters, in which case you may want to express the random intercept variation in terms of cluster-level covariates.
Thank you for your clear answer. I'm thinking to not use any multilevel option for the moment. I did however check the ICC with the type=basic option and then found the same ICC as reported in my full multilevel model with covariates and the model with covariates@0. So maybe I'm still doing something wrong (because I still had to add covariates with the basic option to obtain the right N), or is this the true ICC?
This is the syntax I used:
USEVAR ARE aca SEX KACH KASTU KASCHO; MISSING IS ALL (-9999); cluster = scho; WITHIN = SEX KACH KASTU; BETWEEN = KASCHO;
I first left out the within and between statements but then I received a warning: A MATRIX COULD NOT BE INVERTED DURING THE H1 MODEL ESTIMATION. THE ESTIMATED WITHIN COVARIANCE MATRIX IS NOT POSITIVE DEFINITE AS IT SHOULD BE. COMPUTATION COULD NOT BE COMPLETED. THE VARIANCE OF KASCHO APPROACHES 0. FIX THIS VARIANCE AND THE CORRESPONDING COVARIANCES TO 0, DECREASE THE MINIMUM VARIANCE, OR SPECIFY THE VARIABLE AS A BETWEEN VARIABLE.
THE H1 MODEL ESTIMATION DID NOT CONVERGE. SAMPLE STATISTICS COULD NOT BE COMPUTED.
Variables for which you want ICCs should not be put on the Within list because that says that they are not allowed to have Between-level variation. If KASCHO is a Between-level variable it needs to be declared as such, saying that it does not have Within-level variation.