Multilevel model testing
 Joanna Harma posted on Thursday, October 25, 2007 - 9:28 am
Dear Linda,
I am fitting a two-level logistic regression with child characteristics at the lower level and family characteristics at the upper level. I am trying to establish what determines the type of school a child goes to. The dependent variable is binary, and the explanatory variables are a mix of categorical and continuous.
Is there any way of testing the fit of my model?
How can I obtain residuals, y-yhat?
Can you suggest any possible graphs for assessing model fit in such an analysis?
Regards
Joanna
 Joanna Harma posted on Friday, October 26, 2007 - 6:17 am
Hello,
Can anyone explain the meaning of the factor scores (empirical Bayes residuals) obtained in multilevel logistic regression, and how they can be used to estimate predicted probabilities?
Thanks
 Bengt O. Muthen posted on Friday, October 26, 2007 - 10:08 am
The factor score is an estimate of the random intercept for each cluster. To estimate the overall probability you need numerical integration over the random intercept distribution. To estimate a probability at a certain value of the random intercept distribution, such as the estimated value for a certain cluster, you simply translate the resulting logit into probability by the usual formula

Prob = 1/(1+exp(-Logit))

where the Logit contains the factor score estimate of the cluster random intercept plus the usual level 1 part of the Logit (including the intercept/threshold).
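
The two calculations described above can be sketched in a few lines of Python. This is only an illustration, not Mplus code: the function names, the argument `level1_logit` (assumed to already hold the level 1 part of the logit, including the intercept/threshold term with the appropriate sign), and the simple trapezoid-rule integration over a normal random intercept with mean zero are all assumptions made for the sketch.

```python
import math


def inv_logit(logit):
    """Prob = 1/(1+exp(-Logit)), written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def cluster_probability(level1_logit, factor_score):
    """Probability at one value of the random intercept,
    e.g. the factor score estimated for a particular cluster."""
    return inv_logit(level1_logit + factor_score)


def overall_probability(level1_logit, intercept_variance,
                        n_points=2001, width=8.0):
    """Probability averaged over a normal random intercept N(0, variance),
    approximated with the trapezoid rule on [-width*sd, +width*sd]."""
    sd = math.sqrt(intercept_variance)
    if sd == 0.0:
        return inv_logit(level1_logit)
    lower = -width * sd
    step = 2.0 * width * sd / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        u = lower + k * step
        density = math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * density * inv_logit(level1_logit + u)
    return total * step


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(cluster_probability(level1_logit=1.0, factor_score=-0.4))
    print(overall_probability(level1_logit=1.0, intercept_variance=4.0))
```

Note that the averaged probability is generally not the same as the probability at a random intercept of zero; with a large random intercept variance it is pulled toward 0.5.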
 Joanna Harma posted on Monday, October 29, 2007 - 7:56 am
I estimated probabilities for the level 1 observations by adding the level 2 residuals (factor scores) to the usual level 1 equation. However, when I ran a normality test on the residuals, the plot showed a long-tailed curve. Any suggestions on how to deal with this when most of my variables are categorical?
Also, the output file gives R-square for levels 1 and 2 when the standardized option is requested in the output. Can this be used reliably to assess model fit?
Thanks
Joanna
 Bengt O. Muthen posted on Tuesday, October 30, 2007 - 8:02 am
I think you are saying that the random intercept in the 2-level logistic regression has an estimated distribution that looks non-normal with long tails. The end of the forthcoming Chapman-Hall chapter by Muthen & Asparouhov, available on our web site, discusses the sensitivity to violations of normality for random effects and how one can instead use a non-parametric approach to estimating the random effect distribution. This is in the related case of growth modeling with binary outcomes using a logistic link. The sensitivity to the normality assumption of random effects has also been studied by Charles McCulloch at UC San Francisco, as indicated by his recent talk here in LA.
 Bengt O. Muthen posted on Tuesday, October 30, 2007 - 8:06 am
P.S. I don't think R-square is useful for indicating model fit with logistic regression.
 Joanna Harma posted on Tuesday, October 30, 2007 - 8:34 am
Thanks
I started modelling with adding level 2 variables as I am more interested in the effect of family characteristics, on the school type child goes to, while controlling the effect of child level variables. I added child level variable after I had added all the relevant family variables. It was surprising to see that the residual variance (level 2) shot up from 10 to 34 and few of level 2 variables became insignificant. Is this normal? Also, most of my level two variables have high S.E. and t-value around 2.5.
My data has 250 families and these families have 475 children. Thus the size of cluster is not very big. Do you think I could do logistic regression with this kind of data?
Thanks
 Bengt O. Muthen posted on Wednesday, October 31, 2007 - 8:32 am
I think this residual variance behavior has been described in the multilevel literature - see for instance the Snijders & Bosker book. - Any lit. suggestions from other Mplus Discussion readers?

With an average of 2 children per family you don't have many subjects for within-level parameter estimation. But if you for example do 2-level logistic regression you don't have any within-level parameters (not even a level-1 residual variance). 2-level factor analysis, however, would be problematic.
 Joanna Harma posted on Wednesday, October 31, 2007 - 9:10 am
Thanks,
I have 250 families, of which 120 send all their children to government schools, 110 send all their children to private schools, and only 20 send their children to both types of school (i.e. 1 child to government and 1 to private). The dependent variable is the type of school the child goes to, so the lower-level variables relate to child characteristics and the upper-level variables relate to family characteristics. To me, this data structure suggests little within-level variance and high between-level variance. In the intercept-only model I get a between variance of 30. Could this be a problem? Also, could the lack of variability at the child level be a problem? Can I still use 2-level logistic regression?
Thanks
 Joanna Harma posted on Wednesday, October 31, 2007 - 9:14 am
I forgot to ask: since my cluster size is relatively small, could I ignore the two-level structure and do ordinary logistic regression with both child-level and family-level variables?
Thanks
 Bengt O. Muthen posted on Wednesday, October 31, 2007 - 9:29 am
Given that only 20 families have within-family variation on your binary dependent variable, the binary dependent variable is almost a level 2 outcome. This may be making your analysis more difficult.
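
One rough way to put a number on "almost a level 2 outcome" is the latent-variable intraclass correlation often used for random-intercept logistic models, which treats the level 1 residual variance as fixed at pi^2/3. The calculation below is a sketch under that assumption, using the between variance of 30 reported for the intercept-only model; it is not a quantity taken from the Mplus output, and the symbols rho and sigma_u^2 are introduced here for illustration.

```latex
\rho \;=\; \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
     \;=\; \frac{30}{30 + 3.29}
     \;\approx\; 0.90
```

On this scale, roughly 90% of the variation in the underlying propensity lies between families, which is consistent with only 20 of the 250 families showing any within-family variation in the outcome.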

As for your second question, instead of Type = Twolevel, you could use Type = Complex which simply corrects SEs for the clustering of children within families (not estimating a level 1 and level 2 model). That is an easier analysis.
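
The general idea of fitting a single-level logistic regression and then correcting the SEs for clustering can be sketched in Python with a cluster-robust (sandwich) variance estimator. This is a minimal illustration under stated assumptions, not a description of what Mplus computes internally: it handles one predictor only, applies no small-sample correction, and the function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def _prob(b0, b1, xi):
    """Fitted probability for logit P(y=1) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def _score_and_inverse_information(b0, b1, x, y):
    """Score vector and inverse of the 2x2 information matrix at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _prob(b0, b1, xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    det = h00 * h11 - h01 * h01
    inverse = ((h11 / det, -h01 / det), (-h01 / det, h00 / det))
    return (g0, g1), inverse


def cluster_robust_logit(x, y, cluster, iterations=25):
    """Ordinary logistic regression (Newton-Raphson) with naive and
    cluster-robust standard errors for the intercept and slope."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), inv = _score_and_inverse_information(b0, b1, x, y)
        b0 += inv[0][0] * g0 + inv[0][1] * g1
        b1 += inv[1][0] * g0 + inv[1][1] * g1
    _, bread = _score_and_inverse_information(b0, b1, x, y)

    # "Meat": outer products of the score contributions summed within cluster.
    sums = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, ci in zip(x, y, cluster):
        r = yi - _prob(b0, b1, xi)
        sums[ci][0] += r
        sums[ci][1] += r * xi
    m00 = sum(s[0] * s[0] for s in sums.values())
    m01 = sum(s[0] * s[1] for s in sums.values())
    m11 = sum(s[1] * s[1] for s in sums.values())

    # Sandwich variance: bread * meat * bread (all 2x2, symmetric).
    a, b, d = bread[0][0], bread[0][1], bread[1][1]
    bm = ((a * m00 + b * m01, a * m01 + b * m11),
          (b * m00 + d * m01, b * m01 + d * m11))
    v00 = bm[0][0] * a + bm[0][1] * b
    v11 = bm[1][0] * b + bm[1][1] * d
    return {
        "coef": (b0, b1),
        "naive_se": (math.sqrt(a), math.sqrt(d)),
        "robust_se": (math.sqrt(v00), math.sqrt(v11)),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 6 families with 2 children each.
    x = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
    family = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
    print(cluster_robust_logit(x, y, family))
```

The coefficients are the same as in an ordinary logistic regression; only the standard errors change, which is why this is an easier analysis than estimating separate level 1 and level 2 models.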
 Joanna Harma posted on Friday, November 02, 2007 - 6:52 am
Thanks Bengt,
Another question. Are normality and homoscedasticity assumptions for the level 2 residual in 2-level logistic regression?
Thanks
Joanna
 Bengt O. Muthen posted on Friday, November 02, 2007 - 7:03 am
Yes, that is a standard assumption.