

Dear Linda, I am doing a two-level logistic regression with child characteristics at the lower level and family characteristics at the upper level. I am trying to establish what determines the type of school a child goes to. The dependent variable is binary and the explanatory variables are a mix of categorical and continuous. Is there any way of testing model fit? How can I obtain residuals (y - yhat)? Can you suggest any possible graphs for testing model fit in such an analysis? Regards, Joanna


Hello, Can anyone explain the significance of the factor scores (empirical Bayes residuals) obtained in multilevel logistic regression and how they can be used to estimate predicted probabilities? Thanks


The factor score is an estimate of the random intercept for each cluster. To estimate the overall probability you need numerical integration over the random intercept distribution. To estimate a probability at a certain value of the random intercept distribution, such as the estimated value for a certain cluster, you simply translate the resulting logit into a probability by the usual formula Prob = 1/(1+exp(-Logit)), where the Logit contains the factor score estimate of the cluster random intercept plus the usual level-1 part of the Logit (including the intercept/threshold).
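The two calculations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mplus output or Mplus's own algorithm: the function names, the example numbers, and the simple midpoint rule used for the integration are all hypothetical choices, and the random intercept is assumed to be normal with mean zero. If your output reports a threshold rather than an intercept, check its sign convention before plugging it into the logit.

```python
import math

def conditional_prob(logit):
    """P(y = 1) at a given logit: 1 / (1 + exp(-logit)), written to avoid overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

def marginal_prob(fixed_part, intercept_var, n_points=2001, width=8.0):
    """Average conditional_prob over a N(0, intercept_var) random intercept.

    Uses a plain midpoint rule on [-width*sd, +width*sd]; this is an
    illustrative stand-in for whatever integration method the software uses.
    """
    sd = math.sqrt(intercept_var)
    if sd == 0.0:
        return conditional_prob(fixed_part)
    step = 2.0 * width * sd / n_points
    total = 0.0
    for i in range(n_points):
        u = -width * sd + (i + 0.5) * step  # candidate random-intercept value
        density = math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += conditional_prob(fixed_part + u) * density * step
    return total

# Hypothetical numbers: level-1 part of the logit and one family's factor score.
level1_logit = 0.5
factor_score = 1.2
p_family = conditional_prob(level1_logit + factor_score)   # probability for that family
p_overall = marginal_prob(level1_logit, intercept_var=4.0)  # averaged over all families
```

A raw residual for a child is then the observed 0/1 outcome minus p_family. Note that the overall probability is pulled toward 0.5 relative to the probability at a random intercept of zero, and more so as the intercept variance grows.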


I estimated probabilities for level-1 observations using the level-2 residuals (factor scores) added to the usual level-1 equation. However, when I did a normality test on the residuals, the plot showed a long-tailed curve. Any suggestions on how to deal with this when most of my variables are categorical? Also, the output file gives R-square for levels 1 and 2 when the standardized option is used in the output. Can this be used reliably to assess model fit? Thanks Joanna


I think you are saying that the random intercept in the 2-level logistic regression has an estimated distribution that looks non-normal with long tails. In the forthcoming Chapman-Hall chapter by Muthen & Asparouhov on our web site, the end of the paper discusses the sensitivity to violations of normality for random effects and how one can instead use a non-parametric approach to estimating the random effect distribution. This is in the related case of growth modeling with binary outcomes using a logistic link. The sensitivity to the normality assumption of random effects has also been studied by Charles McCulloch at UC San Francisco, as indicated by his recent talk here in LA.


P.S. I don't think R-square is useful for indicating model fit with logistic regression.


Thanks. I started modelling by adding level-2 variables, as I am more interested in the effect of family characteristics on the type of school a child goes to, while controlling for the effect of child-level variables. I added a child-level variable after I had added all the relevant family variables. It was surprising to see that the residual variance (level 2) shot up from 10 to 34 and a few of the level-2 variables became insignificant. Is this normal? Also, most of my level-2 variables have high S.E.s and t-values around 2.5. My data has 250 families and these families have 475 children, so the cluster size is not very big. Do you think I could do logistic regression with this kind of data? Thanks


I think this residual variance behavior has been described in the multilevel literature; see for instance the Snijders & Bosker book. Any literature suggestions from other Mplus Discussion readers? With an average of 2 children per family you don't have many subjects for within-level parameter estimation. But if you, for example, do 2-level logistic regression, you don't have any within-level variance parameters (not even a level-1 residual variance). 2-level factor analysis, however, would be problematic.

Joanna Harma posted on Wednesday, October 31, 2007 - 9:10 am



Thanks. I have 250 families, of which 120 send all their children to government schools, 110 send all their children to private schools, and only 20 send their children to both types of school (i.e. one child to a government school and one to a private school). The dependent variable is the type of school the child goes to, so the lower-level variables relate to child characteristics and the upper-level variables relate to family characteristics. To me this data structure suggests little within-level variance and high between-level variance. In the intercept-only model I get a between variance of 30. Could this be a problem? Also, could the fact that I have little variability at the child level be a problem? Can I still use 2-level logistic regression? Thanks

Joanna Harma posted on Wednesday, October 31, 2007 - 9:14 am



I forgot to ask: since my cluster size is relatively small, could I ignore the two-level structure and do ordinary logistic regression with both child-level and family-level variables? Thanks


Given that only 20 families have within-family variation on your binary dependent variable, the binary dependent variable is almost a level-2 outcome. This may be making your analysis more difficult. As for your second question, instead of Type = Twolevel, you could use Type = Complex, which simply corrects SEs for the clustering of children within families (not estimating a level-1 and level-2 model). That is an easier analysis.
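One rough way to put a number on "almost a level-2 outcome" is the latent-response intraclass correlation, which treats the level-1 residual in a logistic model as having the fixed variance of the standard logistic distribution, pi^2/3. This convention comes from the general multilevel literature rather than from anything stated in this thread, and the function name below is hypothetical. A minimal sketch using the between variance of 30 reported for the intercept-only model:

```python
import math

# Variance of the standard logistic distribution, used as the fixed
# level-1 residual variance on the latent-response scale (assumed convention).
LEVEL1_LOGISTIC_VAR = math.pi ** 2 / 3.0

def latent_icc(between_var):
    """Share of latent-response variance that lies between clusters."""
    return between_var / (between_var + LEVEL1_LOGISTIC_VAR)

# Between-family variance reported for the intercept-only model in this thread.
icc_intercept_only = latent_icc(30.0)
```

With a between variance of 30 this works out to roughly 0.9, which is in line with only 20 of the 250 families showing any within-family variation in school type.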


Thanks Bengt. Another question: are normality and homoscedasticity assumptions for the level-2 residual in 2-level logistic regression? Thanks Joanna


Yes, that is a standard assumption. 
