

Hi! I'm trying to conduct multiple imputation in a multilevel dataset and I have some questions: 1) Is it ok to use latent aggregation for a variable that is going to be imputed? I'm worried that the small sizes of the clusters will affect the results. We have 2 to 4 subjects but sampling is close to 100% of the possible subjects. 2) Do you have data comparing the 3 kinds of H1 imputation? Many thanks! 


1) I am guessing, but it sounds like you are asking about H0-based imputation using a two-level model, and that once the data have been imputed you want to do two-level modeling using the latent covariate approach. And then you worry that your cluster sizes are too small. I couldn't say what the effects of that would be for different degrees of missingness without doing a simulation study (if that indeed was what you were asking). 2) We haven't done organized simulation studies on that, but I assume the results are quite similar. A simulation study might be of interest. 


For 1 I was actually thinking about a H0 model in which some variables that are going to be imputed are not stated as within or between. 


I see, so you are estimating both a within and between component of those variables. Although you have small cluster sizes you may get ok results if you have many clusters. But how all this affects the quality of the imputation is a research topic. 


I am assuming that H1 models for multilevel data include only fixed effects, is that correct? Would it be possible to add an H1 model with random effects? I am getting results which differ a bit too much from the data imputed in REALCOM-IMPUTE when analyzing random effects in MLwiN. See this paper for a discussion of that issue: http://onlinelibrary.wiley.com/doi/10.1002/bimj.201000140/full 


Mplus does two-level H1 imputation. That means that you allow for random effects; that is, variables can vary across the units of both levels. 


So I found that Mplus gives me out-of-range values whenever I use two-level or complex, even after specifying that the values should be in a certain range. I'm thinking that it might be an issue with the latent aggregation, because our clusters are small (2-4) and there's little to no sampling error, but I can't say for sure. Would it be possible to add a command to do manifest aggregation? 


Please send the relevant files and your license number to support@statmodel.com. 


Hi, I'm doing two-level analysis with random slopes and would like to use multiple imputation.
1) Do I assume correctly that, at least for continuous data, variance-covariance imputation in Mplus is similar to Schafer and Yucel's (2002) approach with all variables treated as dependent?
2) Is H1 imputation equal to H0 imputation with saturated models at both levels?
3) If yes, is it correct that H1 imputation is not adequate for models with random slopes, as all relationships between variables are taken into account but not possible variation in these relationships between groups?
My analysis model is of this type:
%within%
s1 | y1 on x1;
s2 | y2 on x1;
y1 y2 on x2;
y1 with y2;
%between%
y1 y2 s1 s2 on x2 z;
y1 y2 s1 s2 with y1 y2 s1 s2;
There is missing data on y1, y2, x2, and z (not on x1). I came up with this model to impute the missing data on these variables:
%within%
s1 | y1 on x1;
s2 | y2 on x1;
x2 on x1;
y1 y2 x2 with y1 y2 x2;
%between%
y1 y2 s1 s2 x2 z with y1 y2 s1 s2 x2 z;
4) Most important to me are correct estimates of random slope variances and cross-level interactions. Is this a suitable imputation model then?
5) Plausible value estimation is currently not possible with a logit link, right? (This would be great to have!)
Many thanks! 


1) Schafer and Yucel (2002) use random slopes in the imputation model. That can be done in Mplus with H0 imputation. 2) Yes. 3) Yes. 4) Yes. 5) You can use the probit link; for imputation purposes it should be sufficient. 
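As a rough illustration of the H0 imputation route mentioned in this reply, the poster's imputation model could be wrapped in an input along the following lines. This is only a sketch under stated assumptions: the file name, variable order, cluster variable `clus`, missing-data code, output file name, and the WITHIN/BETWEEN assignments for x1 and z are all hypothetical and were not given in the thread. The MODEL statements are copied from the question.

```
TITLE:    H0 imputation sketch with random slopes (hypothetical setup)
DATA:     FILE = mydata.dat;            ! hypothetical file name
VARIABLE: NAMES = clus y1 y2 x1 x2 z;   ! hypothetical names and order
          CLUSTER = clus;               ! hypothetical cluster ID
          WITHIN = x1;                  ! assumption: x1 modeled on within only
          BETWEEN = z;                  ! assumption: z is a cluster-level variable
          MISSING = ALL (-999);         ! hypothetical missing-data code
ANALYSIS: TYPE = TWOLEVEL RANDOM;
          ESTIMATOR = BAYES;
MODEL:    %WITHIN%
          s1 | y1 ON x1;
          s2 | y2 ON x1;
          x2 ON x1;
          y1 y2 x2 WITH y1 y2 x2;
          %BETWEEN%
          y1 y2 s1 s2 x2 z WITH y1 y2 s1 s2 x2 z;
DATA IMPUTATION:
          IMPUTE = y1 y2 x2 z;          ! variables with missing data
          NDATASETS = 20;               ! illustrative choice
          SAVE = h0imp*.dat;            ! hypothetical output name
```

Because a MODEL command is present, the imputations are drawn from that estimated model rather than from an unrestricted one, which is what lets the random slopes s1 and s2 enter the imputation.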

Emily Kim posted on Tuesday, April 03, 2012  1:05 pm



Hi! I'm trying to use Mplus for analyzing two-level data with missing data imputation. I need to impute the missing data for predictors at level 1 as well as level 2. I used the syntax below to impute the data:
ANALYSIS:
estimator = bayes;
type = basic twolevel;
bseed = 72114;
bconvergence = .01;
DATA IMPUTATION:
impute = math_ss math08ss urban CEO;
ndatasets = 20;
save = CEO_probsolveimp*.dat;
thin = 1000;
Although CEO is a level-2 predictor, it varied at level 1, so I couldn't put it at level 2 for the analysis. Am I doing something wrong? Can I handle missing data for a predictor at level 2? Thanks in advance! 


A level-2 predictor by definition does not vary within level-1 clusters. These variables should be put on the BETWEEN list. 

Emily Kim posted on Wednesday, April 04, 2012  8:11 am



Thanks, Linda. Yes, they shouldn't vary within level-1 clusters, but they did. Is there a way to specify whether a variable is at level 1 or level 2 for the imputation process? In the syntax above, only CEO is a level-2 variable while all others (e.g., math_ss) are level-1 variables, but they are all in the same statement regardless of level. Please let me know if I can clarify more on this. 


See the WITHIN and BETWEEN options in the user's guide. A variable cannot be put on the BETWEEN list if it varies for individuals in the same cluster. It sounds like you have a problem with your data that needs to be addressed. 

Emily Kim posted on Wednesday, April 04, 2012  11:44 am



Thanks again, Linda. I greatly appreciate your feedback. Please let me clarify my question. I am conducting a multilevel analysis with two-level data. The model is below:
Level 1: Achievement = b0j + b1j*Gender_ij + b2j*Pretest_ij + rij
Level 2: b0j = r00 + r01*SchoolSize_j + u0j
I have missing data on Pretest and SchoolSize, so I'm trying to impute the missing data for those two predictors. I used Mplus for missing data imputation with the two-level data. The syntax for the imputation procedure was:
DATA IMPUTATION:
impute = pretest schoolsize;
ndatasets = 20;
save = probsolveimp*.dat;
thin = 1000;
In the 20 sets of imputed data from Mplus, I got level-1 variation in schoolsize, which should not be there. I assumed that this happened because I did not specify the level in the imputation statement. My question is this: Can I specify that schoolsize is a level-2 predictor so that I do not get level-1 variance for that predictor? I'm sorry for any confusion in my previous questions. Thank you! 


If you did not put schoolsize on the BETWEEN list, it would not be imputed as a between variable. 
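A minimal sketch of what this reply points to, assuming a setup like the one described in the question: the level of a variable is declared in the VARIABLE command through the BETWEEN option, not in DATA IMPUTATION. The file name, variable names and order, cluster variable `schoolid`, and missing-data code are hypothetical. The ANALYSIS and DATA IMPUTATION settings follow the poster's own syntax.

```
DATA:     FILE = probsolve.dat;         ! hypothetical file name
VARIABLE: NAMES = schoolid achieve gender pretest schoolsize;  ! hypothetical
          CLUSTER = schoolid;           ! hypothetical cluster ID
          BETWEEN = schoolsize;         ! declares schoolsize as a level-2 variable
          MISSING = ALL (-999);         ! hypothetical missing-data code
ANALYSIS: TYPE = BASIC TWOLEVEL;
          ESTIMATOR = BAYES;
DATA IMPUTATION:
          IMPUTE = pretest schoolsize;
          NDATASETS = 20;
          SAVE = probsolveimp*.dat;
          THIN = 1000;
```

As noted earlier in the thread, a variable can only go on the BETWEEN list if its observed values are identical for all individuals in the same cluster, so any within-cluster inconsistencies in the raw data need to be resolved first.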
