Mplus Discussion >> Multilevel imputation

Multilevel imputation

Mplus Discussion > Missing Data Modeling >

Andre Plamondon posted on Tuesday, June 14, 2011 - 12:58 pm

Hi!
I'm trying to conduct multiple imputation in a multilevel dataset and I have some questions:
1) Is it ok to use latent aggregation for a variable that is going to be imputed? I'm worried that the small sizes of the clusters will affect the results. We have 2 to 4 subjects but sampling is close to 100% of the possible subjects.
2) Do you have data comparing the 3 kinds of H1 imputation?

Many thanks!

Bengt O. Muthen posted on Tuesday, June 14, 2011 - 5:50 pm

1) I am guessing, but it sounds like you are asking about H0-based imputation using a twolevel model and that when data have been so imputed you want to do twolevel modeling using the latent covariate approach. And then you worry that you have too small cluster sizes. I couldn't say what the effects of that would be for different degrees of missingness without doing a simulation study (if that indeed was what you were asking).

2) We haven't done organized simulation studies on that, but I assume the results are quite similar. A simulation study might be of interest.

Andre Plamondon posted on Tuesday, June 14, 2011 - 6:35 pm

For 1) I was actually thinking about an H0 model in which some of the variables that are going to be imputed are not declared as within or between.

Bengt O. Muthen posted on Wednesday, June 15, 2011 - 5:09 pm

I see, so you are estimating both a within and between component of those variables. Although you have small cluster sizes you may get ok results if you have many clusters. But how all this affects the quality of the imputation is a research topic.

Andre Plamondon posted on Monday, June 20, 2011 - 9:22 am

I am assuming that H1 models for multilevel data include only fixed effects, is that correct? Would it be possible to add an H1 model with random effects? I am getting results which differ a bit too much from the data imputed in REALCOM-IMPUTE when analyzing random effects in MLwiN.

See this paper for a discussion of that issue:
http://onlinelibrary.wiley.com/doi/10.1002/bimj.201000140/full

Bengt O. Muthen posted on Monday, June 20, 2011 - 10:24 am

Mplus does twolevel H1 imputation. That means that you allow for random effects, that is, variables can vary across the units of both levels.

Andre Plamondon posted on Sunday, July 24, 2011 - 7:59 am

So I found that Mplus gives me out of range values whenever I use twolevel or complex, even after telling that the values should be in a certain range. I'm thinking that it might be an issue with the latent aggregation because our clusters are small in size (2-4) and there's little to no sampling error but I can't say for sure. Would it be possible to add a command to do manifest aggregation?

Linda K. Muthen posted on Sunday, July 24, 2011 - 11:57 am

Please send the relevant files and your license number to support@statmodel.com.

Jan Hochweber posted on Friday, October 21, 2011 - 2:21 am

Hi, I'm doing twolevel analysis with random slopes and would like to use multiple imputation.

1) Do I assume correctly that, at least for continuous data, variance covariance imputation in Mplus is similar to Schafer and Yucel's (2002) approach with all variables treated as dependent?

2) Is H1 imputation equal to H0 imputation with saturated models at both levels?

3) If yes, is it correct that H1 imputation is not adequate for models with random slopes, as all relationships between variables are taken into account but not possible variation in these relationships between groups?

My analysis model is of this type:

%within%
s1 | y1 on x1;
s2 | y2 on x1;
y1 y2 on x2;
y1 with y2;

%between%
y1 y2 s1 s2 on x2 z;
y1 y2 s1 s2 with y1 y2 s1 s2;

There is missing data on y1, y2, x2, and z (not on x1).
I came up with this model to impute the missing data on these variables:

%within%
s1 | y1 on x1;
s2 | y2 on x1;
x2 on x1;
y1 y2 x2 with y1 y2 x2;

%between%
y1 y2 s1 s2 x2 z with y1 y2 s1 s2 x2 z;

4) Most important to me are correct estimates of random slope variances and cross-level interactions. Is this a suitable imputation model then?

5) Plausible value estimation is currently not possible with logit link, right? (This would be great to have!)

Many thanks!

Tihomir Asparouhov posted on Friday, October 21, 2011 - 12:43 pm

1) Schafer and Yucel (2002) use random slopes in the imputation model. That can be done in Mplus with H0 imputation.

2) Yes

3) Yes

4) Yes

5) You can use Probit link - for imputation purposes it should be sufficient.
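
As a rough sketch of how the H0 imputation discussed in this exchange might be set up, the imputation model from the question can be wrapped in a complete input that uses ESTIMATOR = BAYES together with a DATA IMPUTATION command. The file names, the cluster variable clus, the missing-value code -999, the WITHIN/BETWEEN assignments, and the number of data sets are assumptions for illustration and do not come from the thread.

```mplus
DATA:
  FILE = mydata.dat;        ! assumed file name
VARIABLE:
  NAMES = clus y1 y2 x1 x2 z;   ! assumed names and order
  USEVARIABLES = y1 y2 x1 x2 z;
  CLUSTER = clus;           ! assumed cluster identifier
  WITHIN = x1;              ! assumed: x1 is modeled on the within level only
  BETWEEN = z;              ! z is a cluster-level variable
  MISSING = ALL (-999);     ! assumed missing-value code
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;
  ESTIMATOR = BAYES;        ! a MODEL estimated with Bayes gives H0 imputation
MODEL:
  %WITHIN%
  s1 | y1 ON x1;
  s2 | y2 ON x1;
  x2 ON x1;
  y1 y2 x2 WITH y1 y2 x2;
  %BETWEEN%
  y1 y2 s1 s2 x2 z WITH y1 y2 s1 s2 x2 z;
DATA IMPUTATION:
  IMPUTE = y1 y2 x2 z;      ! the variables with missing data in the question
  NDATASETS = 20;           ! assumed number of imputed data sets
  SAVE = h0imp*.dat;        ! assumed output file stem
```

Note that x1, the covariate with the random slopes, has no missing data in this setup and is therefore not on the IMPUTE list.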

Emily Kim posted on Tuesday, April 03, 2012 - 1:05 pm

Hi!
I'm trying to use Mplus to analyze two-level data with missing data imputation. I need to impute the missing data for predictors at level-1 as well as level-2. I used the syntax below to impute the data:

ANALYSIS:
estimator = bayes;
type = basic twolevel;
bseed = 72114;
bconvergence = .01;

DATA IMPUTATION:
impute = math_ss math08ss urban CEO;
ndatasets = 20;
save = CEO_probsolveimp*.dat;
thin = 1000;

Although CEO is a level-2 predictor, it varied at level-1, so that I couldn't put it at level-2 for the analysis. Am I doing something wrong? Can I handle missing data for a predictor at level-2? Thanks in advance!

Linda K. Muthen posted on Tuesday, April 03, 2012 - 4:16 pm

A level-2 predictor by definition does not vary within level-1 clusters. These variables should be put on the BETWEEN list.

Emily Kim posted on Wednesday, April 04, 2012 - 8:11 am

Thanks, Linda. Yes, they shouldn't vary within level-1 clusters, but they did. Is there a way to specify whether a variable is at level-1 or level-2 for the imputation process? In the syntax above, only CEO is a level-2 variable while all others (e.g., math_ss) are level-1 variables, but they are all in the same statement regardless of their level. Please let me know if I can clarify this further.

Linda K. Muthen posted on Wednesday, April 04, 2012 - 9:12 am

See the WITHIN and BETWEEN options in the user's guide. A variable cannot be put on the BETWEEN list if it varies for individuals in the same cluster. It sounds like you have a problem with your data that needs to be addressed.

Emily Kim posted on Wednesday, April 04, 2012 - 11:44 am

Thanks again, Linda. I greatly appreciate your feedback.

Please let me clarify my question. I am conducting a multilevel analysis with two-level data. The model is below:

Level-1:
Achievement_ij = b0j + b1j*Gender_ij + b2j*Pretest_ij + r_ij

Level-2:
b0j = r00 + r01*SchoolSize_j + u0j

I have missing data on Pretest and SchoolSize, so I'm trying to impute the missing data for those two predictors. I used Mplus for the missing data imputation with two-level data. The syntax for the imputation procedure is below:

DATA IMPUTATION:
impute = pretest schoolsize;
ndatasets = 20;
save = probsolveimp*.dat;
thin = 1000;

In the 20 sets of imputed data from Mplus, I got level-1 variation in schoolsize, which should not happen. I assume that it happened because I did not specify the level difference in the imputation statement. My question is this: Can I specify that schoolsize is a level-2 predictor so that I do not get level-1 variance for that predictor? I'm sorry for any confusion in my previous questions. Thank you!

Linda K. Muthen posted on Wednesday, April 04, 2012 - 1:47 pm

If you did not put schoolsize on the BETWEEN list, it would not be imputed as a between variable.
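
A minimal sketch of what this could look like in the imputation input: the only substantive change to the syntax posted above is the BETWEEN option in the VARIABLE command. The data file name, the variable names and their order, the cluster variable school, and the missing-value code -999 are assumptions made for illustration.

```mplus
DATA:
  FILE = probsolve.dat;       ! assumed file name
VARIABLE:
  NAMES = school achieve gender pretest schoolsize;  ! assumed names and order
  USEVARIABLES = achieve gender pretest schoolsize;
  CLUSTER = school;           ! assumed cluster identifier
  BETWEEN = schoolsize;       ! treated and imputed as a level-2 variable
  MISSING = ALL (-999);       ! assumed missing-value code
ANALYSIS:
  TYPE = TWOLEVEL BASIC;
  ESTIMATOR = BAYES;
  BSEED = 72114;
DATA IMPUTATION:
  IMPUTE = pretest schoolsize;
  NDATASETS = 20;
  SAVE = probsolveimp*.dat;
  THIN = 1000;
```

As noted earlier in the thread, a variable can only go on the BETWEEN list if its observed values do not vary within a cluster, so any such inconsistencies in the raw data would need to be fixed first.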

Christoph Weber posted on Monday, February 01, 2016 - 8:58 am

Dear Mplus team,
I'm using PISA data and want to estimate a threelevel (school, country) model including cross-level interactions. The main focus is to explain the variation of an L1 relationship through country-level variables, but I would also control for L2 variables.

I want to do MI for each country separately to preserve slope variation between countries. Further, I think that it is necessary to preserve between-school slope variation, thus I would use H0 imputation for each country.

Is this a reasonable strategy?

Further, the dependent variables are plausible values!

How should I treat PVs in the imputation model?
Is it reasonable to run 5 imputation models? For imputation model 1 I would use PV1, for model 2 PV2 and so on...
From imputation model 1 I would use data set 1 in the final analysis, from imputation model 2 I would use data set 2 and so on....

Thanks a lot!
christoph

Bengt O. Muthen posted on Monday, February 01, 2016 - 6:20 pm

It seems dicey to do mult imp on top of mult imp (your PVs) - I don't know what the best research on this is.

I would try to handle the missingness by FIML or Bayes.

Christoph Weber posted on Tuesday, February 02, 2016 - 12:56 am

Thanks a lot,
is it possible to use Bayes estimation with TYPE = IMPUTATION (because of the PVs)?
Do the rules for calculating SEs based on MI data also apply to posterior SDs?

If I use FIML for the analysis of cross-level interactions, is it correct/sufficient to...

1.) use ALGORITHM = INTEGRATION and
INTEGRATION = MONTECARLO (following the suggestions in the UG, p. 473)?

2.) specify covariances of all independent variables on each level?

e.g.

%within%

PV ON X1 X2;
s | PV ON X3;
x1 x2 x3 with x1 x2 x3;

%between IDschool%
s ON y1 y2 y3;
PV ON y1 y2 y3;
s with PV;
y1 y2 y3 with y1 y2 y3;

%between IDcountry%

s ON z1 z2;
PV ON z1 z2;
s with PV;
z1 z2 with z1 z2;

Thanks

Christoph Weber posted on Tuesday, February 02, 2016 - 12:20 pm

Some further questions.
a.) For multilevel models with random slopes (RS), is it necessary to use Monte Carlo integration?
b.) But MC integration is not allowed with type = threelevel? Is there an alternative?
c.) For threelevel models using Bayes estimation, sampling weights are not allowed? --> Thus Bayes estimation isn't possible

d.) Would it be "correct" if I...
1.) estimate a twolevel model (students within countries) using FIML with MC integration
2.) and back up the results with a threelevel model (using listwise deletion)?

Bengt O. Muthen posted on Tuesday, February 02, 2016 - 6:01 pm

First post:

Q1-Q2: I don't think there is theory for doing this.

1)-2): That should be fine. The number of integration points needs to be increased with increasing dimensions of integration - see TECH8 (also watch out for negative ABS changes suggesting low precision).

Second post:

a) No, you can use regular ML integration.

b) See a).

c) Right - Bayes with weights has not been invented yet.

d1) - d2). That I couldn't say.
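
To make the remark about integration points in 1)-2) concrete, here is a minimal ANALYSIS/OUTPUT fragment. It is only a sketch: TYPE = TWOLEVEL RANDOM is assumed because the thread leaves open which integration options are available with TYPE = THREELEVEL, and the count of 5000 Monte Carlo points is an illustrative value, not a recommendation made in this discussion.

```mplus
ANALYSIS:
  TYPE = TWOLEVEL RANDOM;            ! assumed two-level setup
  ALGORITHM = INTEGRATION;
  INTEGRATION = MONTECARLO (5000);   ! 5000 points is an assumed value
OUTPUT:
  TECH8;                             ! optimization history with ABS changes
```

Rerunning with a larger number of points and comparing the loglikelihood and estimates is one way to judge whether the precision is adequate.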

Christoph Weber posted on Wednesday, February 03, 2016 - 12:43 am

thanks a lot,
but there seems to be something wrong with my syntax (I tried a simple RS-model)

if I use Type = threelevel random and the following model command

%within%
PV1MATH ON migra2;
s | PV1MATH ON ESCS;  ! ESCS has missing values

ESCS with migra2 ;

%between IDNEU%
s with PV1math;

%between country%
s with PV1math;

I get the following error

*** FATAL ERROR
THIS MODEL ESTIMATION IS NOT AVAILABLE DUE TO MISSING DATA
IN A COVARIATE WITH RANDOM SLOPE.

What would be the input for regular ML integration?

thanks

Bengt O. Muthen posted on Thursday, February 04, 2016 - 7:04 pm

You cannot have missing on ESCS.

Christoph Weber posted on Friday, February 05, 2016 - 12:05 pm

Are there other alternatives to deal with the problem?

Bengt O. Muthen posted on Friday, February 05, 2016 - 6:01 pm

Delete missing data on ESCS or do multiple imputation of it.

Christoph Weber posted on Saturday, February 06, 2016 - 1:37 am

Thanks!

Christoph Weber posted on Saturday, February 06, 2016 - 5:57 am

One last question: I tried to use H0 imputation with a random slope (s | pv on escs) in the model command, but it seems that the ESCS values are not imputed. Thus I specified the covariances of all L1 variables. Now I get the following error:

MODELS WITH RANDOM SLOPES FOR VARIABLES WITH MISSING VALUES
CAN NOT BE ESTIMATED WITH THE BAYES ESTIMATOR.

Does this mean that it is not possible to impute values of independent variables that have random slopes?

Bengt O. Muthen posted on Saturday, February 06, 2016 - 4:55 pm

Right.

Christoph Weber posted on Tuesday, February 09, 2016 - 11:53 pm

a simple further question:
If I use a fixed seed and run the same imputation model two times, will the imputed data sets be identical? (That is: data set 1 first run = data set 1 second run, data set 2 first run = data set 2 second run, ...)

thanks!

Bengt O. Muthen posted on Wednesday, February 10, 2016 - 1:22 pm

I think so.

Bonamy Oliver posted on Monday, September 05, 2016 - 1:21 am

Hi,

I am running a multilevel model and I need to impute the data. I've used:

TYPE IS TWOLEVEL BASIC;
ESTIMATOR IS BAYES;
...
DATA IMPUTATION:
IMPUTE=[VAR LIST];
NDATASETS=10;
SAVE=twolevel*.dat;

and then analysing the imputed data with a script that uses:
DATA:
FILE IS twolevellist.dat;
TYPE=IMPUTATION;

However, I have also seen that it might be possible to do imputation and analyses all in one, putting the imputation command upfront (not saving the datasets) including the commands for the analyses. Is there one of these that's better than the other or should the results be the same?!

Many thanks!

Bonamy Oliver posted on Monday, September 05, 2016 - 9:38 am

I've sorted this now, many thanks!

Joshua Wilson posted on Monday, October 19, 2020 - 11:04 am

Dear Mplus team,

Can you direct me to examples - in the User's Guide or elsewhere - of multilevel modeling using multiple imputation? I'm specifically interested in running two- and three-level multilevel regressions (students nested within classrooms nested within schools) and imputing data at Level 1 and Level 2. I'd like to learn more about how to do this in Mplus.

Thanks!

Bengt O. Muthen posted on Monday, October 19, 2020 - 4:54 pm

See UG ex 11.8.
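
This is not the User's Guide example itself, but a hedged sketch of the second step of the two-step workflow described earlier in this thread: analysing previously saved imputed data sets with TYPE = IMPUTATION. The variable names, the cluster variable classrm, and the simple two-level regression are assumptions for illustration; the list file name follows the SAVE = twolevel*.dat statement quoted above.

```mplus
DATA:
  FILE = twolevellist.dat;    ! list file written by SAVE = twolevel*.dat
  TYPE = IMPUTATION;
VARIABLE:
  NAMES = y x w classrm;      ! assumed names; must match the saved order
  USEVARIABLES = y x w;
  CLUSTER = classrm;          ! assumed classroom identifier
  WITHIN = x;                 ! student-level predictor
  BETWEEN = w;                ! classroom-level predictor
ANALYSIS:
  TYPE = TWOLEVEL;
MODEL:
  %WITHIN%
  y ON x;
  %BETWEEN%
  y ON w;
```

With TYPE = IMPUTATION, Mplus runs the model on each data set in the list file and reports parameter estimates averaged over the data sets, with standard errors that combine the within- and between-imputation variation. The order of the variables in the saved data sets is listed in the output of the imputation run, and NAMES should follow that order.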