
Lily Wang posted on Sunday, February 05, 2012  12:23 pm



Hi Drs. Muthén, I ran into a problem (an unexpected termination, to be specific) when trying to impute data. The data come from a national survey (TYPE=COMPLEX). I wrote:

DATA IMPUTATION:
impute = V1 V2 V3;
save = CCimpute*.dat;

Mplus terminates during the second imputation (as shown in the DOS window) without any further notice or error message. The output window does not pop up as usual. If I open the output file manually, it contains only the syntax I wrote. Is there any way to fix the problem?


If you are not using Version 6.12, you should download it. If you are, send the files and your license number to support@statmodel.com. 

Stata posted on Thursday, February 09, 2012  11:34 pm



Dear Drs. Muthén, I am trying to run multiple imputation (MI) with a national dataset for a multilevel latent class analysis:

Format is (F4.0, F3.0, 51F1.0);
VARIABLE:
NAMES ARE SID STRAT BB1-BB51;
USEVARIABLES = BB1-BB51;
AUXILIARY = SID STRAT;
CATEGORICAL ARE BB1-BB51;
Missing = ALL (9);
DATA IMPUTATION:
IMPUTE = BB1-BB51 (c);
NDATASETS = 40;
ANALYSIS:
TYPE = BASIC;

The last 4 variables are dichotomous; the rest are on a 4-point scale. I was not able to use the imputed data to run the multilevel analysis. This is what I got:

*** ERROR(Err#: 64) Invalid symbol at record #: 1 The record is shown below this message "diss.imp1.dat"

Therefore, I ran a two-level MI by adding the following syntax:

cluster = SID;
type = BASIC Twolevel;

*** FATAL ERROR THE CONVERGENCE CRITERION IS NOT SATISFIED. INCREASE THE MAXIMUM NUMBER OF ITERATIONS OR INCREASE THE CONVERGENCE CRITERION.

How can I fix these problems? Thank you.


Please send the relevant files and your license number to support@statmodel.com. 

FN briere posted on Wednesday, March 28, 2012  12:56 pm



Hi, two questions regarding two-level MI:

1) I would like to make sure that the following syntax is appropriate to generate a two-level H1 (unrestricted) imputation.

Usevariables = Alevel1 Blevel1 Clevel1 Alevel2;
cluster = SCHOOL;
within = Alevel1 Blevel1 Clevel1;
between = Alevel2;
MISSING are all (100);
DATA IMPUTATION:
IMPUTE = Alevel1 Blevel1 Clevel1 Alevel2;
NDATASETS = 5;
SAVE = twolevel*.dat;
ANALYSIS:
TYPE = twolevel;

2) Would it be preferable not to specify the outcome to be used in a two-level regression model (second step after imputation) as either within or between?

Thank you in advance, Frédéric


1) You want to say Type = Twolevel Basic; and you want to remove the Within = line, since that would misspecify the variables as not having any between-level variance. 2) You should only use Between = Alevel2; during both the imputation and estimation phases.
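Putting the two corrections from this reply together, the imputation input might look like the sketch below. The file name mydata.dat and the NAMES list (including its column order) are assumptions added for illustration; everything else comes from the question and the reply.

```mplus
! Sketch only: two-level H1 imputation after applying the reply's corrections.
DATA:
  FILE = mydata.dat;                  ! hypothetical file name
VARIABLE:
  NAMES = SCHOOL Alevel1 Blevel1 Clevel1 Alevel2;   ! hypothetical column order
  USEVARIABLES = Alevel1 Blevel1 Clevel1 Alevel2;
  CLUSTER = SCHOOL;
  BETWEEN = Alevel2;                  ! WITHIN = line removed, as advised
  MISSING = ALL (100);
DATA IMPUTATION:
  IMPUTE = Alevel1 Blevel1 Clevel1 Alevel2;
  NDATASETS = 5;
  SAVE = twolevel*.dat;
ANALYSIS:
  TYPE = TWOLEVEL BASIC;              ! BASIC added, as advised
```

Leaving the level-1 variables off the WITHIN list lets them keep variance on both levels, which is the point of the first correction.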

FN briere posted on Friday, March 30, 2012  10:06 am



Thank you, this is very useful. I gather from a different post that for models with random slopes and cross-level interactions, it is necessary to switch to an H0 approach. I wish to do something as close as possible to H1, but including random slopes. I would tend to do this by specifying an H0 imputation model with random slopes and all correlations between variables at the two levels, e.g.:

Usevariables = x1 x2 y1 Alevel2;
cluster = SCHOOL;
between = Alevel2;
MISSING are all (100);
ANALYSIS:
TYPE = TWOLEVEL;
ESTIMATOR = BAYES;
MODEL:
%WITHIN%
s | y1 on x1;
y1 with x2;
x1 with x2;
%BETWEEN%
s y1 x2 x1 Alevel2 with s y1 x2 x1 Alevel2;
DATA IMPUTATION:
IMPUTE = x1 x2 y1 Alevel2;
NDATASETS = 5;
SAVE = twolevel*.dat;

Does that seem like a correct approach? Thank you again for your time,


I think that's ok. It sounds like your primary interest is in getting imputed data for some later investigation, not in estimating the model parameters. You can estimate the model parameters without imputing. Assuming imputing is the primary interest, in general it may be a good idea to impute from a model that is as close to the "true" model as possible. With two-level settings, however, it can be difficult to get convergence with a very unrestricted model (a model close to H1), mainly due to having many between-level parameters. That's why our UG gives an example of imputing from a simpler model than the later analysis model. How far apart these two models can be is an interesting research question.

FN briere posted on Saturday, March 31, 2012  11:39 am



Thanks, always very useful. One last question, which I think may also benefit others. Given that my main interest is more in specific cross-level interactions than in random effects, another option may be to run an H1 imputation that includes cross-level interaction terms. I tried that, and results from models estimated on the imputed files (second step) look fine. I am thinking of going this way, as I did get some convergence problems with H0 models with random effects. Any thoughts on this strategy?


I don't see the distinction between cross-level interactions and random effects. Your model above would have a cross-level interaction if you regress the random slope s on Alevel2.

FN briere posted on Saturday, March 31, 2012  12:59 pm



Yes, I wasn't clear. I am wondering about different strategies to obtain something similar: defining a cross-level interaction term as a variable to be included in an H1 imputation vs. specifying a random slope s to be regressed on Alevel2 in an H0 model. I get no convergence problems with the first strategy, but I do get some with the second. This is why I ask.


I see, you just enter a variable that is the product of the two variables. That sounds reasonable. It is an interesting general question how close to the "true" model the imputation model has to be, that is, how much difference it makes in the quality of the imputed data. A research topic.
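A minimal sketch of the product-term strategy described above, using the variable names from the earlier posts. The file name, the NAMES list, and the new variable name x1a2 are assumptions for illustration, and this is not necessarily the exact setup the poster ran.

```mplus
! Sketch only: H1 imputation that includes a cross-level product term.
DATA:
  FILE = mydata.dat;                  ! hypothetical file name
VARIABLE:
  NAMES = SCHOOL x1 x2 y1 Alevel2;    ! hypothetical column order
  USEVARIABLES = x1 x2 y1 Alevel2 x1a2;   ! DEFINE'd variable goes last
  CLUSTER = SCHOOL;
  BETWEEN = Alevel2;
  MISSING = ALL (100);
DEFINE:
  x1a2 = x1*Alevel2;                  ! hypothetical name for the product term
DATA IMPUTATION:
  IMPUTE = x1 x2 y1 Alevel2 x1a2;
  NDATASETS = 5;
  SAVE = twolevel*.dat;
ANALYSIS:
  TYPE = TWOLEVEL BASIC;
```

The product varies within clusters, so it is left off the BETWEEN list here.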


Dear Dr. Muthén, I am doing multilevel SEM using multiple imputation and incorporating the complex survey design. The data come from Add Health, and my subpopulation for the complex survey analysis is students attending 9th to 11th grade in Wave 1. My outcome is a factor composed of GPA in math, reading, science, and social studies. I performed multiple imputation to deal with missing data; however, I still have some missing data because some students did not take one or more of the four courses during the school year. I am not sure how Mplus deals with these valid missing cases. I am considering excluding from my subpopulation all cases that are missing on all four courses, but I would like to keep cases that have taken at least one course. Would it be OK to keep those cases and just specify the missing-value code (for example, missing are 9999), or does Mplus expect non-missing values for all variables in each imputed data set? (In that case, how do I deal with the valid missing cases?) Thank you, Fernando


You can have missing data in the imputed data sets. This missing data is treated in the regular way. 
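As a rough illustration of this reply, an analysis of imputed data sets that still contain missing values could be set up along these lines. The list file name and the four variable names are hypothetical, and 9999 is assumed to be the code left in the files for courses not taken; the multilevel and complex-survey parts of the actual analysis are omitted.

```mplus
! Sketch only: analysis step on imputed data that still has missing values.
DATA:
  FILE = gpaimplist.dat;              ! hypothetical list of imputed files
  TYPE = IMPUTATION;
VARIABLE:
  NAMES = math reading science social;    ! hypothetical variable names
  MISSING = ALL (9999);               ! courses not taken remain missing
MODEL:
  gpa BY math reading science social;     ! outcome factor from the four GPAs
```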


That is nice! Thank you, Fernando


It sounds like the case presented by Fernando above involves variables that have missing values because no values were imputed for those variables. Could we instead have variables with values imputed for some cases but left missing for other cases? The example I'm thinking of is a survey with a skip pattern, such that there are items that only a subset of respondents answer. Let's say 60 out of 100 respondents are presented with an item but only 50 answer it. I would want to impute for the 10 who were presented with the item but elected not to respond. Is this possible? Perhaps through some kind of "if" statement associated with DATA IMPUTATION?


You can impute the data but then not use the imputed values for the subgroup. Some code like the following in the analysis of the imputed data should work (assuming 9999 is the missing-value code and both the imputed Yimp and the non-imputed Y are in the same file):

define: if (group == 1 AND Y == 9999) then Yimp = _missing;

This will restore the missing value where you need it.


Perfect. Thank you, Tihomir! 


Dear Dr. Muthén, I hope your fall semester is going smoothly. May I ask for your advice on how to impute a cluster variable using multiple imputation (TYPE=COMPLEX)? Here is my syntax:

VARIABLE:
NAMES = econpr2 invpar latediv tr addep3 parrej delq bmi3 ill3 biosex4 hos smoking alcohol eat ill4d psu never ed;
Missing are all (999);
DATA IMPUTATION:
IMPUTE = econpr2 invpar latediv (c) tr (c) addep3 parrej delq bmi3 ill3 biosex4 (c) hos smoking alcohol eat ill4d psu never (c) ed (c);
NDATASETS = 10;
SAVE = illimp*.dat;
ANALYSIS:
TYPE = BASIC;
OUTPUT:
TECH8;

With this syntax, I was able to get a complete dataset without any missingness. However, I then found a problem: psu is the cluster variable (school ID), and with the above syntax the program treats it as a continuous variable. Although this variable has quite a large range (1371), it is not a continuous variable. So I am wondering whether there is a way to treat this variable as a cluster variable within the context of multiple imputation.


Are you trying to impute for missing values on school ID? 


Yes. Is it possible to impute for missing values on school ID?


This does not make sense to me. It should be your analysis variables you impute for. 


Dear Dr. Muthén, I have several questions regarding MI for multilevel models.

1.) I want to include cluster means as predictors in the analysis. If I use an H1 imputation and do not specify the (level 1) variables as within, then it is not necessary to include the cluster means (of the level 1 variables) on the between level. Is this correct?

2.) I have 3-level data (pupils, classes, schools). There is no missing data for class- and school-level variables. I want to estimate the effect of a 0/1 coded treatment variable at the class level on performance. Additionally, I'm interested in cross-level interactions (the effect of school-level variables on the effect of treatment on performance).

2a.) I tried type = threelevel and estimator = bayes, but the model does not converge. Would it be adequate to use type = basic twolevel (cluster = class) in order to achieve convergence? (There are no problems with type = basic and type = basic twolevel.)

2b.) Due to the convergence problems, it also seems impossible to include the cross-level interactions. Would it be adequate for the H1 imputation to include simple product terms (VariableLev3xVariableLev2, ...)?

Thanks a lot! Christoph Weber


The WITHIN and BETWEEN options should be used if appropriate when the data are imputed. Please send the three-level output with convergence problems and your license number to support@statmodel.com.
