

Version 2.02 allows missing data modeling when a latent mixture model is fit to data with a complex sampling design. Can missing data be handled with other models using complex data? I thought I read in the manual that it can't, but maybe the missing data feature has been added to other models in v. 2.02. 


No, missing cannot be handled by the regular TYPE=COMPLEX or TYPE=TWOLEVEL. However, you can use TYPE=MIXTURE COMPLEX MISSING with one class and thereby get missing for COMPLEX. 
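A minimal sketch of that one-class workaround, in the Version 2/3 syntax discussed here (where MISSING is requested in the TYPE option). The file name, variable names, and missing-value code are hypothetical:

```
TITLE:     One-class mixture used only to obtain missing data
           handling for a complex sample
DATA:      FILE = mydata.dat;          ! hypothetical file
VARIABLE:  NAMES = y1-y4 x1 psu;       ! hypothetical names
           USEVARIABLES = y1-y4 x1;
           MISSING = ALL (999);
           CLUSTER = psu;
           CLASSES = c (1);            ! a single latent class
ANALYSIS:  TYPE = MIXTURE COMPLEX MISSING;
MODEL:     %OVERALL%
           f BY y1-y4;
           f ON x1;
```

With only one class, the mixture part adds nothing to the model itself; it is just the route to the missing data estimator in these early versions.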


Can multiple groups be analyzed using the TYPE=MIXTURE COMPLEX MISSING? 


For TYPE=MIXTURE, the training data feature can be used to define groups. 

Maggie posted on Friday, April 02, 2004  5:01 am



I am designing a two-level SEM and I have a great deal of missing data on the independent variables at the within level. But there is no missing data on the between-level independent variables. Can this model still be handled by Mplus Version 3? Or should I replace all of the missing data at the within level before running the program? Thanks for any suggestions. 


This can be handled by Mplus Version 3. You should treat the x variables as y variables. The normality assumption changes from normality given x to overall normality. You change them to y's by mentioning their variances in the MODEL command. Then use TYPE = TWOLEVEL MISSING; 

Maggie posted on Monday, April 05, 2004  2:01 am



So you mean that I can add the variances of the within-level x variables to the within part of the MODEL command, i.e.: %WITHIN% fw BY Y1-Y3; fw ON X1; fw ON X2; fw x1-x2; and for the between part of the model, I needn't add the variances of the x variables (given that there is no missing data in the between part). Am I correct? Thanks again. 


Yes. What you did is correct. 
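For reference, a sketch of how the whole input might look under that advice. The variable names, the between-level covariate w, and the missing-value code are assumptions for illustration, not taken from the poster's data:

```
VARIABLE:  NAMES = y1-y3 x1 x2 w clus;   ! hypothetical names
           MISSING = ALL (-999);         ! hypothetical missing code
           CLUSTER = clus;
           WITHIN = x1 x2;
           BETWEEN = w;
ANALYSIS:  TYPE = TWOLEVEL MISSING;      ! Version 3 syntax
MODEL:     %WITHIN%
           fw BY y1-y3;
           fw ON x1 x2;
           x1 x2;          ! variances mentioned: x1, x2 treated as y's
           %BETWEEN%
           fb BY y1-y3;
           fb ON w;        ! w has no missing data, so it stays an x
```

Mentioning the variances of x1 and x2 is what moves them from the conditioning set into the model, at the price of a normality assumption for them.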


We are running a structural equation model with clustered data (teenagers clustered within schools) using TYPE=COMPLEX modeling. We have missing data for some schools. Are there any special concerns we should consider when the missing data is at the second level, other than the usual things, like coverage and that missingness is at least MAR, when using TYPE=COMPLEX MISSING? 

bmuthen posted on Saturday, April 24, 2004  9:03 am



No special concerns, just the usual ones. 

Maggie posted on Tuesday, May 11, 2004  10:00 am



Questions again from a new user of Mplus 3. Could I return to the previous question posted on April 05? As you suggested, I added the variances of the x's in the MODEL command, but the output tells me to use ALGORITHM = INTEGRATION; INTEGRATION = MONTECARLO in the ANALYSIS command. I referred to the example in the Mplus short course (multilevel regression model, page 52): in that input there is no specification of variances for the variables with missing data and no ALGORITHM option, although missing data is mentioned in the VARIABLE command. So in general, in which situations should I add the variances of x's with missing data? In fact, after I add ALGORITHM = INTEGRATION; INTEGRATION = MONTECARLO to the analysis, no output comes out; it only shows "INPUT READING TERMINATED NORMALLY" (my OUTPUT options are SAMPSTAT TECH8). In this case, is it necessary to run a Monte Carlo simulation to generate the missing data? If possible, could you please suggest one complete example of a two-level model with RANDOM that deals with missing data? This might let me ask you fewer questions about similar issues. Thanks in advance for your kind response. 


We don't have examples that show MISSING. You just need to add it to the TYPE option of the ANALYSIS command. I suspect that your outcomes are not continuous and that is why numerical integration is required. Please send your output to support@statmodel.com if you want me to look at it. 

Mpduser1 posted on Wednesday, September 28, 2005  11:51 am



I'm building a multilevel SEM with two endogenous variables, Y1 and Y2, both of which are prone to missingness, and both of which have WITHIN and BETWEEN sources of variation. The missing data rate for Y2 is much higher than the missing data rate for Y1. Y2 is categorical, Y1 is ordinal. My question is this: Does Mplus 3.13 use information from both the WITHIN and BETWEEN portions of the model when adjusting the maximum likelihood calculations to account for the missing data? I ask because this could greatly influence my variable selection / modeling strategy. Thank you. 

bmuthen posted on Wednesday, September 28, 2005  9:06 pm



The answer is yes. That is how maximum-likelihood estimation under the standard "MAR" assumption works. 

anonymous posted on Monday, January 16, 2006  10:13 pm



Hi there, I am running a multinomial logistic regression analysis (nominal DV; using missing and complex estimation) and wish to test whether two of my three-way interaction betas are significantly different from one another. For example, I have a 3-level DV (one level is the reference) and a 3-way interaction that is statistically significant when comparing the first level to the reference group and not significant when comparing the second level to the reference. I wish to know if the 2 betas are significantly different from one another. Any ideas? 

bmuthen posted on Tuesday, January 17, 2006  10:54 am



You compare the log likelihood (LL) of your model with that of a model where you constrain your betas to be equal (using the usual Mplus approach to equality constraints). Then use 2 times the LL difference as an LRT chi-square test of the equality, with df = the difference in the number of parameters of the two models. 
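A sketch of the constrained model, assuming a three-category nominal outcome u (Mplus uses the last category as the reference) and a hypothetical product variable x123 holding the three-way interaction, created beforehand in DEFINE:

```
VARIABLE:  NOMINAL = u;          ! hypothetical 3-category outcome
MODEL:     u#1 ON x1 x2 x3;      ! lower-order terms omitted for brevity
           u#2 ON x1 x2 x3;
           u#1 ON x123 (1);      ! the same number in parentheses
           u#2 ON x123 (1);      ! holds the two slopes equal
```

Running the input once with the (1) labels and once without them gives the two loglikelihoods to compare; here the constraint removes one parameter, so df = 1.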


Dear Bengt and Linda, I have developed an SEM with TYPE = COMPLEX (cluster data) and ESTIMATOR = MLR. Since I specified MISSING ARE ALL (9), I assume that there has been listwise deletion of cases. The n varies nicely with the number of variables (with missing) that are used in the analyses. Since I have missing data, and would like to use a method equivalent to FIML, I have tried to specify TYPE = MISSING H1. Mplus gives no error message or warning, but simply responds with silence. The relevant commands look like this: ANALYSIS: TYPE = complex; TYPE = missing h1; ESTIMATOR = MLR; I have tested out various ways, for instance this one: ANALYSIS: TYPE = complex missing h1; ESTIMATOR = MLR; Nothing seems to help. Any advice? Best regards Leif 


I am not sure what you mean by nothing seems to help. H1 is not used with MLR. So you would say: TYPE = COMPLEX MISSING; ESTIMATOR = MLR; 


Hello again, Linda, Same lack of response from Mplus. Here are all the relevant commands. All functions well until I insert the word "missing" on the "TYPE =" line. Any advice? Leif VARIABLE: MISSING ARE ALL (9); NAMES ARE . . .; USEVARIABLES ARE . . .; CLUSTER IS ...; ANALYSIS: TYPE = complex missing; ESTIMATOR = MLR; ITERATIONS = 1000; CONVERGENCE = 0.00005; H1ITERATIONS = 500; H1CONVERGENCE = 0.0001; MODEL . . . OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED CINTERVAL TECH1 MODINDICES (6.64); 


I need to know what you mean by lack of response and nothing seems to help. These don't tell me what you expect to happen that is not happening. 


Dear Linda, Sorry for not providing sufficient information in my previous question. I have been able to solve the problem by rewriting the syntax. Thanks for your patience! Leif 


I thought v4.0 supported MISSING for TYPE=COMPLEX (as opposed to using the MIXTURE approach mentioned above). However, there seems to be listwise deletion of cases where one of my predictors is missing. TITLE: Effect of Clustering; DATA: FILE = "c:\projects\PAYS CTCYS\Select_PAYS.dat"; VARIABLE: NAMES = ID u4 u6 fr4 ip5 schoolid Year Grade CTCstat Poverty ; usevariables = u4 u6 ctcstat poverty; useobservations are grade==6 and year==2003; categorical are u4 u6; cluster = schoolid; idvariable = id; missing are all (99); ANALYSIS: TYPE=complex missing ; Model: u4 on ctcstat poverty; u6 on ctcstat poverty; Output: stand; *** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 482 


There will always be listwise deletion of cases with missing on covariates because the model is estimated conditioned on the covariates. Means, variances, and covariances of the covariates are not estimated as part of the model. No missing data theory exists for covariates. If you don't want cases with missing on the covariates to be deleted, you need to bring the covariates into the model by mentioning their variances in the MODEL command. Means, variances, and covariances will then be estimated for them. In addition, distributional assumptions will be made about them as for any dependent variable. 

student07 posted on Friday, July 27, 2007  8:33 am



I'd like to ask how Mplus deals with missing values for x-variables (covariates) which are measured only on the between level when using TYPE = TWOLEVEL? Thanks in advance! 


Any observation with a missing value on a covariate is eliminated from the analysis. 

student07 posted on Monday, July 30, 2007  7:01 am



Thank you very much for your response to my earlier question. I now found that when using "type = twolevel missing", no chi-square statistic, CFI, or TLI is reported in the output. Am I doing something wrong here? Or is there any possibility to request CFI and TLI when using "type = twolevel missing"? Many thanks for your response. 


Because means, variances, and covariances are not sufficient statistics for model estimation with multilevel missing, chi-square and related fit statistics are not available. 

student07 posted on Monday, July 30, 2007  8:17 am



Thanks, Linda. One more question: Is there any standard protocol for reporting the adequacy of models estimated using 'type = twolevel missing'? 


When fit statistics like chi-square are not available, nested models can be compared using 2 times the loglikelihood difference for the two nested models. 
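Written out, with L_0 the loglikelihood of the more restricted (nested) model, L_1 that of the less restricted model, and p_0 and p_1 their numbers of free parameters (symbols introduced here for illustration), the comparison is:

```latex
\[
  \chi^2_{\mathrm{diff}} = 2\,(L_1 - L_0), \qquad df = p_1 - p_0 .
\]
% With ESTIMATOR = MLR, Mplus prints a scaling correction factor
% (c_0, c_1) next to each loglikelihood. The scaled difference test
% described in the Mplus documentation on difference testing is:
\[
  c_d = \frac{p_0\,c_0 - p_1\,c_1}{p_0 - p_1}, \qquad
  TR_d = \frac{-2\,(L_0 - L_1)}{c_d} .
\]
```

The first line is the plain likelihood-ratio test. The second applies only when the loglikelihoods come from MLR, where the unscaled difference is not chi-square distributed.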


Dear Mplus developers and experts, I'm trying to carry out a two-level analysis with data from a pre-post-and-follow-up design in an intervention study. There are three groups (one control group and two treatment groups) on level 2 (operationalized as two dummy variables which predict the dependent variable on level 2). My question is: How can I do a two-level analysis that takes missing data into account? Is there something like "TYPE=MISSING" for the two-level approach? Best regards, Ronny. 


The default since Version 5 is TYPE=MISSING for all analyses. 

Kätlin Peets posted on Thursday, February 17, 2011  11:58 am



I have a question. My model looks like this: %within% Laused2 on sugu; Laused2 on Reading0; Laused2 ON Math0; Laused2 ON Avoid0; %between% reading0 avoid0 math0 AAA; Laused2 on Reading0; Laused2 on Math0; Laused2 on Avoid0; Laused2 ON AAA; ! between-level predictor Thus, I specify reading0, avoid0, math0, and AAA as part of the model in order not to lose cases with missing values on covariates. The modification indices suggest that I should specify correlations/covariances between avoid0, reading0, and math0. However, when I do so, my model parameters (especially the between-level slopes) change. Why is that? 


Not including those correlations may give a strongly misfitting model, and as such its parameter estimates are not trustworthy. 


Does the MISSING default in Version 5 handle missing data differently for TYPE = TWOLEVEL RANDOM than for a TYPE = GENERAL analysis? I've used Mplus for years, but always for SEM or LGM. I'm trying to analyze data for a school-level randomized controlled trial, in which students have a pretest and a posttest. However, the output includes the following warnings: *** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 327 *** WARNING Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 56 Why is it excluding these cases if I do not have LISTWISE = ON? 


In GENERAL prior to Version 6, the model was not estimated conditioned on the observed exogenous variables as is done with TWOLEVEL RANDOM. Starting with Version 6, all models are estimated conditioned on the observed exogenous variables. Missing data theory applies only to dependent variables. This is why observations with missing on observed exogenous variables are excluded. See the 6.1 Version History for further information. 


I specify all the possible covariances between my covariates (at the within and between level) to be able to include all the cases in my analyses (when I mention only variances of x's instead of covariances, the model fit is very bad). However, I get an error message: MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS 5322.918 THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.177D-16. PROBLEM INVOLVING PARAMETER 37. THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS. Can I just ignore it? 


We do not know the impact of having more parameters than clusters. This has not been studied. Certainly you don't want more between parameters than clusters because the number of clusters is the number of independent units. 


But I understood that the parameters might be untrustworthy if I don't include the covariances. What could I do? Could I just leave out some covariances (and examine the model fit)? 


If you include the covariates in the model, you must estimate the means, variances, and covariances of these variables. Perhaps you would be better off losing the observations that have missing data on the covariates. 


I could, but my sample size decreases by 30%. I considered using MI. However, I need to know covariances for my parameter estimates (Tech 3 output gives a covariance matrix for each of my imputed data sets) to estimate simple slopes. And, I did not know how to get such an estimate. 


I have another question. Why are the cases with missing values on y deleted? I get the following error message: Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 


Missing data theory applies to dependent variables. If an observation has missing data for all dependent variables, that observation contributes nothing to the analysis. 


Hi, I'm using the montecarlo feature of mplus to generate a 2 level model with 3 level 1 predictors (2 fixed and 1 random) and 1 level 2 predictor. I'm interested in creating 10% and 30% missingness across either the level 1 predictors, the level 2 predictor, or across both. When I use the PATMISS and PATPROBS commands, mplus informs me for analysis=twolevel random I must use montecarlo integration. However, when I use this integration I have several errors in the tech 9 output. I've attempted using the missing= and MODEL MISSING: commands, but have not had much success. What would be the best way to create 10% and 30% missingness on my multilevel data? Thank you for your time. 


Please send your output and license number to support@statmodel.com so I can see what you are doing and the errors you are receiving. 


Hello Drs. Muthen, I have some variables measuring depression and activities of daily living, which I believe have some missing data. I will be creating percents based on the total scores of these scales because they are frequency scales (not truly continuous). The depression scale ranges from 0 to 3 for each of 9 items; the activities of daily living scale ranges from 0 to 2 for each of 5 items. If I use the DEFINE statement at the beginning of my program, as below, will Mplus, by default, replace missing items with the maximum-likelihood-estimated value for that item? Or should I handle missing data in SAS prior to exporting my data to Mplus for analysis? Thanks for your help! DEFINE: depress = (dep1 + dep2 + dep3 + dep4 + dep5 + dep6 + dep7 + dep8 + dep9)/18; daily = (daily1 + daily2 + daily3 + daily4 + daily5)/10; 


P.S. I also see that I can use this "DEFINE" function: variable = SUM(list of variables); I just wonder how Mplus will handle missing data in doing this sum. 


Any observation that has a missing value on one or more of the variables being summed is assigned a missing value on the sum variable. 
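A small sketch of the two DEFINE functions side by side, using the item names from the question; the missing-value code and the new variable names dsum, depress, and depavg are hypothetical. The comments restate the behavior described in the answer above and, for MEAN, the User's Guide description of that function:

```
VARIABLE:  NAMES = dep1-dep9 daily1-daily5;
           MISSING = ALL (-9);             ! hypothetical missing code
DEFINE:    dsum = SUM (dep1-dep9);         ! missing if any item is missing
           depress = dsum / 27;            ! 9 items x maximum score of 3
           depavg = MEAN (dep1-dep9);      ! average of the nonmissing items
```

Neither function fills in item-level missing values by maximum likelihood; the derived score is simply computed from, or set missing because of, the observed items. Any defined variable used in the analysis also has to be added at the end of the USEVARIABLES list.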


Thanks, Dr. Muthen. I could be making a silly mistake, but when I use this DEFINE command, I get no variance on the resulting variable. I summed across the depression items, then divided by the total possible score of 3*9=27 to create a percent, which could then be divided into four categories. (For this project, we wanted four categories for depression.) But I end up with a DEPRESSC variable that has no variance, so Mplus won't run the model for depression. There WAS variance on the original DEPRESS sum variable, and not all persons would fall into category 1. Is there some obvious mistake that I am making? My code: DEFINE: DEPRESS = SUM(H3SP5 H3SP6 H3SP7 H3SP8 H3SP9 H3SP10 H3SP11 H3SP12 H3SP13)/27; IF 0 <= DEPRESS < .25 THEN DEPRESSC = 1; IF .25 <= DEPRESS < .50 THEN DEPRESSC = 2; IF .50 <= DEPRESS < .75 THEN DEPRESSC = 3; IF .75 <= DEPRESS <= 1 THEN DEPRESSC = 4; The error message: *** ERROR One or more variables have a variance of zero. Check your data and format statement.
Continuous Variable    Number of Observations    Variance
PERC_HL3               9419                      0.737
**DEPRESSC             9388                      0.000


I think the problem is that your statements are not being parsed because they are not stated correctly. It should be: IF (depress GE 0 AND depress LT .25) THEN depressc = 1; 
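Applied to all four categories, the corrected statements might look like this (a sketch that assumes depress has already been defined as the 0 to 1 proportion score earlier in the same DEFINE command):

```
DEFINE:
  IF (depress GE 0   AND depress LT .25) THEN depressc = 1;
  IF (depress GE .25 AND depress LT .50) THEN depressc = 2;
  IF (depress GE .50 AND depress LT .75) THEN depressc = 3;
  IF (depress GE .75 AND depress LE 1)   THEN depressc = 4;
```

Each condition is a single parenthesized expression joined by AND, using the keyword operators GE, LT, and LE in place of chained comparisons such as 0 <= DEPRESS < .25.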


Thanks, Dr. Muthen, I'll try this! 


Greetings, I am running a latent growth curve model using complex survey data (ECLS-K). I received this warning: Data set contains unknown or missing values for GROUPING, PATTERN, COHORT and/or CLUSTER variables. Number of cases with unknown or missing values: 2175 1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS I reviewed the data and yes, there are 2175 observations missing data for the strata and PSU. These observations also have a weight of 0. It must be something to do with the sampling design of ECLS-K. Is there anything I can or should do to make sure these cases are included in the analysis? Based on the output, it appears that they are not included. Thank you, Jaime 


I would contact ECLS-K to see why they have weights of zero. 


I am conducting multilevel modeling with random slopes. Let's say I regress y on x and z. And, y on x is treated as random (varies between classrooms). However, I have missing data on my y. I have heard that I could potentially regress z on x to include more cases in my analyses (using FIML). I tried it and it worked. Is this allowed? Thank you, Katlin 


FIML requires more than one dependent variable. That is why your second model used FIML and your first model did not. 


I have an aggression variable at the within level and I want to create an average cluster aggression score to use at the between level. I understand that Mplus does this automatically (by not specifying this variable as within or between). My question is how Mplus handles missing observations at the within level (e.g., level-1 aggression scores missing for a few individuals within each cluster). More specifically, is the average value based simply on the average of the nonmissing observations, or are the missing observations somehow estimated first using the standard ML missing procedure? Related to the above, when would someone use DEFINE CLUSTER_MEAN instead of having Mplus calculate the between-level values automatically? Thank you. 


When you don't put an individual-level variable on the WITHIN list, an average cluster score is not created; a latent variable decomposition is done instead. See Examples 9.1 and 9.2. To create an average cluster score, use the CLUSTER_MEAN option in DEFINE. For each cluster, the value is the average of the nonmissing values in that cluster. If all values are missing in a cluster, the value is missing. 
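A sketch of the observed-cluster-mean version, with hypothetical variable names (y, aggress, clus, aggmean) and a hypothetical missing-value code:

```
VARIABLE:  NAMES = y aggress clus;             ! hypothetical names
           USEVARIABLES = y aggress aggmean;   ! defined variable goes last
           MISSING = ALL (-999);
           CLUSTER = clus;
           WITHIN = aggress;
           BETWEEN = aggmean;
DEFINE:    aggmean = CLUSTER_MEAN (aggress);   ! mean of nonmissing values
ANALYSIS:  TYPE = TWOLEVEL;
MODEL:     %WITHIN%
           y ON aggress;
           %BETWEEN%
           y ON aggmean;
```

For the latent decomposition instead, aggress would be left off both the WITHIN and BETWEEN lists, no CLUSTER_MEAN variable would be defined, and y ON aggress would appear in both parts of the MODEL command.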


Thank you. Would you say that the latent variable decomposition is a better approach than using cluster_mean option? Is one procedure better than the other with missing data in level 1 observations? As an aside, I find the new diagrammer in Version7 extremely useful for preparing course slides to present multiple examples. 


I don't think missing data handling is the deciding factor here. See the following paper which is available on the website: Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229. 

Katerina Gk posted on Wednesday, October 16, 2013  3:54 am



Dear Linda, I have a TWOLEVEL RANDOM type of model with missing data: Missing are all (999); ANALYSIS: TYPE IS TWOLEVEL RANDOM; ESTIMATOR IS ML; ALGORITHM=INTEGRATION; INTEGRATION=MONTECARLO; ...... indpara1 | par_b XWITH a1_b; indpara2 | par_b XWITH a2_b; indpara3 | par_b XWITH a3_b; ........ When I don't have the interactions, and so use TYPE IS TWOLEVEL with ESTIMATOR=WLSMV, the program reads the model quickly at the beginning and then takes some time to converge, but it gives the output. BUT now, after adding TYPE IS RANDOM and the interactions and changing the estimator to ML, Mplus reads the model very slowly, printing the iterations one by one, so I was thinking that something is wrong because of the missing data and ESTIMATOR=ML. 1) Am I right in saying that the program should read the model more quickly at the beginning? 2) If yes, could you please recommend something to fix the error? I hope I have made clear where my problem is! Thank you very much for your help, Katerina 


With ML and categorical outcomes, numerical integration is required. I would test the interactions one at a time and keep only those that are significant. 


Dear Dr. Muthen, I am running a two-level random slope model with missing data on both levels. I want to use FIML. I have 8 level-1 variables, 3 level-2 variables, and one cross-level interaction. Is it correct to use, besides the ON statements (within: y ON x1 x2 x3...; between: y ON z1 z2 z3; s ON ...), the variance statements (within: x1 x2 x3...; between: z1 z2 z3), or is it also necessary to specify the covariances (x1 WITH x2 x3...)? The two approaches yield somewhat different results. Which would be correct, and what is the difference between the approaches? Thanks, Christoph 


In most cases, mentioning variances for "x variables" automatically covaries them. Check TECH1. If they aren't covaried by default you need to do it since you typically want an unrestricted x part. 
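A sketch of what freeing the covariances explicitly might look like, shown without the random slope for brevity and with the hypothetical names from the question (y, x1-x3, z1-z3). TECH1 is requested so the parameter specification can be checked:

```
MODEL:     %WITHIN%
           y ON x1 x2 x3;
           x1 x2 x3;             ! variances: brings the x's into the model
           x1 WITH x2 x3;        ! covariances, in case they are not
           x2 WITH x3;           !   free by default
           %BETWEEN%
           y ON z1 z2 z3;
           z1 z2 z3;
           z1 WITH z2 z3;
           z2 WITH z3;
OUTPUT:    TECH1;                ! shows which parameters are free
```

If TECH1 shows a zero for an x-by-x covariance entry, that covariance is fixed at zero and the x part of the model is restricted, which would explain differing results between the two specifications.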


Thanks. Actually, just mentioning the variances of the x-variables didn't covary them. One further question: to test cross-level interactions with missing data, Monte Carlo integration is required. The UG says that the LL for models with Monte Carlo integration may be imprecise. So would it be better to use the z-value to determine the significance of the random slope instead of an LR test? 


The precision depends on the number of dimensions of integration; see the TECH8 screen printing and also the Summary output. With, say, up to 4 dimensions, precision may be sufficient, while with 8 dimensions it may not be. I don't think it is clear that an LR test is better; both are affected by precision issues. 


Thanks, and one further question has come up. In some random slope models the estimation does not terminate normally DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD. I have now excluded clusters with low covariance coverage (in some clusters the slope is based on 2 cases, with cluster size = 10), and the estimation terminates normally. Is this reasonable? And are there guidelines regarding the number of cases (with nonmissing values) within clusters for the estimation of random slopes with missing data? Christoph 


Yes, this is reasonable. I don't know of any such guidelines. 

Anonymous posted on Sunday, December 14, 2014  11:12 am



Dear Drs. Muthén, I have an unbalanced, longitudinal dataset (the German Socio-Economic Panel) with several subsamples, all starting in consecutive waves. So I have missing data by design before the start of a single subsample. Moreover, there is item nonresponse, wave nonresponse and, finally, dropout. I use a typical longitudinal multilevel model with observations clustered within participants. Here are several questions concerning the resulting missing data: (1) Is it appropriate to neglect sensitivity analyses for missing data and instead use TYPE = TWOLEVEL and FIML only? (2) Enders (2010) says the conventional multiple imputation procedure does not consider clustering, and you have to use special procedures. Is Mplus able to do that? (3) Can I use the Diggle-Kenward selection model with TYPE = TWOLEVEL in Mplus? (4) I use weights. Participants who do not take part in the survey in a specific wave do NOT have a weight. I think it would not be adequate to use multiple imputation to impute for the missing waves (variables + weights). Can I use the selection model in this case or would it be better to use FIML and use only the observed information? I would be very grateful for your support. Thank you very much! 


1) I think ML under MAR (FIML) is a quite reliable approach. Often NMAR methods give the same results as MAR. 2) Mplus can do multiple imputation using TYPE = TWOLEVEL; see the UG examples. 3) I haven't tried it, but I think the UG example can be generalized to two-level. 4) Not sure about this. You may want to ask on semnet or multilevelnet. 
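A rough two-step sketch of what point 2 could look like. All variable and file names are hypothetical, and the exact options should be checked against the multiple imputation examples in the User's Guide for the Mplus version in use:

```
! Step 1: generate imputed data sets, taking clustering into account
DATA:      FILE = panel.dat;           ! hypothetical file
VARIABLE:  NAMES = y x1 x2 w id;       ! hypothetical names
           MISSING = ALL (-999);
           CLUSTER = id;
           BETWEEN = w;
ANALYSIS:  TYPE = TWOLEVEL BASIC;
DATA IMPUTATION:
           IMPUTE = y x1 x2;
           NDATASETS = 20;
           SAVE = imp*.dat;

! Step 2 (a separate input file): analyze the imputed data sets
DATA:      FILE = implist.dat;         ! list file written in step 1
           TYPE = IMPUTATION;
```

In step 2 the usual VARIABLE, ANALYSIS, and MODEL commands for the two-level model follow, and the results are combined across the imputed data sets.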

Yoon Oh posted on Monday, May 04, 2015  10:26 pm



I ran a two-level model with a continuous outcome (Y) and three predictors (cohort, treatment & prescore). After running the model, I got warning messages that 17 cases missing on Y and 2 cases missing on x-variables were not included in the analysis. Question 1: Does this mean that those 19 cases were excluded from the actual analysis? (I'm asking because my colleague said that Mplus has the capacity to do data replacement during the analysis, so those 19 cases were actually included, which is different from my understanding.) Then I ran the same model with the addition of the following statement. Model: cohort treatment prescore; Now the number of observations in the summary of analysis is the total number of cases, with no warning messages about missing Y or missing x-variables. Question 2: Does this mean that the cases with missing Y and those with missing x-variables are now all included in the analysis? Question 3: If yes, how can missing Y and missing x be handled by the addition of the above statement? I wonder what is being done behind the scenes. Thank you so much. 


When we say 19 cases are excluded, they are not used in any way. The model is estimated conditioned on exogenous x variables. Missing data theory does not apply to them. When you bring them into the model, distributional assumptions are made about them and missing data theory is used. Observations with missing on all y variables have nothing to contribute to the analysis and are therefore excluded. 

Yoon Oh posted on Tuesday, May 05, 2015  9:23 am



Thank you for your answer. I have additional questions. First, when you say "missing data theory", do you mean "FIML"? Second, if Y is the observed single variable and has missing data, then the only way to incorporate the missing cases on Y into the analysis would be to use multiple imputation? Thanks a lot for your time. 


Yes, I mean FIML. You cannot use multiple imputation for a single y variable either. You need bivariate information in both cases. 

Yoon Oh posted on Tuesday, May 05, 2015  12:08 pm



Thanks for the answers. But isn't it possible to impute missing Y based on a set of covariates as well as auxiliary variables? I've seen people using multiple imputation to impute missing Ys using STATA or SAS. Am I wrong? 


Yes, but you can also bring covariates into the FIML model by mentioning, for example, their variances in the MODEL command. These would be asymptotically equivalent. 

Yoon Oh posted on Tuesday, May 05, 2015  1:21 pm



I'm sorry if I misunderstood what you said, but I am confused. From your previous posting, cases with missing on Y (single observed variable) are excluded even when covariates are brought into the FIML model because they have nothing to contribute to the analysis. But now it sounds like that missing Y can be handled by bringing covariates into the FIML model. Am I missing something? Thank you again for your time and patience. 


I believe that the following is true. If you have an output that shows otherwise, please send it. We make a distinction between x and y even when x is brought into the model. So I think that a case with missing on all y's will still be deleted. The above situation is different than having just one y. With just one y that does not have missing, y is not excluded so when the x's are brought in missing data theory can be used. 

Yoon Oh posted on Thursday, October 22, 2015  6:59 pm



I'm trying to run a three-level model with random intercept and random slope. A problem is that there are missing data on a covariate with a random slope. I wanted to include cases with missing covariates in the analysis by bringing all x's into the model. But I ended up with an error saying "This model estimation is not available due to missing data in a covariate with random slope." The following code was used for the analysis. Would you please help me figure out how to proceed? ANALYSIS: TYPE = THREELEVEL RANDOM; ESTIMATOR = ML; MODEL: %WITHIN% Y ON X1 X2 X3 ; B4 | Y ON X4 ; X1 X2 X3 X4 ; %BETWEEN CLASS%; Y B4 ON Z ; %BETWEEN SCHOOL%; Y B4 ON W ; Y WITH B4 ; 


The only way to do this is to conduct multiple imputation first. You can do 3-level imputations in Mplus using the H0 imputation track. 

Patricia posted on Thursday, May 05, 2016  10:43 am



I am running a multigroup path analysis: grouping = Insomnia (0 = NoInsomSx, 1 = InsomSx) I receive the following error message: *** WARNING Data set contains unknown or missing values for GROUPING, PATTERN, COHORT, CLUSTER and/or STRATIFICATION variables. Number of cases with unknown or missing values: 3 1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS However, I checked my data file and all cases are coded appropriately for the grouping variable (no missing values). What is happening? Thank you! 


Please send the output, data set, and your license number to support@statmodel.com. 


I'm running a 2-level model (no random slopes), with missing data. I'm trying to use FIML (mentioning variances of predictors) to include all participants, but I run into issues with this due to cross-level interactions. I tried incorporating the Monte Carlo integration algorithm to deal with this issue, but I receive the following error: " THE ESTIMATED BETWEEN COVARIANCE MATRIX COULD NOT BE INVERTED. COMPUTATION COULD NOT BE COMPLETED IN ITERATION 1. CHANGE YOUR MODEL AND/OR STARTING VALUES. THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY DUE TO AN ERROR IN THE COMPUTATION. CHANGE YOUR MODEL AND/OR STARTING VALUES." Below is the relevant syntax: ANALYSIS: type=twolevel random; estimator=MLR; algorithm=integration; integration=montecarlo; MODEL: %within% lonew3 on sex_dw1 lonew2c AA asian white other; s_vic | lonew3 on zpickw2; s_recip | lonew3 on recip_dw2; s_vicrec | lonew3 on zvicXrec; zpickw2 recip_dw2 lonew3 sex_dw1 lonew2c aa asian white other zvicXrec; %between% lonew3 on sthw2_Mc; s_vic on sthw2_Mc; s_recip on sthw2_Mc; s_vicrec on sthw2_Mc; s_vic@0 s_recip@0 s_vicrec@0; sthw2_mc s_vic s_recip s_vicrec; Any help troubleshooting these error messages would be greatly appreciated. 


Try deleting the Between statement: s_vic@0 s_recip@0 s_vicrec@0; 


Thank you for the feedback, Dr. Muthen. The model runs successfully when I remove the suggested between statement: s_vic@0 s_recip@0 s_vicrec@0; I have a couple of brief follow-up questions:
1. Does this mean it is necessary to allow random slopes in order for the Monte Carlo integration to converge? Or is there an alternative way to specify that those slope variances are fixed at zero that won't cause convergence problems?
2. I want to confirm that the following message can be ignored, given that it is followed by "THE MODEL ESTIMATION TERMINATED NORMALLY":
WARNING: THE MODEL ESTIMATION HAS REACHED A SADDLE POINT OR A POINT WHERE THE OBSERVED AND THE EXPECTED INFORMATION MATRICES DO NOT MATCH. AN ADJUSTMENT TO THE ESTIMATION OF THE INFORMATION MATRIX HAS BEEN MADE. THE CONDITION NUMBER IS 0.943D03. THE PROBLEM MAY ALSO BE RESOLVED BY DECREASING THE VALUE OF THE MCONVERGENCE OR LOGCRITERION OPTIONS OR BY CHANGING THE STARTING VALUES OR BY INCREASING THE NUMBER OF INTEGRATION POINTS OR BY USING THE MLF ESTIMATOR.
Thank you again.


1. You can specify lonew3 on zpickw2; lonew3 on recip_dw2; lonew3 on zvicXrec; and create interactions between the within covariates and the between covariates that you use on between.
2. If you obtain standard errors in your results, the message can be ignored.
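
One way to read point 1 is sketched below: the random slopes are dropped, and each cross-level interaction is entered as an observed product term created in DEFINE. The product names (pickXsth, recXsth, vrXsth) are hypothetical, the USEVARIABLES list is reconstructed from the posted MODEL command, and the sketch assumes the products are treated as ordinary within-level covariates.

```
DEFINE:     pickXsth = zpickw2 * sthw2_Mc;    ! hypothetical product names
            recXsth  = recip_dw2 * sthw2_Mc;
            vrXsth   = zvicXrec * sthw2_Mc;
VARIABLE:   ! variables created in DEFINE go at the end of USEVARIABLES
            USEVARIABLES = lonew3 sex_dw1 lonew2c AA asian white other
                           zpickw2 recip_dw2 zvicXrec sthw2_Mc
                           pickXsth recXsth vrXsth;
ANALYSIS:   TYPE = TWOLEVEL;                  ! no RANDOM: slopes are fixed
            ESTIMATOR = MLR;
MODEL:      %WITHIN%
            lonew3 ON sex_dw1 lonew2c AA asian white other
                      zpickw2 recip_dw2 zvicXrec
                      pickXsth recXsth vrXsth;
            %BETWEEN%
            lonew3 ON sthw2_Mc;
```

The coefficients on the product terms then play the role that the regressions of s_vic, s_recip, and s_vicrec on sthw2_Mc played in the random-slope version. Variances of covariates with missing values can still be mentioned on the within level as before.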


Dear Mplus team, I have been trying to estimate two-level models with observations nested within persons (data from the German Socio-Economic Panel). Wherever I have categorical dependent variables, I run into problems (days and weeks of computing time) because Monte Carlo numerical integration is required if I bring predictors and a couple of other meaningful covariates into the model to handle missing data (including attrition). I have tried switching to Bayes because I read somewhere in the forum that it might be faster. However, I still get a fatal error message saying that this model can only be estimated with Monte Carlo integration. My questions are: 1) Is it really impossible to work around numerical integration with type = twolevel, a categorical dependent variable, and covariances between predictors in the model? 2) Is bringing all covariances between the predictors into the model really the best way of enabling FIML for predictors as well? I do get quite different sample sizes and regression estimates if I don't bring the predictors into the model (or not all of them).


Is there an obvious reason I would receive considerably different results using SPSS's (V 24) MIXED procedure for a three-level model compared to Mplus's (V 7.4) TYPE IS THREELEVEL? The coefficients are pretty close, but the Mplus standard errors are considerably smaller, resulting in considerably smaller p-values for covariates of interest in the Mplus results. For SPSS I am using REML, and in Mplus I am using MLR. I expect this would produce some discrepancy, but the differences appear greater than expected. I can paste syntax and/or output if needed, but thought I'd first ask whether there is a simple and obvious explanation that transcends the particulars of my model. I will say there is no missing data, so that can be ruled out as an explanation. The reason I am using Mplus, though, is that I intend to introduce covariates in subsequent models that do have missingness, and I plan to mention the variances of those covariates in order not to drop cases; but first I want to determine why my results are not replicating across statistical programs for a basic model without missingness. Thanks.


You will not obtain the same standard errors with REML and MLR. Try using ML in both programs. 
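
On the Mplus side, that comparison run amounts to a one-line change in the ANALYSIS command, sketched here; the rest of the input is assumed to stay as it was.

```
ANALYSIS:   TYPE = THREELEVEL;
            ESTIMATOR = ML;     ! was MLR; switch back for robust standard errors
```

The SPSS run would correspondingly need its estimation method changed from REML to ML.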


Thank you for your prompt reply. I should have investigated that myself first. Indeed, those results are similar, and the p-values fall roughly midway between the SPSS REML and Mplus MLR p-values. Using a continuous outcome variable, the different estimates for our dichotomous Treatment variable are:
SPSS REML: 1.41 (1.75), p = .465
SPSS ML: 1.63 (1.46), p = .267
Mplus ML: 1.59 (1.47), p = .278
Mplus MLR: 1.59 (1.10), p = .147
I recognize that the statistical inference doesn't vary between these models; however, when I include other covariates, namely pretest, the p-value with Mplus MLR does become statistically significant. Do you have any thoughts on which results we might consider most trustworthy? What factors should be considered in making this decision? These data are drawn from a 22-site cluster-randomized trial, with site-level assignment to one of two conditions. Sites varied in size, but the sample is fairly balanced between conditions. Thank you for your thoughts and insight.


I would use Mplus MLR because it takes into account any nonnormality of the outcomes. 


Thank you so much for your recommendation and rationale. More to the point of this thread, missing data in multilevel analysis: I have missingness on a covariate (pretest) and plan to mention that covariate, so that its mean and variance are estimated and I don't drop cases missing pretest. I also plan to use the method described in Mplus Web Notes No. 11 on constructing covariates in multilevel regression. My understanding is that, when mentioning covariates to address missingness, all covariates need to be mentioned (so as not to fix the correlation between the variables in and out of the model at zero). However, it is my hunch that this is a level-specific requirement. Using a two-level model as an example, where missingness only occurs for variable Pre:
%WITHIN%
Pre L1cov;
Post ON Pre L1cov;
%BETWEEN%
Post ON Pre L2cov;
Am I correct that I need to mention Pre and L1cov on the WITHIN level (both measured at the within level), but that it is unnecessary to mention Pre and L2cov on the BETWEEN level because of the orthogonality of variance between the within and between parts of the model? Thank you.


Hello, I am running an LCA on complex survey data and I am having trouble with a persistent warning:
*** WARNING Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 412
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
I've seen that this can occur when there is missingness in the covariates, but I dropped all observations with missing values on the covariates before analyzing the data in Mplus. I also recoded all missing values in the predictors to 9999 in Stata to avoid any issues with Mplus reading blanks. Could this warning be referring to missingness on the binary predictors, and if so, how can I resolve that? I thought the default method for dealing with missing data was FIML, which does not drop observations. Any insight would be appreciated! Best, Shannon


Shannon: This message refers to the variables on the USEVARIABLES list. Be sure you are reading the data correctly if you don't think 412 cases have missing values on all of the variables on the USEVARIABLES list. 


Answer for Mark: You are correct. L2cov, when put on the BETWEEN= list, needs to be mentioned on Between only if it has missing data.


Excellent. Thank you. One last question: What is the proper term or phrasing I should use when describing this technique for handling missingness? Is it that I declare covariates as endogenous so that FIML can be applied to them? Is this a mean-structure approach? Thank you for your guidance. If there is anything I can cite that describes what Mplus is doing so as not to drop cases, please let me know. I have looked at the statmodel Web Notes and Special Topics pages, and it was not apparent to me that any of those manuscripts addresses this issue in particular. Thank you again.

Anne Black posted on Wednesday, March 15, 2017  2:16 pm



Dear Dr. Muthen, I'm using the data imputation option to handle incomplete covariates measured for individuals nested within clusters. If I include the cluster variable in the imputation model, will the hierarchical data structure be preserved, or do I need to specify that another way? Thank you for your advice.


See UG ex11.8: the cluster variable is not in the imputation model.
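
A minimal sketch of the general idea, not a copy of ex11.8: the cluster variable is named in the CLUSTER option and the imputation is run as a two-level analysis, rather than listing the cluster variable among the variables of the imputation model. File names, variable names, the missing-data code, and the number of data sets are assumptions.

```
DATA:       FILE = mydata.dat;            ! hypothetical file name
VARIABLE:   NAMES = clus y x1-x3;
            USEVARIABLES = y x1-x3;       ! clus is not listed here
            CLUSTER = clus;               ! clustering enters via CLUSTER
            MISSING = ALL (-999);         ! assumed missing-data code
ANALYSIS:   TYPE = TWOLEVEL BASIC;
DATA IMPUTATION:
            IMPUTE = x1-x3;               ! covariates with missing values
            NDATASETS = 20;
            SAVE = imp2l*.dat;
```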


Rephrasing my question above: A colleague indicated that Mplus cannot perform FIML with an independent (exogenous) variable unless it is tricked into doing so by predicting the variable with incomplete data from an auxiliary variable. The auxiliary variable could even be just a column of 1s. This makes Mplus treat the variable as endogenous. In addition, in a SAS paper (312-2012), Paul Allison described several ways of using ML when data are missing on predictor variables. One is to use the EM algorithm to produce the means and covariance matrix for all the variables in the model (using PROC MI with NIMPUTE=0). The second is to use an SEM-based FIML approach (using PROC CALIS). My question is: does the Mplus approach of mentioning the mean or variance of an independent variable with incomplete data not use FIML to handle the missingness? If so, this approach in Mplus might be more like Allison's first approach. Or is it a FIML approach, whereby this is another way (besides the use of auxiliary variables) to trick Mplus into treating the variable as endogenous? Any guidance you can provide on terminology or phrasing for describing this approach will be greatly appreciated, including whether this is what the literature refers to as a mean-structure approach to handling missing data.

Anne Black posted on Thursday, March 16, 2017  8:07 am



Thank you, Dr. Muthen. I ultimately need to conduct a multiple groups analysis (for which estimator=Bayes is not available), but think I could use example 11.8 to impute values for each group separately, then combine the data sets. Does that seem reasonable? Or better to use the grouping variable (which is different from the cluster variable) in the imputation model? 


Probably the latter. 


Answer to Lavenia: With Mplus you can bring a covariate x into the model (making it endogenous) by mentioning either its mean or variance. This implies that the model is expanded to the joint distribution of y and x instead of the usual approach of y conditional on x (which says nothing about the x distribution). For this, Mplus uses FIML assuming that x is normal. All of this is described in detail in Chapter 10 of our new book.
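
The difference between the conditional and the joint formulation can be written out for the simplest case. The notation below (a single continuous y, a single covariate x, and the symbols \alpha, \beta, \sigma^2, \mu_x, \sigma_x^2) is illustrative and not taken from the thread or the book.

```latex
\begin{align*}
\text{Conditional model:}\quad
  & y_i \mid x_i \sim N(\alpha + \beta x_i,\ \sigma^2),
  && \text{a case with $x_i$ missing has no likelihood contribution;}\\
\text{Joint model:}\quad
  & p(y_i, x_i) = p(y_i \mid x_i)\, p(x_i),
  && x_i \sim N(\mu_x,\ \sigma_x^2);\\
\text{Case with $x_i$ missing:}\quad
  & L_i = \int p(y_i \mid x)\, p(x)\, dx,
  && \text{i.e. } y_i \sim N(\alpha + \beta \mu_x,\ \beta^2 \sigma_x^2 + \sigma^2).
\end{align*}
```

In the joint model every case contributes the likelihood of whatever it has observed, which is why cases with missing x are retained, at the price of the normality assumption on x.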


Thank you for the succinct explanation and the reference to your book. I see Ch.1.9.4 covers Bringing covariates into the model: Missing data on x. I'll be ordering it now. Tusen tack! 

Melanie Wall posted on Wednesday, September 27, 2017  1:15 pm



In Mplus 7.11 we were able to run imputation with type = complex. But now in Mplus 8 (and in our older version 7.4) we cannot run it. We get an error saying that COMPLEX is not compatible with DATA IMPUTATION. Searching online, we see many posts saying that COMPLEX and imputation are not compatible, with other fixes suggested. Should we trust the 7.11 output?


In Version 7.11 the data imputation is done correctly; however, none of the complex sampling features (weights, strata, cluster) are used during the imputation. The complex sampling features were used only during the model estimation that uses the imputed data. We now disallow this so that it is clearer what is being done. You can repeat the 7.11 process in two steps in Version 8: impute the data using type=basic without the complex features, and then analyze the imputed data using the complex features in a second step. The results should be identical (two-step V8 vs. V7.11). That said, we don't really recommend that approach. What we recommend is the following (unfortunately, it is not done automatically for you):
1. Cluster sampling: we recommend two-level data imputation (using type=twolevel basic with the cluster variable set to the cluster sampling unit).
2. Strata: we recommend using multiple groups, i.e., imputing each stratum separately.
3. Sampling weights: if you have sampling weights and missing data, we don't really recommend MI at all. The best method is FIML. If you still need to do MI, you should use the sampling weight, log(sampling weight), and all other information related to the sampling weights in your imputation model. Any proper MI method with sampling weights has to explicitly model the relationship between the weight and the variables, which is a huge drawback, because FIML works without assuming any form for that relationship, i.e., it works with any relationship, and the relationship doesn't have to be specified.
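
A sketch of how point 3 and the two-step idea might be combined: the weight and its log enter the imputation as ordinary variables, and the complex-sampling features are used only when the imputed data sets are analyzed. All names (survey.dat, psu, strat, wt, logwt), the missing-data code, and the use of AUXILIARY to carry the design variables into the saved files are assumptions that should be checked against the output.

```
! Step 1: imputation with the weight and log(weight) as ordinary variables
DATA:       FILE = survey.dat;             ! hypothetical file name
VARIABLE:   NAMES = psu strat wt y x1-x3;
            USEVARIABLES = y x1-x3 wt logwt;
            AUXILIARY = psu strat;         ! assumed to be saved with the data
            MISSING = ALL (-999);          ! assumed missing-data code
DEFINE:     logwt = LOG(wt);
ANALYSIS:   TYPE = BASIC;
DATA IMPUTATION:
            IMPUTE = y x1-x3;
            NDATASETS = 20;
            SAVE = svyimp*.dat;

! Step 2 (separate input file): analysis with the complex-sampling features.
! NAMES must follow the variable order reported in the step-1 output.
DATA:       FILE = svyimplist.dat;
            TYPE = IMPUTATION;
VARIABLE:   NAMES = y x1-x3 wt logwt psu strat;
            USEVARIABLES = y x1-x3;
            CLUSTER = psu;
            STRATIFICATION = strat;
            WEIGHT = wt;
ANALYSIS:   TYPE = COMPLEX;
MODEL:      y ON x1-x3;
```

For clustered samples, step 1 would instead use the two-level imputation recommended in point 1.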

Melanie Wall posted on Thursday, September 28, 2017  10:51 am



Thank you, Tihomir, for solving the mystery about 7.11. I understand the difficulty of imputing with the weights. We would be happy to just use FIML for our problem, but we have several covariates (X variables) with missing values, which FIML cannot handle other than by listwise deletion. Do you have any suggested tricks for turning X variables into Y variables so that subjects with missing X values will be included by FIML?


You can still try MI via point 3 above; however, the easiest way might be to use FIML and add a model for the X variable that has missing values (making it a dependent variable). Something like X1 ON X2-X5; where X2 to X5 are the covariates that have no missing values and X1 is the covariate that has missing values. One thing to keep in mind is that the treatment of missing values always rests on some model assumptions, so whichever way you go, make sure the model assumptions are reasonable.
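
In input form, that suggestion might look like the sketch below. The outcome y, the design variable names (psu, strat, wt), and the missing-data code are assumptions; only the X1 ON X2-X5 statement comes from the reply.

```
VARIABLE:   USEVARIABLES = y x1-x5;
            CLUSTER = psu;                ! hypothetical design variables
            STRATIFICATION = strat;
            WEIGHT = wt;
            MISSING = ALL (-999);         ! assumed missing-data code
ANALYSIS:   TYPE = COMPLEX;
            ESTIMATOR = MLR;
MODEL:      y  ON x1-x5;
            x1 ON x2-x5;                  ! x1 becomes a dependent variable
```

Cases with missing x1 are then kept, while any case missing on x2-x5 would still be excluded, since those remain conditioning variables.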


Just to add that in certain situations MI, using a strategy like the one described above to deal with the weights, might be the best solution. For example, if you have many binary covariates with many missing values, you are better off imputing from a multivariate probit model, which is not available for ML/FIML. Also take a look at Chapter 9 http://statmodel.com/Mplus_Book.shtml
