

Version 2.02 allows missing data modeling when a latent mixture model is fit to data with a complex sampling design. Can missing data be handled with other models using complex data? I thought I read in the manual that it can't, but maybe the missing data feature has been added to other models in v. 2.02. 


No, missing cannot be handled by the regular TYPE=COMPLEX or TYPE=TWOLEVEL. However, you can use TYPE=MIXTURE COMPLEX MISSING with one class and thereby get missing for COMPLEX. 


Can multiple groups be analyzed using the TYPE=MIXTURE COMPLEX MISSING? 


For TYPE=MIXTURE, the training data feature can be used to define groups. 

Maggie posted on Friday, April 02, 2004  5:01 am



I am designing a two-level SEM and have a great deal of missing data on the independent variables at the within level, but no missing data on the between-level independent variables. Can this model still be handled by Mplus Version 3, or should I replace all of the missing data at the within level before running the program? Thanks for any suggestions.


This can be handled by Mplus Version 3. You should treat the x variables as y variables. The normality assumption changes from normality given x to overall normality. You change them to y's by mentioning their variances in the MODEL command. Then use TYPE = TWOLEVEL MISSING; 

Maggie posted on Monday, April 05, 2004  2:01 am



So you mean that I can add the variances of the within-level X variables to the within part of the MODEL command, i.e.: %WITHIN% fw BY Y1-Y3; fw ON X1; fw ON X2; fw x1-x2; and for the between part of the model, I needn't add the variances of the X variables (given that there is no missing data in the between part). Am I correct? Thanks again.


Yes. What you did is correct. 
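A minimal sketch of the input discussed in this exchange. The data file name, the cluster variable clus, the between-level covariate w, the between-level factor fb, and the missing-value code -99 are all assumptions made for illustration; only the within-level statements come from Maggie's post.

```mplus
TITLE:    Two-level SEM, missing data on within-level x's (sketch)
DATA:     FILE = twolevel.dat;          ! hypothetical file name
VARIABLE: NAMES = y1-y3 x1 x2 w clus;   ! hypothetical variable order
          MISSING = ALL (-99);          ! assumed missing-value code
          CLUSTER = clus;
          WITHIN = x1 x2;               ! measured at the within level
          BETWEEN = w;                  ! between covariate, no missing
ANALYSIS: TYPE = TWOLEVEL MISSING;      ! Version 3 syntax from the reply
MODEL:    %WITHIN%
          fw BY y1-y3;
          fw ON x1 x2;
          x1 x2;                        ! variances mentioned: x's become y's
          %BETWEEN%
          fb BY y1-y3;
          fb ON w;                      ! w stays an x; no variance mentioned
```

Because x1 and x2 are now treated as dependent variables, the normality assumption covers them as well, as noted in the reply above.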


We are running a structural equation model with clustered data (teenagers clustered within schools) using TYPE=COMPLEX modeling. We have missing data for some schools. Are there any special concerns we should consider when the missing data is at the second level, other than the usual things, like coverage and that missingness is at least MAR, when using TYPE=COMPLEX MISSING? 

bmuthen posted on Saturday, April 24, 2004  9:03 am



No special concerns, just the usual ones. 

Maggie posted on Tuesday, May 11, 2004  10:00 am



Questions again from a new user of Mplus 3. Could I return to the previous question posted on April 05? As you suggested, I added the variances of the Xs in the MODEL command, but the output tells me to use ALGORITHM = INTEGRATION; INTEGRATION = MONTECARLO in the ANALYSIS command. I referred to the example in the Mplus short course (multilevel regression model, page 52): in that input there is no specification of variances for the variables with missing data and no ALGORITHM option, although missing data is mentioned in the VARIABLE command. So in general, in which situations should I add the variances of Xs with missing data? In fact, after I added ALGORITHM = INTEGRATION; INTEGRATION = MONTECARLO to the analysis, no output was produced; it only shows "INPUT READING TERMINATED NORMALLY" (I set the OUTPUT option to SAMPSTAT TECH8). In this case, is it necessary to run a Monte Carlo simulation to generate the missing data? If possible, could you please suggest a complete example of a two-level model with RANDOM that deals with missing data? This might let me ask fewer questions about similar issues. Thanks in advance for your kind response.


We don't have examples that show MISSING. You just need to add it to the TYPE option of the ANALYSIS command. I suspect that your outcomes are not continuous and that is why numerical integration is required. Please send your output to support@statmodel.com if you want me to look at it. 

Mpduser1 posted on Wednesday, September 28, 2005  11:51 am



I'm building a multilevel SEM with two endogenous variables, Y1 and Y2, both of which are prone to missingness, and both of which have WITHIN and BETWEEN sources of variation. The missing data rate for Y2 is much higher than the missing data rate for Y1. Y2 is categorical, Y1 is ordinal. My question is this: Does Mplus 3.13 use information from both the WITHIN and BETWEEN portions of the model when adjusting the maximum likelihood calculations to account for the missing data? I ask because this could greatly influence my variable selection / modeling strategy. Thank you.

bmuthen posted on Wednesday, September 28, 2005  9:06 pm



The answer is yes. That is how maximum-likelihood estimation under the standard "MAR" assumption works.

anonymous posted on Monday, January 16, 2006  10:13 pm



Hi there, I am running a multinomial logistic regression analysis (nominal DV; using missing and complex estimation) and wish to test whether two of my three-way interaction betas are significantly different from one another. For example, I have a 3-level DV (one level is the reference) and a three-way interaction that is statistically significant when comparing the first level to the reference group and not significant when comparing the second level to the reference. I wish to know if the 2 betas are significantly different from one another. Any ideas?

bmuthen posted on Tuesday, January 17, 2006  10:54 am



You compare the log likelihood (LL) of your model with that of a model where you constrain your betas to be equal (using the usual Mplus approach to equality constraints). Then use 2 times the LL difference as an LRT chi-square test of the equality, with df = the difference in the number of parameters of the two models.
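The two runs described in the reply could be set up roughly as follows. This is a sketch, not a tested input: the file name, the variable names (u, x1, x2, x3, clus), the precomputed product term x123, and the missing-value code are hypothetical, and the lower-order interaction terms are left out for brevity.

```mplus
TITLE:    Unconstrained model, interaction slopes free (sketch)
DATA:     FILE = mlogit.dat;              ! hypothetical file name
VARIABLE: NAMES = u x1 x2 x3 x123 clus;   ! x123 = x1*x2*x3, precomputed
          NOMINAL = u;                    ! last category is the reference
          CLUSTER = clus;
          MISSING = ALL (-99);            ! assumed missing-value code
ANALYSIS: TYPE = COMPLEX MISSING;         ! the poster's setup
MODEL:    u#1 ON x1 x2 x3;
          u#1 ON x123;
          u#2 ON x1 x2 x3;
          u#2 ON x123;

! Constrained model: same input, but the two interaction slopes are
! held equal by giving them the same number in parentheses:
!
!         u#1 ON x123 (1);
!         u#2 ON x123 (1);
```

The test statistic is then 2 times the difference between the two printed loglikelihood values, here with 1 degree of freedom. Under MLR, the loglikelihood difference is normally adjusted using the scaling correction factors printed in the output; the reply does not cover that step, so it is worth checking against the Mplus documentation.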


Dear Bengt and Linda, I have developed an SEM with TYPE = COMPLEX (clustered data) and ESTIMATOR = MLR. Since I specified MISSING ARE ALL (9), I assume that there has been listwise deletion of cases. The n varies with the number of variables (with missing) used in the analyses. Since I have missing data and would like to use a method equivalent to FIML, I have tried to specify TYPE = MISSING H1. Mplus gives no error message or warning, but simply responds with silence. The relevant commands look like this: ANALYSIS: TYPE = complex; TYPE = missing h1; ESTIMATOR = MLR; I have tested out various ways, for instance this one: ANALYSIS: TYPE = complex missing h1; ESTIMATOR = MLR; Nothing seems to help. Any advice? Best regards, Leif


I am not sure what you mean by nothing seems to help. H1 is not used with MLR. So you would say: TYPE = COMPLEX MISSING; ESTIMATOR = MLR; 


Hello again, Linda, Same lack of response from Mplus. Here are all the relevant commands. All functions well until I insert the word "missing" on the "TYPE =" line. Any advice? Leif VARIABLE: MISSING ARE ALL (9); NAMES ARE . . .; USEVARIABLES ARE . . .; CLUSTER IS ...; ANALYSIS: TYPE = complex missing; ESTIMATOR = MLR; ITERATIONS = 1000; CONVERGENCE = 0.00005; H1ITERATIONS = 500; H1CONVERGENCE = 0.0001; MODEL . . . OUTPUT: SAMPSTAT RESIDUAL STANDARDIZED CINTERVAL TECH1 MODINDICES (6.64); 


I need to know what you mean by lack of response and nothing seems to help. These don't tell me what you expect to happen that is not happening. 


Dear Linda, Sorry for not providing sufficient information in my previous question. I have been able to solve the problem by rewriting the syntax. Thanks for your patience! Leif 


I thought v4.0 supported MISSING for TYPE=COMPLEX (as opposed to using the MIXTURE approach mentioned above). However, there seems to be listwise deletion of cases where one of my predictors is missing. TITLE: Effect of Clustering; DATA: FILE = "c:\projects\PAYS CTCYS\Select_PAYS.dat"; VARIABLE: NAMES = ID u4 u6 fr4 ip5 schoolid Year Grade CTCstat Poverty ; usevariables = u4 u6 ctcstat poverty; useobservations are grade==6 and year==2003; categorical are u4 u6; cluster = schoolid; idvariable = id; missing are all (99); ANALYSIS: TYPE=complex missing ; Model: u4 on ctcstat poverty; u6 on ctcstat poverty; Output: stand; *** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 482


There will always be listwise deletion of cases with missing on covariates because the model is estimated conditioned on the covariates. Means, variances, and covariances of the covariates are not estimated as part of the model. No missing data theory exists for covariates. If you don't want cases with missing on the covariates to be deleted, you need to bring the covariates into the model by mentioning their variances in the MODEL command. Means, variances, and covariances will then be estimated for them. In addition, distributional assumptions will be made about them as for any dependent variable.
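As an illustration of bringing covariates into the model, here is a minimal sketch with two continuous outcomes. All names (clustered.dat, y1, y2, x1, x2, schoolid) are hypothetical, and the outcomes are assumed continuous; with categorical outcomes, as in the question above, estimation may work differently depending on the estimator, so this should not be read as a drop-in fix for that input.

```mplus
TITLE:    Keeping cases with missing covariates (sketch)
DATA:     FILE = clustered.dat;       ! hypothetical file name
VARIABLE: NAMES = id y1 y2 x1 x2 schoolid;
          USEVARIABLES = y1 y2 x1 x2;
          IDVARIABLE = id;
          CLUSTER = schoolid;
          MISSING = ALL (99);
ANALYSIS: TYPE = COMPLEX MISSING;
MODEL:    y1 ON x1 x2;
          y2 ON x1 x2;
          x1 x2;                      ! variances mentioned: x's enter the model
          x1 WITH x2;                 ! covariance between the covariates
```

The cost of keeping these cases is the one stated in the reply: x1 and x2 now carry the same distributional assumptions as any dependent variable.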

student07 posted on Friday, July 27, 2007  8:33 am



I'd like to ask how Mplus deals with missing values for x-variables (covariates) which are measured only on the between level when using TYPE = TWOLEVEL. Thanks in advance!


Any observation with a missing value on a covariate is eliminated from the analysis. 

student07 posted on Monday, July 30, 2007  7:01 am



Thank you very much for your response to my earlier question. I have now found that when using "type = twolevel missing", no chi-square statistics, CFI, or TLI are reported in the output. Am I doing something wrong here? Or is there any way to request CFI and TLI when using "type = twolevel missing"? Many thanks for your response.


Because means, variances, and covariances are not sufficient statistics for model estimation with multilevel missing, chi-square and related fit statistics are not available.

student07 posted on Monday, July 30, 2007  8:17 am



Thanks, Linda. One more question: is there any standard protocol for reporting the adequacy of models estimated using 'type = twolevel missing'?


When fit statistics like chi-square are not available, nested models can be compared using 2 times the loglikelihood difference for the two nested models.
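Written out, the comparison in the reply is the usual likelihood ratio test. The symbols are introduced here for illustration: LL_0 and p_0 are the loglikelihood and number of free parameters of the more restricted model, LL_1 and p_1 those of the less restricted model in which it is nested.

```latex
\chi^2_{\text{diff}} = 2\,\bigl(LL_{1} - LL_{0}\bigr),
\qquad
df = p_{1} - p_{0}
```

The statistic is referred to a chi-square distribution with df degrees of freedom.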


Dear Mplus developers and experts, I'm trying to carry out a two-level analysis with data from a pre-, post-, and follow-up design in an intervention study. There are three groups (one control group and two treatment groups) on level 2 (operationalized as two dummy variables which predict the dependent variable on level 2). My question is: How can I do a two-level analysis that takes missing data into account? Is there something like "TYPE=MISSING" for the two-level approach? Best regards, Ronny.


The default since Version 5 is TYPE=MISSING for all analyses. 

Kätlin Peets posted on Thursday, February 17, 2011  11:58 am



I have a question. My model looks like this: %within% Laused2 on sugu ; Laused2 on Reading0 ; Laused2 ON Math0; Laused2 ON Avoid0; %between% reading0 avoid0 math0 AAA; Laused2 on Reading0 ; Laused2 on Math0; Laused2 on Avoid0; Laused2 ON AAA; ! between-level predictor. Thus, I specify reading0, avoid0, math0, and AAA as part of the model in order not to lose cases with missing values on covariates. The model modification indices suggest that I should specify correlations/covariances between avoid0, reading0, and math0. However, when I do so, my model parameters (especially the between-level slopes) change. Why is that?


Not including those correlations may give a strongly misfitting model, and as such its parameter estimates are not trustworthy.


Does the MISSING default in Version 5 handle missing data differently for TYPE = TWOLEVEL RANDOM than for a TYPE = GENERAL analysis? I've used Mplus for years, but always for SEM or LGM. I'm trying to analyze data for a school-level randomized controlled trial, in which students have a pretest and a posttest. However, the output includes the following warnings: *** WARNING Data set contains cases with missing on x-variables. These cases were not included in the analysis. Number of cases with missing on x-variables: 327 *** WARNING Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 56 Why is it excluding these cases if I do not have LISTWISE = ON?


In GENERAL prior to Version 6, the model was not estimated conditioned on the observed exogenous variables as is done with TWOLEVEL RANDOM. Starting with Version 6, all models are estimated conditioned on the observed exogenous variables. Missing data theory applies only to dependent variables. This is why observations with missing on observed exogenous variables are excluded. See the 6.1 Version History for further information. 


I specify all the possible covariances between my covariates (at the within and between level) to be able to include all the cases in my analyses (when I mention only variances of xs instead of covariances, the model fit is very bad). However, I get an error message: MAXIMUM LOG-LIKELIHOOD VALUE FOR THE UNRESTRICTED (H1) MODEL IS 5322.918 THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.177D-16. PROBLEM INVOLVING PARAMETER 37. THE NONIDENTIFICATION IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE NUMBER OF CLUSTERS. REDUCE THE NUMBER OF PARAMETERS. Can I just ignore it?


We do not know the impact of having more parameters than clusters. This has not been studied. Certainly you don't want more between parameters than clusters because the number of clusters is the number of independent units. 


But I understood that the parameters might be untrustworthy if I don't include the covariances. What could I do? Could I just leave out some covariances (and examine the model fit)? 


If you include the covariates in the model, you must estimate the means, variances, and covariances of these variables. Perhaps you would be better off losing the observations that have missing data on the covariates. 


I could, but my sample size decreases by 30%. I considered using MI. However, I need to know covariances for my parameter estimates (Tech 3 output gives a covariance matrix for each of my imputed data sets) to estimate simple slopes. And, I did not know how to get such an estimate. 


I have another question. Why are the cases with missing values on y deleted? I get the following error message: Data set contains cases with missing on all variables except x-variables. These cases were not included in the analysis. Number of cases with missing on all variables except x-variables: 


Missing data theory applies to dependent variables. If an observation has missing data for all dependent variables, that observation contributes nothing to the analysis. 


Hi, I'm using the Monte Carlo feature of Mplus to generate a two-level model with 3 level-1 predictors (2 fixed and 1 random) and 1 level-2 predictor. I'm interested in creating 10% and 30% missingness across either the level-1 predictors, the level-2 predictor, or both. When I use the PATMISS and PATPROBS options, Mplus informs me that for a TWOLEVEL RANDOM analysis I must use Monte Carlo integration. However, when I use this integration I get several errors in the TECH9 output. I've attempted using the MISSING= and MODEL MISSING: commands, but have not had much success. What would be the best way to create 10% and 30% missingness in my multilevel data? Thank you for your time.


Please send your output and license number to support@statmodel.com so I can see what you are doing and the errors you are receiving. 


Hello Drs. Muthen, I have some variables measuring depression and activities of daily living, which I believe have some missing data. I will be creating percents based on the total scores of these scales because they are frequency scales (not truly continuous). The depression scale ranges from 0 to 3 for each of 9 items; the activities of daily living scale ranges from 0 to 2 for each of 5 items. If I use the DEFINE statement at the beginning of my program, as below, will Mplus, by default, replace missing items with the maximum likelihood-estimated value for that item? Or should I handle missing data in SAS prior to exporting my data to Mplus for analysis? Thanks for your help! DEFINE: depress = (dep1 + dep2 + dep3 + dep4 + dep5 + dep6 + dep7 + dep8 + dep9)/18; daily = (daily1 + daily2 + daily3 + daily4 + daily5)/10;


P.S. I also see that I can use this "DEFINE" function: variable = SUM(list of variables); I just wonder how Mplus will handle missing data in doing this sum. 


Any observation that has a missing value on one or more of the variables being summed is assigned a missing value on the sum variable. 


Thanks, Dr. Muthen. I could be making a silly mistake, but when I use this DEFINE command, I get no variance on the resulting variable. I summed across the depression items, then divided by the total possible score of 3*9=27 to create a percent which could then be divided into four categories. (For this project, we wanted four categories for depression.) But I end up with a DEPRESSC variable that has no variance, so Mplus won't run the model for depression. There WAS variance on the original DEPRESS sum variable, and not all persons would fall into category 1. Is there some obvious mistake that I am making? My code: DEFINE: DEPRESS = SUM(H3SP5 H3SP6 H3SP7 H3SP8 H3SP9 H3SP10 H3SP11 H3SP12 H3SP13)/27; IF 0 <= DEPRESS < .25 THEN DEPRESSC = 1; IF .25 <= DEPRESS < .50 THEN DEPRESSC = 2; IF .50 <= DEPRESS < .75 THEN DEPRESSC = 3; IF .75 <= DEPRESS <= 1 THEN DEPRESSC = 4; The error message: *** ERROR One or more variables have a variance of zero. Check your data and format statement. Continuous Number of Variable Observations Variance PERC_HL3 9419 0.737 **DEPRESSC 9388 0.000


I think the problem is that your statements are not being parsed because they are not stated correctly. It should be: IF (depress GE 0 AND depress LT .25) THEN depressc = 1;
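Putting the pieces of this exchange together, the four categories could be created along these lines, following the parenthesized GE/LT form shown in the reply. The file name, the item names dep1-dep9, and the missing-value code are hypothetical. TYPE = BASIC is used here only as a descriptive run so the new variables can be inspected before a model is fit; it is a suggestion, not something from the thread.

```mplus
TITLE:    Checking a DEFINE-created category variable (sketch)
DATA:     FILE = depress.dat;          ! hypothetical file name
VARIABLE: NAMES = id dep1-dep9;        ! hypothetical item names
          USEVARIABLES = depress depressc;   ! new variables go last
          MISSING = ALL (-99);         ! assumed missing-value code
DEFINE:   depress = SUM(dep1 dep2 dep3 dep4 dep5 dep6 dep7 dep8 dep9)/27;
          IF (depress GE 0   AND depress LT .25) THEN depressc = 1;
          IF (depress GE .25 AND depress LT .50) THEN depressc = 2;
          IF (depress GE .50 AND depress LT .75) THEN depressc = 3;
          IF (depress GE .75 AND depress LE 1)   THEN depressc = 4;
ANALYSIS: TYPE = BASIC;                ! descriptive run only
```

As stated earlier in the thread, any observation with a missing value on one or more of the summed items gets a missing value on the sum, and therefore on depress and depressc as well.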


Thanks, Dr. Muthen, I'll try this! 


Greetings, I am running a latent growth curve model using complex survey data (ECLS-K). I received this warning: Data set contains unknown or missing values for GROUPING, PATTERN, COHORT and/or CLUSTER variables. Number of cases with unknown or missing values: 2175 1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS I reviewed the data and yes, there are 2175 observations with missing data for the strata and PSU. These observations also have a weight of 0. It must be something to do with the sampling design of ECLS-K. Is there anything I can or should do to make sure these cases are included in the analysis? Based on the output it appears that they are not included. Thank you, Jaime


I would contact ECLS-K to see why they have weights of zero.


I am conducting multilevel modeling with random slopes. Let's say I regress y on x and z. And, y on x is treated as random (varies between classrooms). However, I have missing data on my y. I have heard that I could potentially regress z on x to include more cases in my analyses (using FIML). I tried it and it worked. Is this allowed? Thank you, Katlin 


FIML requires more than one dependent variable. That is why your second model used FIML and your first model did not.


I have an aggression variable at the within level and I want to create an average cluster aggression score to use at the between level. I understand that Mplus does this automatically (by not specifying this variable as within or between). My question is: how does Mplus handle missing observations at the within level (e.g., level-1 aggression scores missing for a few individuals within each cluster)? More specifically, is the average value based simply on the average of the non-missing observations, or are the missing observations somehow estimated first using the standard ML missing procedure? Related to the above, when would someone use DEFINE with CLUSTER_MEAN instead of having Mplus calculate the between-level values automatically? Thank you.


When you don't put an individual-level variable on the WITHIN list, an average cluster score is not created; a latent variable decomposition is done. See Examples 9.1 and 9.2. To create an average cluster score, use the CLUSTER_MEAN option in DEFINE. For each cluster, the value is the average of the non-missing values in that cluster. If all values are missing in a cluster, the value is missing.
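A minimal sketch of the observed cluster-mean approach described in the reply. The file name, the variable names (y, agg, classid), the new variable aggm, and the missing-value code are all hypothetical.

```mplus
TITLE:    Observed cluster mean as a between-level predictor (sketch)
DATA:     FILE = agg.dat;              ! hypothetical file name
VARIABLE: NAMES = y agg classid;
          USEVARIABLES = y agg aggm;   ! new variable goes last
          MISSING = ALL (-99);         ! assumed missing-value code
          CLUSTER = classid;
          WITHIN = agg;                ! individual score, within only
          BETWEEN = aggm;              ! observed cluster mean
DEFINE:   aggm = CLUSTER_MEAN (agg);   ! mean of non-missing values
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON agg;
          %BETWEEN%
          y ON aggm;
```

For the latent variable decomposition instead, agg would be left off both the WITHIN and BETWEEN lists and used in both parts of the model, with no DEFINE statement needed.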


Thank you. Would you say that the latent variable decomposition is a better approach than using the CLUSTER_MEAN option? Is one procedure better than the other with missing data in level-1 observations? As an aside, I find the new diagrammer in Version 7 extremely useful for preparing course slides to present multiple examples.


I don't think missing data handling is the deciding factor here. See the following paper which is available on the website: Lüdtke, O., Marsh, H.W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203-229.

Katerina Gk posted on Wednesday, October 16, 2013  3:54 am



Dear Linda, I have a two-level random model with missing data: Missing are all (999); ANALYSIS: TYPE IS TWOLEVEL RANDOM; ESTIMATOR IS ML; ALGORITHM=INTEGRATION; INTEGRATION=MONTECARLO; ...... indpara1 | par_b XWITH a1_b ; indpara2 | par_b XWITH a2_b ; indpara3 | par_b XWITH a3_b ; ........ Without the interactions, using TYPE IS TWOLEVEL and ESTIMATOR=WLSMV, the program reads the model quickly, then takes some time to converge, but it does give output. Now, after adding RANDOM to the TYPE option, adding the interactions, and changing the estimator to ML, Mplus runs very slowly, printing the iterations one by one, so I suspect something is wrong because of the missing data and ESTIMATOR=ML. 1) Am I right that the program should read the model more quickly at the beginning? 2) If so, could you please recommend something to fix this? I hope I have made clear what my problem is. Thank you very much for your help, Katerina


With ML and categorical outcomes, numerical integration is required. I would test the interactions one at a time and keep only those that are significant. 


Dear Dr. Muthen, I am running a two-level random slope model with missing data on both levels. I want to use FIML. I have 8 level-1 variables, 3 level-2 variables and one cross-level interaction. Is it correct to use, besides the ON statements (within: Y ON x1 x2 x3...; between: Y ON z1 z2 z3; S ON ...), the variance statements (within: x1 x2 x3...; between: z1 z2 z3), or is it also necessary to estimate covariances (x1 WITH x2 x3...)? The two approaches yield somewhat different results. Which would be correct, and what is the difference between the approaches? Thanks, Christoph


In most cases, mentioning variances for "x variables" automatically covaries them. Check TECH1. If they aren't covaried by default you need to do it since you typically want an unrestricted x part. 
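A sketch of what the explicitly covaried version might look like, with TECH1 requested so the free parameters can be checked. The file name, variable names, and missing-value code are hypothetical, and only three within-level and two between-level covariates are shown.

```mplus
TITLE:    Covarying x's brought into a two-level model (sketch)
DATA:     FILE = fiml2l.dat;           ! hypothetical file name
VARIABLE: NAMES = y x1-x3 z1 z2 clus;
          MISSING = ALL (-99);         ! assumed missing-value code
          CLUSTER = clus;
          WITHIN = x1-x3;
          BETWEEN = z1 z2;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:    %WITHIN%
          y ON x1-x3;
          x1-x3;                       ! variances: x's enter the model
          x1 WITH x2 x3;               ! covariances among the x's
          x2 WITH x3;
          %BETWEEN%
          y ON z1 z2;
          z1 z2;
          z1 WITH z2;
OUTPUT:   TECH1;                       ! shows which parameters are free
```

In the TECH1 output, a covariance that is not estimated appears as a zero in the parameter specification matrices, which is how the default behavior mentioned in the reply can be verified.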


Thanks. Actually, just mentioning the variances of the x-variables didn't covary them. One further question: to test cross-level interactions with missing data, Monte Carlo integration is required. The UG says that the LL for models with Monte Carlo integration may be imprecise. So would it be better to use the z-value to determine the significance of the random slope instead of an LR test?


The precision depends on the number of dimensions of integration; see TECH8 screen printing and also the Summary output. With, say, up to 4 dimensions, precision may be sufficient, while with 8 dimensions it may not be. I don't think it is clear that an LR test is better; both are affected by precision issues.
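For orientation, a random slope model of the kind discussed here, with a within-level x brought into the model and Monte Carlo integration requested, might be set up as below. This is a sketch under assumptions: the file name, variable names, and missing-value code are hypothetical, and a single covariate per level is shown.

```mplus
TITLE:    Random slope with missing data on x (sketch)
DATA:     FILE = slopes.dat;           ! hypothetical file name
VARIABLE: NAMES = y x1 z1 clus;
          MISSING = ALL (-99);         ! assumed missing-value code
          CLUSTER = clus;
          WITHIN = x1;
          BETWEEN = z1;
ANALYSIS: TYPE = TWOLEVEL RANDOM;
          ALGORITHM = INTEGRATION;
          INTEGRATION = MONTECARLO;
MODEL:    %WITHIN%
          s | y ON x1;                 ! random slope
          x1;                          ! variance: x1 enters the model
          %BETWEEN%
          y s ON z1;                   ! s ON z1: cross-level interaction
          y WITH s;
OUTPUT:   TECH1 TECH8;                 ! TECH8 prints the iterations
```

The number of dimensions of integration referred to in the reply is reported in the output for a run like this.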


Thanks, and one further question has come up. In some random slope models the estimation does not terminate normally DUE TO A NON-ZERO DERIVATIVE OF THE OBSERVED-DATA LOGLIKELIHOOD. I have now excluded clusters with low covariance coverage (in some clusters the slope is based on 2 cases, cluster size = 10), and the estimation terminates normally. Is this reasonable? And are there guidelines regarding the number of cases (with non-missing values) within clusters for the estimation of random slopes with missing data? Christoph


Yes, this is reasonable. I don't know of any such guidelines. 

Anonymous posted on Sunday, December 14, 2014  11:12 am



Dear Drs. Muthén, I have an unbalanced, longitudinal dataset (the German Socio-Economic Panel) with several subsamples, all starting in consecutive waves. So I have missing data by design before the start of a single subsample. Moreover, there is item nonresponse, wave nonresponse and, finally, dropout. I use a typical longitudinal multilevel model with observations clustered within participants. Here are several questions concerning the resulting missing data: (1) Is it appropriate to neglect sensitivity analyses for missing data and instead use type=twolevel and FIML only? (2) Enders (2010) says the conventional multiple imputation procedure does not consider clustering, and you have to use special procedures. Is Mplus able to do that? (3) Can I use the Diggle-Kenward selection model with type=twolevel in Mplus? (4) I use weights. Participants who do not take part in the survey in a specific wave do NOT have a weight. I think it would not be adequate to use multiple imputation to impute for the missing waves (variables + weights). Can I use the selection model in this case or would it be better to use FIML and use only the observed information? I would be very grateful for your support. Thank you very much!


1) I think ML under MAR (FIML) is a quite reliable approach. Often NMAR methods give the same results as MAR. 2) Mplus can do multiple imputation using TYPE=TWOLEVEL; see the UG examples. 3) I haven't tried it, but I think the UG example can be generalized to two-level. 4) Not sure about this. You may want to ask on semnet or multilevelnet.
